
LiteLLM

Auto-instrument LiteLLM for unified LLM access.

Risicare automatically instruments LiteLLM for unified access to 100+ LLM providers.

Installation

pip install risicare[litellm]
# or
pip install risicare litellm

Version Compatibility

Requires litellm >= 1.30.0.

Auto-Instrumentation

import risicare
import litellm
 
risicare.init()
 
# Automatically traced
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

What's Captured

| Feature | Description |
| --- | --- |
| Completion Calls | All `completion`/`acompletion` calls |
| Provider Routing | Model-to-provider mapping |
| Fallbacks | Fallback chain execution |
| Cost Tracking | LiteLLM's cost calculation |
| Load Balancing | Router selections |
Span Hierarchy

litellm.completion/{model} (LLM_CALL kind)
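The naming scheme above can be sketched as a simple template (a hypothetical illustration of the `litellm.completion/{model}` pattern, not Risicare's internal code):

```python
def span_name(operation: str, model: str) -> str:
    """Build a span name like 'litellm.completion/gpt-4o'."""
    return f"{operation}/{model}"

print(span_name("litellm.completion", "gpt-4o"))  # litellm.completion/gpt-4o
```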

Provider Deduplication

When using LiteLLM, underlying LLM provider spans are automatically suppressed to avoid duplicate traces. You don't need to disable provider instrumentation manually.
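Conceptually, this kind of suppression can be modeled as a context-local flag that is set while a LiteLLM span is active, so provider-level instrumentation knows to stay quiet. This is a minimal sketch of the idea, not Risicare's actual implementation:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical flag: True while a LiteLLM span is open, so that
# provider-level instrumentation can skip emitting a duplicate span.
_suppress_provider_spans = ContextVar("suppress_provider_spans", default=False)

@contextmanager
def litellm_span():
    token = _suppress_provider_spans.set(True)
    try:
        yield
    finally:
        _suppress_provider_spans.reset(token)

def provider_span_enabled() -> bool:
    return not _suppress_provider_spans.get()

with litellm_span():
    print(provider_span_enabled())  # False: provider spans suppressed inside
print(provider_span_enabled())      # True: re-enabled outside
```

Because the flag is a `ContextVar`, the suppression is scoped correctly even across async tasks.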

Multiple Providers

LiteLLM's unified interface exposes every provider through the same `completion` call; the provider is selected from the model string:

# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])
 
# Anthropic
response = litellm.completion(model="claude-3-sonnet-20240229", messages=[...])
 
# Bedrock
response = litellm.completion(model="bedrock/claude-3-sonnet", messages=[...])
 
# Together AI
response = litellm.completion(model="together_ai/meta-llama/Llama-3-70b", messages=[...])

Streaming

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
 
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Fallbacks

Fallback chains are fully traced:

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=["claude-3-sonnet-20240229", "gemini-pro"]
)
 
# Each fallback attempt appears as a child span
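The execution order of a fallback chain can be sketched as a loop that tries each model until one succeeds, with each failed attempt corresponding to its own child span (a standalone simulation, not LiteLLM's code; `fake_call` is an invented stand-in for a provider call):

```python
def complete_with_fallbacks(models, call):
    """Try each model in order; return (model, result) on first success."""
    errors = {}
    for model in models:
        try:
            return model, call(model)  # each attempt would be one span
        except Exception as exc:
            errors[model] = exc        # failure recorded, move to next model
    raise RuntimeError(f"all models failed: {errors}")

def fake_call(model):
    # Simulated provider: primary times out, fallbacks succeed.
    if model == "gpt-4o":
        raise TimeoutError("primary unavailable")
    return f"response from {model}"

used, result = complete_with_fallbacks(
    ["gpt-4o", "claude-3-sonnet-20240229", "gemini-pro"], fake_call
)
print(used)  # claude-3-sonnet-20240229
```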

Router

The LiteLLM Router is instrumented:

from litellm import Router
 
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4"}},
    ]
)
 
# Load balancing decisions are captured
response = router.completion(model="gpt-4", messages=[...])
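As a rough mental model of what gets captured per call, simple round-robin rotation across the two deployments sharing the `gpt-4` alias looks like this (a sketch only: the real Router supports several routing strategies beyond round-robin):

```python
from itertools import cycle

# Two deployments registered under one model_name, as in the Router above.
deployments = ["gpt-4o", "azure/gpt-4"]
rr = cycle(deployments)

# Each routed call records which deployment was selected.
picks = [next(rr) for _ in range(4)]
print(picks)  # ['gpt-4o', 'azure/gpt-4', 'gpt-4o', 'azure/gpt-4']
```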

Embeddings

response = litellm.embedding(
    model="text-embedding-ada-002",
    input=["Hello, world!"]
)

Cost Tracking

LiteLLM's cost calculation is captured:

from litellm import completion_cost
 
response = litellm.completion(model="gpt-4o", messages=[...])
cost = completion_cost(completion_response=response)
 
# Cost appears in span attributes
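The arithmetic behind a cost figure is per-token pricing applied to the usage counts on the response. A minimal sketch, using illustrative prices (real prices come from LiteLLM's model pricing table and change over time):

```python
# Illustrative USD-per-token prices; not authoritative.
PRICES = {"gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000}}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Cost = prompt tokens * input price + completion tokens * output price."""
    p = PRICES[model]
    return prompt_tokens * p["input"] + completion_tokens * p["output"]

cost = estimate_cost("gpt-4o", prompt_tokens=1_000, completion_tokens=500)
print(f"${cost:.4f}")  # $0.0075
```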
