# LiteLLM
Auto-instrument LiteLLM for unified LLM access.
Risicare automatically instruments LiteLLM for unified access to 100+ LLM providers.
## Installation

```bash
pip install risicare[litellm]
# or
pip install risicare litellm
```

### Version Compatibility

Requires litellm >= 1.30.0.
## Auto-Instrumentation

```python
import risicare
import litellm

risicare.init()

# Automatically traced
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

## What's Captured
| Feature | Description |
|---|---|
| Completion Calls | All completion/acompletion calls |
| Provider Routing | Model-to-provider mapping |
| Fallbacks | Fallback chain execution |
| Cost Tracking | LiteLLM's cost calculation |
| Load Balancing | Router selections |
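
Async calls go through the same instrumentation as their sync counterparts (see the Completion Calls row above). A minimal sketch using LiteLLM's `acompletion`:

```python
import asyncio

import litellm
import risicare

risicare.init()

async def main() -> None:
    # The async variant is traced the same way as litellm.completion
    response = await litellm.acompletion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```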
## Span Hierarchy

```
litellm.completion/{model} (LLM_CALL kind)
```
## Provider Deduplication
When using LiteLLM, underlying LLM provider spans are automatically suppressed to avoid duplicate traces. You don't need to disable provider instrumentation manually.
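
For example, even if the provider SDK underneath is itself instrumented, a call routed through LiteLLM yields a single span. The trace shape described in the comment below is illustrative of the note above, not an exact attribute listing:

```python
import litellm
import risicare

risicare.init()

# LiteLLM invokes the OpenAI SDK under the hood here. Per the note above,
# the nested provider span is suppressed, so this call produces one
# litellm.completion/gpt-4o span rather than two overlapping LLM spans.
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
```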
## Multiple Providers

LiteLLM's unified interface works with all providers:
```python
# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])

# Anthropic
response = litellm.completion(model="claude-3-sonnet-20240229", messages=[...])

# Bedrock
response = litellm.completion(model="bedrock/claude-3-sonnet", messages=[...])

# Together AI
response = litellm.completion(model="together_ai/meta-llama/Llama-3-70b", messages=[...])
```
## Streaming

```python
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")
```
## Fallbacks

Fallback chains are fully traced:
```python
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=["claude-3-sonnet-20240229", "gemini-pro"],
)
# Each fallback attempt appears as a child span
```
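
Fallbacks compose with LiteLLM's built-in retry support. A sketch using the `num_retries` parameter (the retry count here is illustrative):

```python
import litellm

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    num_retries=2,  # retry the primary model before falling back
    fallbacks=["claude-3-sonnet-20240229"],
)
```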
## Router

The LiteLLM Router is instrumented:
```python
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4"}},
    ]
)

# Load balancing decisions are captured
response = router.completion(model="gpt-4", messages=[...])
```
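
The Router's selection policy is configurable; per the Load Balancing row above, the resulting selections are what get captured. A sketch using LiteLLM's `routing_strategy` option:

```python
from litellm import Router

# "simple-shuffle" is LiteLLM's default; alternatives include
# "least-busy" and "latency-based-routing"
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4"}},
    ],
    routing_strategy="least-busy",
)

response = router.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```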
## Embeddings

```python
response = litellm.embedding(
    model="text-embedding-ada-002",
    input=["Hello, world!"],
)
```
## Cost Tracking

LiteLLM's cost calculation is captured:
```python
import litellm
from litellm import completion_cost

response = litellm.completion(model="gpt-4o", messages=[...])
cost = completion_cost(completion_response=response)
# Cost appears in span attributes
```
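
`completion_cost` can also price raw strings when no response object is at hand (for example, a reassembled streamed reply); a sketch using its `model`/`prompt`/`completion` parameters:

```python
from litellm import completion_cost

# Price a prompt/completion pair directly against the model's rates
cost = completion_cost(
    model="gpt-4o",
    prompt="Hello!",
    completion="Hi there! How can I help?",
)
```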