
LiteLLM

Auto-instrument LiteLLM for unified LLM access.

Risicare automatically instruments LiteLLM for unified access to 100+ LLM providers.

Installation

pip install risicare[litellm]
# or
pip install risicare litellm

Version Compatibility

Requires litellm >= 1.30.0.

Auto-Instrumentation

import risicare
import litellm
 
risicare.init()
 
# Automatically traced
response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

What's Captured

| Feature | Description |
| --- | --- |
| Completion Calls | All `completion`/`acompletion` calls |
| Provider Routing | Model-to-provider mapping |
| Fallbacks | Fallback chain execution |
| Cost Tracking | LiteLLM's cost calculation |
| Load Balancing | Router selections |
Span Hierarchy

litellm.completion/{model} (LLM_CALL kind)
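The naming scheme above can be sketched as a simple template (a hypothetical illustration of the `litellm.completion/{model}` pattern, not Risicare's internal code):

```python
def span_name(operation: str, model: str) -> str:
    """Build a span name like 'litellm.completion/gpt-4o'."""
    return f"{operation}/{model}"

print(span_name("litellm.completion", "gpt-4o"))  # litellm.completion/gpt-4o
```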

Provider Deduplication

When using LiteLLM, underlying LLM provider spans are automatically suppressed to avoid duplicate traces. You don't need to disable provider instrumentation manually.
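Conceptually, this kind of suppression can be modeled as a context-local flag that is set while a LiteLLM span is active, so provider-level instrumentation knows to stay quiet. This is a minimal sketch of the idea, not Risicare's actual implementation:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical flag: True while a LiteLLM span is open, so that
# provider-level instrumentation can skip emitting a duplicate span.
_suppress_provider_spans = ContextVar("suppress_provider_spans", default=False)

@contextmanager
def litellm_span():
    token = _suppress_provider_spans.set(True)
    try:
        yield
    finally:
        _suppress_provider_spans.reset(token)

def provider_span_enabled() -> bool:
    return not _suppress_provider_spans.get()

with litellm_span():
    print(provider_span_enabled())  # False: provider spans suppressed inside
print(provider_span_enabled())      # True: re-enabled outside
```

Because the flag is a `ContextVar`, the suppression is scoped correctly even across async tasks.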

Multiple Providers

LiteLLM's unified interface exposes every provider through the same `completion` call; the provider is selected from the model string:

# OpenAI
response = litellm.completion(model="gpt-4o", messages=[...])
 
# Anthropic
response = litellm.completion(model="claude-3-sonnet-20240229", messages=[...])
 
# Bedrock
response = litellm.completion(model="bedrock/claude-3-sonnet", messages=[...])
 
# Together AI
response = litellm.completion(model="together_ai/meta-llama/Llama-3-70b", messages=[...])

Streaming

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
 
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Fallbacks

Fallback chains are fully traced:

response = litellm.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    fallbacks=["claude-3-sonnet-20240229", "gemini-pro"]
)
 
# Each fallback attempt appears as a child span
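The execution order of a fallback chain can be sketched as a loop that tries each model until one succeeds, with each failed attempt corresponding to its own child span (a standalone simulation, not LiteLLM's code; `fake_call` is an invented stand-in for a provider call):

```python
def complete_with_fallbacks(models, call):
    """Try each model in order; return (model, result) on first success."""
    errors = {}
    for model in models:
        try:
            return model, call(model)  # each attempt would be one span
        except Exception as exc:
            errors[model] = exc        # failure recorded, move to next model
    raise RuntimeError(f"all models failed: {errors}")

def fake_call(model):
    # Simulated provider: primary times out, fallbacks succeed.
    if model == "gpt-4o":
        raise TimeoutError("primary unavailable")
    return f"response from {model}"

used, result = complete_with_fallbacks(
    ["gpt-4o", "claude-3-sonnet-20240229", "gemini-pro"], fake_call
)
print(used)  # claude-3-sonnet-20240229
```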

Router

The LiteLLM Router is instrumented:

from litellm import Router
 
router = Router(
    model_list=[
        {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4o"}},
        {"model_name": "gpt-4", "litellm_params": {"model": "azure/gpt-4"}},
    ]
)
 
# Load balancing decisions are captured
response = router.completion(model="gpt-4", messages=[...])
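As a rough mental model of what gets captured per call, simple round-robin rotation across the two deployments sharing the `gpt-4` alias looks like this (a sketch only: the real Router supports several routing strategies beyond round-robin):

```python
from itertools import cycle

# Two deployments registered under one model_name, as in the Router above.
deployments = ["gpt-4o", "azure/gpt-4"]
rr = cycle(deployments)

# Each routed call records which deployment was selected.
picks = [next(rr) for _ in range(4)]
print(picks)  # ['gpt-4o', 'azure/gpt-4', 'gpt-4o', 'azure/gpt-4']
```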

Embeddings

response = litellm.embedding(
    model="text-embedding-ada-002",
    input=["Hello, world!"]
)

Cost Tracking

LiteLLM's cost calculation is captured:

from litellm import completion_cost
 
response = litellm.completion(model="gpt-4o", messages=[...])
cost = completion_cost(completion_response=response)
 
# Cost appears in span attributes
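The arithmetic behind a cost figure is per-token pricing applied to the usage counts on the response. A minimal sketch, using illustrative prices (real prices come from LiteLLM's model pricing table and change over time):

```python
# Illustrative USD-per-token prices; not authoritative.
PRICES = {"gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000}}

def estimate_cost(model, prompt_tokens, completion_tokens):
    """Cost = prompt tokens * input price + completion tokens * output price."""
    p = PRICES[model]
    return prompt_tokens * p["input"] + completion_tokens * p["output"]

cost = estimate_cost("gpt-4o", prompt_tokens=1_000, completion_tokens=500)
print(f"${cost:.4f}")  # $0.0075
```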
