Ollama

Auto-instrument Ollama for local inference.

Risicare automatically instruments the Ollama SDK for local LLM inference.

Installation

pip install risicare ollama

No API Key Required

Ollama runs locally, so no API key is needed. Just ensure Ollama is running on your machine.
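To confirm the local server is reachable before sending requests, you can ping Ollama's default endpoint (`http://localhost:11434`). The helper below is a sketch using only the standard library; it is not part of either SDK:

```python
import urllib.error
import urllib.request


def ollama_is_running(host: str = "http://localhost:11434") -> bool:
    """Return True if a server answers at the given host (Ollama's default port)."""
    try:
        with urllib.request.urlopen(host, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns `False`, start the server with `ollama serve` (or the desktop app) before initializing tracing.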

Auto-Instrumentation

import risicare
import ollama
 
risicare.init()
 
# Automatically traced
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Hello!"}]
)
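Auto-instrumentation of this kind generally works by wrapping the SDK's entry points so each call is timed and annotated. The sketch below illustrates the pattern with a generic wrapper; it is not Risicare's actual implementation, and `record` is a hypothetical sink for span attributes:

```python
import functools
import time


def instrument(fn, record):
    """Illustrative wrapper: time a call and record gen_ai-style attributes.

    Not Risicare's real code; `record` stands in for whatever span/export
    mechanism the tracing library uses.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        response = fn(*args, **kwargs)
        record({
            "gen_ai.system": "ollama",
            "gen_ai.request.model": kwargs.get("model"),
            "gen_ai.latency_ms": (time.perf_counter() - start) * 1000.0,
        })
        return response
    return wrapper
```

In practice `risicare.init()` applies wrapping like this to `ollama.chat` and `ollama.generate` for you, so no code changes are needed at call sites.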

Captured Attributes

| Attribute | Description |
| --- | --- |
| `gen_ai.system` | `ollama` |
| `gen_ai.request.model` | Model name |
| `gen_ai.request.stream` | Whether streaming was requested |
| `gen_ai.usage.prompt_tokens` | Input tokens (from `prompt_eval_count`) |
| `gen_ai.usage.completion_tokens` | Output tokens (from `eval_count`) |
| `gen_ai.usage.total_tokens` | Total tokens |
| `gen_ai.latency_ms` | Request latency in milliseconds |

Token Count Source

Token counts come from Ollama's prompt_eval_count and eval_count fields in the response. For the generate API, gen_ai.prompt.content is captured instead of message-level inputs.
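Folding those response fields into the usage attributes can be sketched as follows. This is not Risicare's internal code, just an illustration of the mapping, assuming the field names shown above:

```python
def token_attributes(response: dict) -> dict:
    """Map Ollama response fields to gen_ai usage attributes (illustrative)."""
    prompt = response.get("prompt_eval_count", 0)
    completion = response.get("eval_count", 0)
    return {
        "gen_ai.usage.prompt_tokens": prompt,
        "gen_ai.usage.completion_tokens": completion,
        "gen_ai.usage.total_tokens": prompt + completion,
    }
```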

Streaming

stream = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
 
for chunk in stream:
    print(chunk["message"]["content"], end="")
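When streaming, Ollama reports the per-request token counts only on the final chunk (the one with `done` set to true), so an instrumentation layer has to drain the stream before it can emit usage attributes. A minimal sketch over plain chunk dicts, not tied to either SDK:

```python
def consume_stream(chunks):
    """Collect streamed content; usage counts arrive on the final chunk (done=True)."""
    parts, usage = [], {}
    for chunk in chunks:
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            usage = {
                "prompt_tokens": chunk.get("prompt_eval_count", 0),
                "completion_tokens": chunk.get("eval_count", 0),
            }
    return "".join(parts), usage
```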

Generate API

The generate function is also instrumented:

response = ollama.generate(
    model="llama3",
    prompt="Hello, how are you?"
)

Async Support

import asyncio
from ollama import AsyncClient
 
async def main():
    client = AsyncClient()
    response = await client.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response["message"]["content"])
 
asyncio.run(main())

Popular Models

| Model | Description |
| --- | --- |
| `llama3` | Llama 3 (default size) |
| `llama3:70b` | Llama 3 70B |
| `mistral` | Mistral 7B |
| `mixtral` | Mixtral 8x7B |
| `codellama` | Code Llama |
| `phi3` | Phi-3 |

Next Steps