
Cohere

Auto-instrumentation for the Cohere API.

Risicare automatically instruments the Cohere Python SDK.

Installation

pip install risicare cohere

Basic Usage

import risicare
import cohere
 
risicare.init()
 
co = cohere.Client()
 
# Automatically traced
response = co.chat(
    model="command-r-plus",
    message="What is the capital of France?"
)

Supported Methods

| Method | Traced |
| --- | --- |
| chat | Yes (sync + async) |
| generate | Yes (sync + async) |
| embed | Yes (sync + async) |

Streaming

Streaming Not Instrumented

Cohere streaming via chat_stream is not instrumented. Use the non-streaming chat method for traced calls.

for event in co.chat_stream(
    model="command-r-plus",
    message="Write a poem"
):
    if event.event_type == "text-generation":
        print(event.text, end="")

RAG with Connectors

Connector usage is traced:

response = co.chat(
    model="command-r-plus",
    message="What's in my documents?",
    connectors=[{"id": "web-search"}]
)

Embeddings

Embedding calls are captured:

response = co.embed(
    texts=["Hello world", "Goodbye world"],
    model="embed-english-v3.0",
    input_type="search_document"
)

Captured Attributes

| Attribute | Description |
| --- | --- |
| gen_ai.system | cohere |
| gen_ai.request.model | Model name |
| gen_ai.operation | chat, generate, or embed |
| gen_ai.response.id | Generation ID |
| gen_ai.usage.prompt_tokens | Input tokens |
| gen_ai.usage.completion_tokens | Output tokens |
| gen_ai.latency_ms | Request latency in milliseconds |

For embed calls, Risicare also captures:

| Attribute | Description |
| --- | --- |
| gen_ai.input.count | Number of input texts |

Operation-Specific Attributes

Token usage (prompt_tokens, completion_tokens) is only captured for chat operations. The generate operation captures gen_ai.request.model and gen_ai.latency_ms. The embed operation captures gen_ai.input.count and gen_ai.latency_ms.
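
As an illustration, a traced chat call would carry the attributes from the table above. The values below are hypothetical (the keys follow the tables; the IDs, counts, and latency are made up):

```python
# Hypothetical attribute set for one traced chat call.
# Keys follow the "Captured Attributes" table; values are illustrative only.
span_attributes = {
    "gen_ai.system": "cohere",
    "gen_ai.request.model": "command-r-plus",
    "gen_ai.operation": "chat",
    "gen_ai.response.id": "gen-abc123",          # made-up generation ID
    "gen_ai.usage.prompt_tokens": 12,            # input tokens
    "gen_ai.usage.completion_tokens": 48,        # output tokens
    "gen_ai.latency_ms": 450,                    # request latency
}
```

A generate or embed span would carry only its operation-specific subset: no token-usage keys for generate, and gen_ai.input.count instead of token usage for embed.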

Cost Tracking

| Model | Input (per 1M) | Output (per 1M) |
| --- | --- | --- |
| command-r-plus | $2.50 | $10.00 |
| command-r | $0.15 | $0.60 |
| command | $1.00 | $2.00 |
| embed-english-v3.0 | $0.10 | - |
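
Combining these rates with the captured token counts gives a rough per-request cost. A minimal sketch (the helper function and the example token counts are illustrative, and prices may change; check Cohere's current pricing):

```python
# Per-1M-token prices (input, output) in USD, from the table above.
PRICES = {
    "command-r-plus": (2.50, 10.00),
    "command-r": (0.15, 0.60),
    "command": (1.00, 2.00),
}

def chat_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one chat call from captured token usage."""
    input_price, output_price = PRICES[model]
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000

# Example: 1,200 prompt tokens and 300 completion tokens on command-r-plus.
print(round(chat_cost("command-r-plus", 1200, 300), 6))  # → 0.006
```

The token counts plug in directly from gen_ai.usage.prompt_tokens and gen_ai.usage.completion_tokens on the traced span.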

Next Steps