Cohere
Auto-instrumentation for the Cohere API.
Risicare automatically instruments the Cohere Python SDK.
Installation
pip install risicare cohere
Basic Usage
import risicare
import cohere
risicare.init()
co = cohere.Client()
# Automatically traced
response = co.chat(
    model="command-r-plus",
    message="What is the capital of France?"
)
Supported Methods
| Method | Traced |
|---|---|
| chat | Yes (sync + async) |
| generate | Yes (sync + async) |
| embed | Yes (sync + async) |
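The table lists async variants as traced. A minimal sketch of an async call follows, assuming "async" refers to the SDK's `cohere.AsyncClient` and that `risicare.init()` has already run. The `ask` helper is hypothetical; it takes the client as a parameter so it can be exercised with a stub.

```python
import asyncio


async def ask(client, question: str) -> str:
    """Send one chat message and return the reply text.

    `client` is assumed to be a cohere.AsyncClient (or any object with a
    compatible async `chat` method). `ask` is a hypothetical helper, not
    part of Risicare or the Cohere SDK.
    """
    response = await client.chat(
        model="command-r-plus",
        message=question,
    )
    return response.text


# Usage with the real SDK (needs an API key, so it is left commented out):
# import cohere, risicare
# risicare.init()
# print(asyncio.run(ask(cohere.AsyncClient(), "What is the capital of France?")))
```

Passing the client in, rather than creating it inside the helper, keeps the call site in control of when `risicare.init()` runs relative to client creation.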
Streaming
Streaming Not Instrumented
Cohere streaming via chat_stream is not instrumented. Use the non-streaming chat method for traced calls.
# Runs normally, but produces no trace
for event in co.chat_stream(
    model="command-r-plus",
    message="Write a poem"
):
    if event.event_type == "text-generation":
        print(event.text, end="")
RAG with Connectors
Connector usage is traced:
response = co.chat(
    model="command-r-plus",
    message="What's in my documents?",
    connectors=[{"id": "web-search"}]
)
Embeddings
Embedding calls are captured:
response = co.embed(
    texts=["Hello world", "Goodbye world"],
    model="embed-english-v3.0",
    input_type="search_document"
)
Captured Attributes
| Attribute | Description |
|---|---|
| gen_ai.system | cohere |
| gen_ai.request.model | Model name |
| gen_ai.operation | chat, generate, or embed |
| gen_ai.response.id | Generation ID |
| gen_ai.usage.prompt_tokens | Input tokens |
| gen_ai.usage.completion_tokens | Output tokens |
| gen_ai.latency_ms | Request latency in milliseconds |
For embed calls, Risicare also captures:
| Attribute | Description |
|---|---|
| gen_ai.input.count | Number of input texts |
Operation-Specific Attributes
Token usage (prompt_tokens, completion_tokens) is only captured for chat operations. The generate operation captures gen_ai.request.model and gen_ai.latency_ms. The embed operation captures gen_ai.input.count and gen_ai.latency_ms.
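Because token counts are present only on chat spans, code that consumes these attributes should treat them as optional. A minimal sketch, assuming span attributes arrive as a plain mapping keyed by the names above; the `token_usage` helper and the sample dictionaries are hypothetical, not part of Risicare.

```python
from typing import Mapping, Optional, Tuple


def token_usage(attributes: Mapping[str, object]) -> Optional[Tuple[int, int]]:
    """Return (prompt_tokens, completion_tokens), or None if either is absent.

    Hypothetical helper: generate and embed spans carry no token counts,
    so a missing key is expected rather than an error.
    """
    prompt = attributes.get("gen_ai.usage.prompt_tokens")
    completion = attributes.get("gen_ai.usage.completion_tokens")
    if prompt is None or completion is None:
        return None
    return int(prompt), int(completion)


# Illustrative attribute sets (made-up values, not real trace output).
chat_span = {
    "gen_ai.operation": "chat",
    "gen_ai.usage.prompt_tokens": 12,
    "gen_ai.usage.completion_tokens": 30,
}
embed_span = {"gen_ai.operation": "embed", "gen_ai.input.count": 2}

print(token_usage(chat_span))
print(token_usage(embed_span))
```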
Cost Tracking
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| command-r-plus | $2.50 | $10.00 |
| command-r | $0.15 | $0.60 |
| command | $1.00 | $2.00 |
| embed-english-v3.0 | $0.10 | - |
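The rates above turn into a per-request estimate with simple arithmetic. The sketch below illustrates that arithmetic only; it is not a description of how Risicare computes cost internally, and `PRICES` and `estimate_cost` are hypothetical names.

```python
# USD per 1M tokens, copied from the table above.
# None means the table lists no output price for that model.
PRICES = {
    "command-r-plus": (2.50, 10.00),
    "command-r": (0.15, 0.60),
    "command": (1.00, 2.00),
    "embed-english-v3.0": (0.10, None),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Estimate the USD cost of one call from its token counts.

    Hypothetical helper. Raises KeyError for models not in PRICES.
    """
    input_price, output_price = PRICES[model]
    cost = prompt_tokens * input_price / 1_000_000
    if output_price is not None:
        cost += completion_tokens * output_price / 1_000_000
    return cost


# Example: 2,000 input tokens and 500 output tokens on command-r.
print(f"${estimate_cost('command-r', 2000, 500):.6f}")
```

The token counts would typically come from `gen_ai.usage.prompt_tokens` and `gen_ai.usage.completion_tokens`, which are captured only for chat operations.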