# Groq

Auto-instrument Groq for ultra-fast inference.

Risicare automatically instruments the Groq SDK, tracing every chat completion made through its ultra-low-latency inference API.
## Installation

```bash
pip install risicare groq
```

## Auto-Instrumentation
```python
import risicare
from groq import Groq

risicare.init()

client = Groq()

# Automatically traced
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
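The Groq SDK also ships an async client. A minimal sketch, assuming Risicare patches `AsyncGroq` the same way it patches `Groq` (check your version's integration list):

```python
import asyncio

import risicare
from groq import AsyncGroq

risicare.init()

client = AsyncGroq()

async def main():
    # Traced the same way as the sync client,
    # assuming the async client is also instrumented
    response = await client.chat.completions.create(
        model="llama-3.1-8b-instant",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
```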
## Captured Attributes

| Attribute | Description |
|---|---|
| `gen_ai.system` | `groq` |
| `gen_ai.request.model` | Requested model name |
| `gen_ai.response.model` | Model name returned by the API |
| `gen_ai.response.id` | Response ID |
| `gen_ai.request.temperature` | Sampling temperature |
| `gen_ai.request.max_tokens` | Maximum output tokens |
| `gen_ai.request.stream` | Whether streaming was requested |
| `gen_ai.request.has_tools` | Whether tools were provided |
| `gen_ai.usage.prompt_tokens` | Input tokens |
| `gen_ai.usage.completion_tokens` | Output tokens |
| `gen_ai.usage.total_tokens` | Total tokens |
| `gen_ai.completion.tool_calls` | Number of tool calls made |
| `gen_ai.completion.finish_reason` | Stop reason |
| `gen_ai.latency_ms` | Request latency in milliseconds |
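For example, a request that passes `tools` should set `gen_ai.request.has_tools`, and if the model invokes a tool, `gen_ai.completion.tool_calls` should record the count. The `get_weather` tool below is purely illustrative:

```python
import risicare
from groq import Groq

risicare.init()

client = Groq()

# Hypothetical tool schema, defined only to show the tool attributes
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

# Traced with gen_ai.request.has_tools = true; if the model calls
# get_weather, gen_ai.completion.tool_calls reflects the number of calls
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
    tools=tools
)
```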
## Streaming

```python
stream = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
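Note: attributes that depend on the end of the response, such as `gen_ai.usage.*`, `gen_ai.completion.finish_reason`, and `gen_ai.latency_ms`, can typically only be recorded once the stream has been fully consumed, so iterate to the final chunk before expecting them on the span.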
## Supported Models

| Model | Description |
|---|---|
| `llama-3.1-405b-reasoning` | Llama 3.1 405B |
| `llama-3.1-70b-versatile` | Llama 3.1 70B |
| `llama-3.1-8b-instant` | Llama 3.1 8B |
| `mixtral-8x7b-32768` | Mixtral 8x7B |
| `gemma2-9b-it` | Gemma 2 9B |