
Groq

Auto-instrument Groq for ultra-fast inference.

Risicare automatically instruments the Groq SDK, tracing chat completions made against Groq's low-latency inference API with no code changes beyond risicare.init().

Installation

pip install risicare groq

Auto-Instrumentation

import risicare
from groq import Groq
 
risicare.init()
 
client = Groq()
 
# Automatically traced
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
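Instrumentation does not change the return value: the call still yields a normal Groq ChatCompletion object. As a minimal sketch (assuming the standard Groq SDK response shape), you can read the same usage fields that back the gen_ai.usage.* attributes listed below:

# The traced call returns an ordinary ChatCompletion.
print(response.choices[0].message.content)
 
# Standard Groq SDK usage fields; these correspond to the
# gen_ai.usage.* attributes in the table below.
print(response.usage.prompt_tokens)
print(response.usage.completion_tokens)
print(response.usage.total_tokens)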

Captured Attributes

| Attribute | Description |
| --- | --- |
| gen_ai.system | Always groq |
| gen_ai.request.model | Requested model name |
| gen_ai.response.model | Model name returned by API |
| gen_ai.response.id | Response ID |
| gen_ai.request.temperature | Sampling temperature |
| gen_ai.request.max_tokens | Max output tokens |
| gen_ai.request.stream | Whether streaming was requested |
| gen_ai.request.has_tools | Whether tools were provided |
| gen_ai.usage.prompt_tokens | Input tokens |
| gen_ai.usage.completion_tokens | Output tokens |
| gen_ai.usage.total_tokens | Total tokens |
| gen_ai.completion.tool_calls | Number of tool calls made |
| gen_ai.completion.finish_reason | Stop reason |
| gen_ai.latency_ms | Request latency in milliseconds |
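Request attributes are only captured when you set the corresponding parameters. The sketch below shows a call that would populate gen_ai.request.temperature, gen_ai.request.max_tokens, and gen_ai.request.has_tools; the get_weather tool definition is hypothetical, included only to illustrate the tools parameter:

# Hypothetical tool definition, used here only so the request carries tools.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}
 
response = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    temperature=0.2,       # captured as gen_ai.request.temperature
    max_tokens=256,        # captured as gen_ai.request.max_tokens
    tools=[weather_tool],  # sets gen_ai.request.has_tools
)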

Streaming

stream = client.chat.completions.create(
    model="llama-3.1-70b-versatile",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)
 
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
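The Groq SDK also ships an AsyncGroq client with the same interface. Assuming Risicare patches the async client as well (not confirmed by this page), the equivalent async stream looks like this:

import asyncio
 
import risicare
from groq import AsyncGroq
 
risicare.init()
 
async def main():
    client = AsyncGroq()
    # Awaiting the create() call with stream=True yields an async iterator.
    stream = await client.chat.completions.create(
        model="llama-3.1-70b-versatile",
        messages=[{"role": "user", "content": "Write a story"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")
 
asyncio.run(main())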

Supported Models

| Model | Description |
| --- | --- |
| llama-3.1-405b-reasoning | Llama 3.1 405B |
| llama-3.1-70b-versatile | Llama 3.1 70B |
| llama-3.1-8b-instant | Llama 3.1 8B |
| mixtral-8x7b-32768 | Mixtral 8x7B |
| gemma2-9b-it | Gemma 2 9B |
