# HuggingFace

Auto-instrument the HuggingFace Inference API.

Risicare automatically instruments the HuggingFace Inference API.
## Installation

```bash
pip install risicare huggingface_hub
```

## Auto-Instrumentation
```python
import risicare
from huggingface_hub import InferenceClient

risicare.init()

client = InferenceClient()

# Automatically traced
response = client.text_generation(
    "Hello, how are you?",
    model="meta-llama/Llama-3-70B-Instruct"
)
```

## Captured Attributes
### Chat Completion
| Attribute | Description |
|---|---|
| `gen_ai.system` | `huggingface` |
| `gen_ai.request.model` | Model name/ID |
| `gen_ai.response.model` | Model name returned by API |
| `gen_ai.response.id` | Response ID |
| `gen_ai.request.temperature` | Sampling temperature |
| `gen_ai.request.max_tokens` | Max output tokens |
| `gen_ai.request.stream` | Whether streaming was requested |
| `gen_ai.request.has_tools` | Whether tools were provided |
| `gen_ai.usage.prompt_tokens` | Input tokens |
| `gen_ai.usage.completion_tokens` | Output tokens |
| `gen_ai.usage.total_tokens` | Total tokens |
| `gen_ai.completion.tool_calls` | Number of tool calls made |
| `gen_ai.completion.finish_reason` | Stop reason |
| `gen_ai.latency_ms` | Request latency in milliseconds |
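Most of these values mirror fields on the `ChatCompletionOutput` object that `huggingface_hub` returns, so a span can be cross-checked against the raw response. A minimal sketch (the printed fields are standard `huggingface_hub` response attributes; the mapping comments reflect the table above):

```python
import risicare
from huggingface_hub import InferenceClient

risicare.init()
client = InferenceClient()

response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    model="meta-llama/Llama-3-70B-Instruct",
)

# Response fields corresponding to the span attributes above
print(response.id)                        # gen_ai.response.id
print(response.model)                     # gen_ai.response.model
print(response.usage.prompt_tokens)       # gen_ai.usage.prompt_tokens
print(response.usage.completion_tokens)   # gen_ai.usage.completion_tokens
print(response.usage.total_tokens)        # gen_ai.usage.total_tokens
print(response.choices[0].finish_reason)  # gen_ai.completion.finish_reason
```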
### Text Generation
| Attribute | Description |
|---|---|
| `gen_ai.system` | `huggingface` |
| `gen_ai.request.model` | Model name/ID |
| `gen_ai.request.stream` | Whether streaming was requested |
| `gen_ai.request.temperature` | Sampling temperature |
| `gen_ai.request.max_tokens` | Max output tokens (from `max_new_tokens`) |
| `gen_ai.completion.content` | Generated text content |
| `gen_ai.completion.finish_reason` | Stop reason |
| `gen_ai.usage.completion_tokens` | Output tokens (from `generated_tokens`) |
| `gen_ai.latency_ms` | Request latency in milliseconds |
> **Model Attribute:** `gen_ai.response.model` is only captured for `chat_completion` calls, not `text_generation`.
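To exercise the Text Generation attributes above, pass the corresponding request parameters. A sketch using standard `huggingface_hub` arguments; `details=True` is optional and unrelated to Risicare, but it exposes the `generated_tokens` count and stop reason on the response itself:

```python
import risicare
from huggingface_hub import InferenceClient

risicare.init()
client = InferenceClient()

response = client.text_generation(
    "Explain tracing in one paragraph.",
    model="meta-llama/Llama-3-70B-Instruct",
    temperature=0.7,     # -> gen_ai.request.temperature
    max_new_tokens=200,  # -> gen_ai.request.max_tokens
    details=True,
)
print(response.generated_text)             # -> gen_ai.completion.content
print(response.details.generated_tokens)   # token count reported by the API
print(response.details.finish_reason)      # stop reason reported by the API
```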
Chat Completions
response = client.chat_completion(
messages=[{"role": "user", "content": "Hello!"}],
model="meta-llama/Llama-3-70B-Instruct",
max_tokens=500
)Streaming
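Requests that include tools populate `gen_ai.request.has_tools` and, when the model invokes one, `gen_ai.completion.tool_calls`. A sketch using `huggingface_hub`'s standard `tools`/`tool_choice` parameters (the `get_weather` schema is a made-up example for illustration):

```python
import risicare
from huggingface_hub import InferenceClient

risicare.init()
client = InferenceClient()

# Hypothetical tool schema, OpenAI-style as accepted by chat_completion
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Providing tools sets gen_ai.request.has_tools; any tool calls in the
# response are counted in gen_ai.completion.tool_calls
response = client.chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    model="meta-llama/Llama-3-70B-Instruct",
    tools=tools,
    tool_choice="auto",
    max_tokens=500,
)
```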
## Streaming

```python
stream = client.text_generation(
    "Write a story",
    model="meta-llama/Llama-3-70B-Instruct",
    stream=True
)
for chunk in stream:
    print(chunk, end="")
```
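`chat_completion` streams the same way; each chunk's `delta` carries incremental content (standard `huggingface_hub` stream output, independent of Risicare):

```python
import risicare
from huggingface_hub import InferenceClient

risicare.init()
client = InferenceClient()

stream = client.chat_completion(
    messages=[{"role": "user", "content": "Write a story"}],
    model="meta-llama/Llama-3-70B-Instruct",
    max_tokens=500,
    stream=True,
)
for chunk in stream:
    # Some chunks carry no content (e.g. the final chunk with the finish reason)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```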
## Async Support

```python
import asyncio
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient()

async def main():
    # Automatically traced, same as the synchronous client
    response = await client.text_generation(
        "Hello!",
        model="meta-llama/Llama-3-70B-Instruct"
    )
    print(response)

asyncio.run(main())
```
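Because each request is traced independently, fanning out with `asyncio.gather` should yield one span per call (a sketch; the prompts are illustrative):

```python
import asyncio

import risicare
from huggingface_hub import AsyncInferenceClient

risicare.init()

async def main():
    client = AsyncInferenceClient()
    prompts = ["Hello!", "Tell me a joke.", "Name three colors."]
    # One traced request per prompt, issued concurrently
    responses = await asyncio.gather(*(
        client.text_generation(p, model="meta-llama/Llama-3-70B-Instruct")
        for p in prompts
    ))
    for r in responses:
        print(r)

asyncio.run(main())
```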
## Popular Models

| Model | Task |
|---|---|
| `meta-llama/Llama-3-70B-Instruct` | Text Generation |
| `mistralai/Mistral-7B-Instruct-v0.3` | Text Generation |
| `sentence-transformers/all-MiniLM-L6-v2` | Embeddings |
| `stabilityai/stable-diffusion-xl-base-1.0` | Image Generation |