Scorers
Built-in and custom scoring for LLM evaluation.
Risicare provides two ways to score your traces:
- Built-in scorers — 13 pre-configured LLM-based evaluators that run server-side when you trigger an evaluation
- Custom scores — use `risicare.score()` to record any metric from your own code
Custom Scores with risicare.score()
The simplest way to add scores to your traces. No extra packages needed — it's built into the SDK you already have.
```python
import risicare

risicare.init(api_key="rsk-your-api-key")

# Score a trace with any custom metric
risicare.score(
    trace_id="trace-abc123",
    name="sql_valid",
    value=1.0,
    comment="Query executed without errors"
)
```

JavaScript / TypeScript:
```javascript
import { init, score } from 'risicare';

init({ apiKey: 'rsk-your-api-key' });

score('trace-abc123', 'sql_valid', 1.0, {
  comment: 'Query executed without errors',
});
```

Scoring Inside a Trace
```python
import risicare

@risicare.trace
def my_pipeline(query):
    result = llm.invoke(query)

    # Score this trace based on custom logic
    trace_id = risicare.get_current_trace_id()
    if trace_id:
        is_valid = validate_output(result)
        risicare.score(
            trace_id=trace_id,
            name="output_valid",
            value=1.0 if is_valid else 0.0
        )

    return result
```

Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `trace_id` | `str` | Yes | — | The trace to score |
| `name` | `str` | Yes | — | Score name (e.g., `"accuracy"`, `"user_satisfaction"`) |
| `value` | `float` | Yes | — | Score value |
| `span_id` | `str` | No | null | Specific span within the trace |
| `comment` | `str` | No | null | Human-readable explanation |
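The required/optional split above can also be enforced client-side, before a call ever leaves your code. Below is a small illustrative validator (not part of the SDK) that mirrors the parameter table; for simplicity it insists that `value` is a `float`, so pass `1.0` rather than `1`:

```python
# Field -> expected type, mirroring the parameter table above.
REQUIRED = {"trace_id": str, "name": str, "value": float}
OPTIONAL = {"span_id": str, "comment": str}

def validate_score_args(**kwargs):
    """Raise early if a score call is missing required fields or mistypes them."""
    for field, expected in REQUIRED.items():
        if field not in kwargs:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(kwargs[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")
    for field, expected in OPTIONAL.items():
        if field in kwargs and not isinstance(kwargs[field], expected):
            raise TypeError(f"{field} must be {expected.__name__}")

# Passes silently: all required fields are present with the right types.
validate_score_args(trace_id="trace-abc123", name="sql_valid", value=1.0)
```

Catching a malformed call locally is cheaper than debugging a rejected request later.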
Scoring via REST API
You can also create scores via HTTP:
```bash
curl -X POST "https://app.risicare.ai/api/v1/scores" \
  -H "Authorization: Bearer rsk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "trace_id": "trace-abc123",
    "name": "accuracy",
    "score": 0.95,
    "comment": "Response matched expected output",
    "source": "api"
  }'
```

Built-in Scorers

When you create an evaluation via the API or dashboard, you specify which scorers to run using the criteria field. The Risicare server runs these scorers automatically — you don't need to install any extra packages.
Triggering Built-in Scorers
```bash
curl -X POST "https://app.risicare.ai/api/v1/evaluations" \
  -H "Authorization: Bearer rsk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Quality check",
    "evaluation_type": "llm_judge",
    "trace_ids": ["trace-abc123"],
    "criteria": ["faithfulness", "toxicity"]
  }'
```

Or from the dashboard: Evaluations → New Evaluation, select traces, and choose scorers.
Server-side execution
Built-in scorers run on the Risicare server using LLM-as-judge. You don't need to install any additional packages or provide your own LLM API key for built-in scorers. Evaluations are queued (HTTP 202) and processed asynchronously by a worker.
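If you prefer to trigger evaluations from code rather than curl, the request above is easy to assemble with the standard library. The sketch below only builds the pieces (send the result with `urllib.request` or any HTTP client you like); the API key and trace ID are placeholders:

```python
import json

EVALUATIONS_URL = "https://app.risicare.ai/api/v1/evaluations"

def build_evaluation_request(api_key, name, trace_ids, criteria):
    """Assemble URL, headers, and JSON body for a server-side evaluation.

    The server answers 202 Accepted and runs the scorers asynchronously,
    so the immediate response will not contain results.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "name": name,
        "evaluation_type": "llm_judge",
        "trace_ids": trace_ids,
        "criteria": criteria,
    })
    return EVALUATIONS_URL, headers, body

url, headers, body = build_evaluation_request(
    "rsk-your-api-key",
    name="Quality check",
    trace_ids=["trace-abc123"],
    criteria=["faithfulness", "toxicity"],
)
```

Because processing is asynchronous, fire the request and check results later in the dashboard rather than waiting on the response.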
Fully Verified (10 scorers)
These scorers work immediately with standard trace data:
| Scorer | Category | What it evaluates | Score direction |
|---|---|---|---|
| `faithfulness` | RAG | Is the answer grounded in the provided context? | Higher is better |
| `answer_relevancy` | RAG | Does the answer address the question? | Higher is better |
| `context_precision` | RAG | Is the retrieved context relevant? | Higher is better |
| `hallucination` | RAG | Does the answer contain fabricated claims? | Lower is better |
| `toxicity` | Safety | Is the content toxic, harmful, or offensive? | Lower is better |
| `bias` | Safety | Does the output show demographic or cultural bias? | Lower is better |
| `pii_leakage` | Safety | Does the output leak personally identifiable information? | Lower is better |
| `task_completion` | Agent | Did the agent complete the requested task? | Higher is better |
| `tool_correctness` | Agent | Were the right tools used with correct parameters? | Higher is better |
| `factuality` | General | Are factual claims in the output accurate? | Higher is better |
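Note the mixed score directions: a 0.9 `toxicity` is bad, while a 0.9 `faithfulness` is good. If you aggregate built-in scores on one dashboard, you may want to fold everything onto a single orientation first. Here is an illustrative helper (not part of the SDK; the set of names comes from the table above):

```python
# Built-in scorers where a LOWER raw value is better, per the table above.
LOWER_IS_BETTER = {"hallucination", "toxicity", "bias", "pii_leakage"}

def as_quality(scorer_name, value):
    """Map a built-in scorer's value onto a uniform higher-is-better scale."""
    return 1.0 - value if scorer_name in LOWER_IS_BETTER else value
```

For example, `as_quality("toxicity", 0.25)` yields `0.75`, which is directly comparable to a `faithfulness` score of `0.75`.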
Require Additional Configuration (3 scorers)
These scorers work correctly but need additional input fields or configuration:
| Scorer | Category | Requires | Why |
|---|---|---|---|
| `context_recall` | RAG | `ground_truth` field in trace data | Compares output against a reference answer |
| `goal_accuracy` | Agent | `goal` field in evaluation config | Measures whether agent achieved a specific goal |
| `g_eval` | General | Custom criteria in scorer config | Configurable evaluation framework — needs user-defined criteria |
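For the trace-data requirement in particular, you can check feasibility before triggering an evaluation. Below is a small illustrative helper (not part of the SDK) that assumes the trace's data is available locally as a dict; `goal` and `g_eval`'s criteria live in the evaluation or scorer config rather than the trace, so they are not covered here:

```python
# Criteria that need a specific field in the trace data, per the table above.
FIELD_REQUIREMENTS = {"context_recall": "ground_truth"}

def runnable_criteria(trace_data, requested):
    """Drop requested criteria whose required trace-data field is missing."""
    keep = []
    for name in requested:
        required_field = FIELD_REQUIREMENTS.get(name)
        if required_field is None or required_field in trace_data:
            keep.append(name)
    return keep

selected = runnable_criteria(
    {"input": "What is 2+2?", "output": "4"},
    ["faithfulness", "context_recall"],
)
# context_recall is dropped because the trace has no ground_truth field
```

Filtering up front avoids queuing scorers that cannot produce a meaningful result for a given trace.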