Pipeline
The 4-stage diagnosis pipeline in detail.
Risicare's diagnosis pipeline processes each error through four stages: context extraction, taxonomy classification, root cause analysis, and fix suggestion.
Pipeline Overview
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Stage 1 │ │ Stage 2 │ │ Stage 3 │ │ Stage 4 │
│ Context │ ──▶ │ Taxonomy │ ──▶ │ Root Cause │ ──▶ │ Fix │
│ Extraction │ │ Classification │ │ Analysis │ │ Suggestion │
│ (~100ms) │ │ (~1s) │ │ (~2s) │ │ (~500ms) │
└─────────────────┘ └─────────────────┘ └─────────────────┘ └─────────────────┘
Stage 1: Context Extraction
Purpose
Gather all relevant information from the error trace.
Process
- Identify error span
  - Find span with status: error
  - Extract error message and stack trace
- Collect parent chain
  - Walk up the span tree
  - Include agent contexts
  - Include phase contexts (think/decide/act)
- Gather sibling spans
  - Preceding spans (what happened before)
  - Following spans (if any)
- Extract content
  - LLM prompts and completions
  - Tool inputs and outputs
  - Agent messages
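The first three steps above can be sketched as plain tree operations. This is a minimal illustration, not Risicare's implementation: the `Span` fields and the helper names are hypothetical, and choosing the most recently started error span when several exist is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    # Hypothetical minimal span shape, for illustration only.
    span_id: str
    parent_id: Optional[str]
    status: str  # "ok" or "error"
    start_time: float


def find_error_span(spans: List[Span]) -> Optional[Span]:
    """Return the most recently started span with status "error" (assumed tie-break)."""
    errors = [s for s in spans if s.status == "error"]
    return max(errors, key=lambda s: s.start_time) if errors else None


def parent_chain(spans: List[Span], span: Span) -> List[Span]:
    """Walk up the span tree from `span`, nearest parent first."""
    by_id = {s.span_id: s for s in spans}
    chain: List[Span] = []
    seen = {span.span_id}  # guards against cycles in malformed traces
    current = by_id.get(span.parent_id) if span.parent_id else None
    while current is not None and current.span_id not in seen:
        chain.append(current)
        seen.add(current.span_id)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return chain


def siblings(spans: List[Span], span: Span) -> Tuple[List[Span], List[Span]]:
    """Split spans that share `span`'s parent into (preceding, following)."""
    same_parent = sorted(
        (s for s in spans if s.parent_id == span.parent_id and s.span_id != span.span_id),
        key=lambda s: s.start_time,
    )
    before = [s for s in same_parent if s.start_time <= span.start_time]
    after = [s for s in same_parent if s.start_time > span.start_time]
    return before, after
```

For a trace where a `fetch` span fails under an `agent` span, `parent_chain` returns the agent span followed by the root, and `siblings` returns the spans that ran before and after `fetch` under the same agent.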
Context Window Management
MAX_SPANS = 50
MAX_TOKENS = 100_000
# Priority order for inclusion:
# 1. Error span (always)
# 2. Direct parents (always)
# 3. LLM spans (high priority)
# 4. Tool spans with errors (high priority)
# 5. Recent sibling spans (medium priority)
# 6. Older context (low priority, summarized)
Output
{
  "error_span": { ... },
  "parent_spans": [ ... ],
  "context_spans": [ ... ],
  "llm_content": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "tool_io": [
    {"tool": "search", "input": "...", "output": "..."}
  ],
  "metadata": {
    "agent": "researcher",
    "phase": "act",
    "iteration": 2
  }
}
Stage 2: Taxonomy Classification
Purpose
Map the error to a specific code in the error taxonomy (154 codes across 10 modules and 35+ categories).
Heuristic-First Classification
Before invoking an LLM, Risicare runs a pattern matcher with 381 regex and keyword rules against the error message, stack trace, and span attributes. Each rule maps to a specific error code and produces a confidence score.
- If the pattern matcher produces a confidence >= 0.6, the classification is accepted immediately (typically under 10ms).
- If confidence is < 0.6 or no pattern matches, the error is forwarded to the LLM classifier.
This heuristic-first approach handles the majority of common errors without an LLM call, significantly reducing latency and cost.
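The accept-or-forward decision described above can be sketched as follows. This is an illustration under stated assumptions: the two rules, the `EXAMPLE.PLACEHOLDER.CODE` code, the `source` field, and the `llm_classify` callback are all hypothetical; only the 0.6 threshold and the `TOOL.EXECUTION.TIMEOUT` code come from this page.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional

ACCEPT_THRESHOLD = 0.6  # pattern-matcher confidence needed to skip the LLM


class Rule(NamedTuple):
    pattern: "re.Pattern[str]"
    error_code: str
    confidence: float


# Two illustrative rules only; the real rule set is not reproduced here.
RULES: List[Rule] = [
    Rule(re.compile(r"timed? ?out", re.IGNORECASE), "TOOL.EXECUTION.TIMEOUT", 0.9),
    Rule(re.compile(r"fail", re.IGNORECASE), "EXAMPLE.PLACEHOLDER.CODE", 0.3),  # hypothetical code
]


def match_patterns(text: str) -> Optional[Rule]:
    """Return the highest-confidence rule that matches `text`, if any."""
    best: Optional[Rule] = None
    for rule in RULES:
        if rule.pattern.search(text) and (best is None or rule.confidence > best.confidence):
            best = rule
    return best


def classify(text: str, llm_classify: Callable[[str], Dict]) -> Dict:
    """Accept a confident heuristic match; otherwise forward to the LLM classifier."""
    best = match_patterns(text)
    if best is not None and best.confidence >= ACCEPT_THRESHOLD:
        return {"error_code": best.error_code, "confidence": best.confidence, "source": "heuristic"}
    result = dict(llm_classify(text))  # low confidence or no match at all
    result["source"] = "llm"
    return result
```

Passing the LLM call in as a callback keeps the sketch runnable without network access and makes it easy to verify that confident matches never reach the model.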
LLM Classifier (Fallback)
Model: gpt-4o-mini - Fast, cost-effective classification for errors that the pattern matcher cannot confidently classify.
Prompt Structure
System: You are an AI agent error classifier. Classify errors into this taxonomy:
[TAXONOMY DEFINITION - 10 modules, 35+ categories]
Rules:
- Choose the most specific code that fits
- Confidence must be 0.0-1.0
- Explain your reasoning briefly
User: [EXTRACTED CONTEXT]
Error: {error_message}
Stack trace: {stack_trace}
Agent: {agent_name}
Phase: {phase}
Output
{
  "module": "TOOL",
  "category": "EXECUTION",
  "subcategory": "TIMEOUT",
  "error_code": "TOOL.EXECUTION.TIMEOUT",
  "confidence": 0.92,
  "reasoning": "The error occurred during tool execution with a 30s timeout. The API call exceeded the configured timeout limit."
}
Confidence Thresholds
| Confidence | Action |
|---|---|
| >= 0.8 | High confidence, proceed directly to Stage 3 |
| 0.6 to < 0.8 | Medium confidence, proceed but flag for review |
| < 0.6 | Low confidence, escalate to Stage 3 for deeper analysis with gpt-4o |
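A small routing function makes the band boundaries explicit. This is a sketch: the function name and the returned keys are hypothetical, and a confidence of exactly 0.8 is treated as high, following the `>= 0.8` row.

```python
from typing import Dict


def route_classification(confidence: float) -> Dict[str, object]:
    """Map a Stage 2 confidence score to the action in the table above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    if confidence >= 0.8:
        # High confidence: proceed directly to Stage 3.
        return {"band": "high", "flag_for_review": False, "escalate": False}
    if confidence >= 0.6:
        # Medium confidence: proceed, but flag for review.
        return {"band": "medium", "flag_for_review": True, "escalate": False}
    # Low confidence: escalate to Stage 3 for deeper analysis.
    return {"band": "low", "flag_for_review": False, "escalate": True}
```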
Fix suggestion threshold
In Stage 4, suggested fixes require a minimum confidence of 0.5 to be included in the diagnosis output. Fixes below this threshold are discarded.
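As a sketch of that cutoff, assuming each fix is a dictionary with a `confidence` field as in the Stage 4 output, and assuming a fix at exactly 0.5 is kept (the helper name is hypothetical):

```python
from typing import Dict, List

MIN_FIX_CONFIDENCE = 0.5


def filter_fixes(fixes: List[Dict]) -> List[Dict]:
    """Drop fixes below the minimum confidence and order the rest, highest first."""
    kept = [fix for fix in fixes if fix["confidence"] >= MIN_FIX_CONFIDENCE]
    return sorted(kept, key=lambda fix: fix["confidence"], reverse=True)
```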
Stage 3: Root Cause Analysis
Purpose
Determine why the error actually occurred, not just what happened.
Model
gpt-4o - Deep reasoning capabilities
Analysis Framework
- Proximate cause: What directly caused the error?
- Contributing factors: What made it more likely?
- Underlying issues: What systemic problems exist?
- Prevention: How could this be avoided?
Prompt Structure
System: You are an expert debugger for AI agents. Analyze root causes deeply.
Consider:
- Was this a one-time issue or systemic?
- What assumptions failed?
- What dependencies failed?
- What could prevent recurrence?
User:
Classification: {error_code}
Context: {extracted_context}
Output
{
  "root_cause": {
    "summary": "External API timeout due to unexpectedly large payload",
    "proximate_cause": "API call exceeded 30s timeout",
    "contributing_factors": [
      "Payload size (2.5MB) was 25x larger than typical",
      "No retry logic configured",
      "No payload size validation before API call"
    ],
    "underlying_issues": [
      "Missing input validation for tool arguments",
      "No circuit breaker for external dependencies"
    ],
    "severity": "medium",
    "frequency_estimate": "likely_recurring"
  }
}
Stage 4: Fix Suggestion
Purpose
Recommend actionable fixes ranked by confidence.
Process
- Knowledge base lookup
  - Search for similar error patterns
  - Find proven fixes for this error code
  - Match by embedding similarity (threshold: 0.85)
- Fix generation (if no match)
  - Generate fix based on root cause
  - Use fix templates for the error type
  - Validate fix configuration
- Ranking
  - Score by historical success rate
  - Score by root cause alignment
  - Score by implementation complexity
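The lookup step can be illustrated with a small similarity search. This sketch assumes cosine similarity as the embedding metric (the page only gives the 0.85 threshold) and a hypothetical knowledge base of dictionaries with an `embedding` field; a `None` result stands for "no match", which is when fix generation would take over.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.85


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lookup_fix(
    query_embedding: Sequence[float], knowledge_base: List[Dict]
) -> Optional[Tuple[Dict, float]]:
    """Return the best-matching entry and its score, or None if below the threshold."""
    best_entry: Optional[Dict] = None
    best_score = -1.0
    for entry in knowledge_base:
        score = cosine_similarity(query_embedding, entry["embedding"])
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is not None and best_score >= SIMILARITY_THRESHOLD:
        return best_entry, best_score
    return None
```

A linear scan is enough to show the thresholding logic; a production knowledge base would more likely use a vector index.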
Fix Types
| Type | Description |
|---|---|
| prompt | Modify system prompt |
| parameter | Adjust LLM parameters |
| tool | Fix tool configuration |
| retry | Add retry logic |
| fallback | Add fallback strategy |
| guard | Add validation |
| routing | Change agent routing |
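To make the retry fix type concrete, the sketch below computes an exponential backoff schedule. The parameter names and defaults are borrowed from the example config in the Output of this stage; the function itself is hypothetical, and interpreting `jitter` as "full jitter" (a uniform draw between 0 and the capped delay) is an assumption.

```python
import random
from typing import List, Optional


def backoff_delays_ms(
    max_retries: int = 3,
    initial_delay_ms: float = 1000,
    exponential_base: float = 2.0,
    max_delay_ms: float = 30000,
    jitter: bool = False,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the wait before each retry attempt, in milliseconds."""
    rng = rng or random.Random()
    delays: List[float] = []
    for attempt in range(max_retries):
        # Grow the delay geometrically, then cap it at max_delay_ms.
        delay = min(initial_delay_ms * exponential_base ** attempt, max_delay_ms)
        if jitter:
            # Assumed "full jitter": spread retries uniformly over [0, delay].
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

With the defaults and jitter disabled, the schedule is 1000 ms, 2000 ms, then 4000 ms. The cap only starts to matter from the sixth retry onward, where 1000 × 2^5 = 32000 ms would exceed `max_delay_ms`.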
Output
{
  "suggested_fixes": [
    {
      "fix_type": "retry",
      "confidence": 0.85,
      "description": "Add exponential backoff retry for timeout errors",
      "config": {
        "max_retries": 3,
        "initial_delay_ms": 1000,
        "exponential_base": 2.0,
        "max_delay_ms": 30000,
        "jitter": true,
        "retry_on": ["TOOL.EXECUTION.TIMEOUT"]
      },
      "evidence": "This fix resolved 78% of similar timeout errors"
    }
  ]
}
Pipeline Monitoring
Metrics
| Metric | Description |
|---|---|
| diagnosis_latency_ms | Total pipeline time |
| stage_1_latency_ms | Context extraction time |
| stage_2_latency_ms | Classification time |
| stage_3_latency_ms | Root cause analysis time |
| stage_4_latency_ms | Fix suggestion time |
| classification_confidence | Stage 2 confidence |
| knowledge_base_hit_rate | Stage 4 cache hits |
Error Handling
If any stage fails:
- Log error with context
- Return partial diagnosis
- Mark diagnosis as incomplete
- Queue for retry (max 3 attempts)
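The failure handling above, together with the latency metrics, could be wired up roughly as follows. This is a sketch only: the `run_pipeline` function, the stage callables, and the shape of the returned dictionary are hypothetical, and the `retry` flag stands in for an actual retry queue. Counting attempts from 1, so that a failure on the third attempt is not retried, is also an assumption.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Tuple

logger = logging.getLogger("diagnosis")
MAX_ATTEMPTS = 3

Stage = Callable[[Any, Dict[str, Any]], Any]


def run_pipeline(stages: List[Tuple[str, Stage]], trace: Any, attempt: int = 1) -> Dict[str, Any]:
    """Run stages in order; on failure, return whatever was completed so far."""
    diagnosis: Dict[str, Any] = {"results": {}, "metrics": {}, "incomplete": False, "retry": False}
    started = time.perf_counter()
    for index, (name, stage) in enumerate(stages, start=1):
        stage_started = time.perf_counter()
        try:
            # Each stage sees the trace plus the results of earlier stages.
            diagnosis["results"][name] = stage(trace, diagnosis["results"])
        except Exception:
            logger.exception("stage %s failed on attempt %d", name, attempt)
            diagnosis["incomplete"] = True
            diagnosis["failed_stage"] = name
            diagnosis["retry"] = attempt < MAX_ATTEMPTS  # caller re-queues if True
            break
        finally:
            # Recorded for failed stages too, so slow failures stay visible.
            elapsed_ms = (time.perf_counter() - stage_started) * 1000
            diagnosis["metrics"][f"stage_{index}_latency_ms"] = elapsed_ms
    diagnosis["metrics"]["diagnosis_latency_ms"] = (time.perf_counter() - started) * 1000
    return diagnosis
```

If the second stage raises, the returned diagnosis still carries the Stage 1 result, is marked incomplete, and has `retry` set until the third attempt has been used.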