
Pipeline

The 4-stage diagnosis pipeline in detail.

Risicare's diagnosis pipeline processes errors through four stages.

Pipeline Overview

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ Stage 1         │     │ Stage 2         │     │ Stage 3         │     │ Stage 4         │
│ Context         │ ──▶ │ Taxonomy        │ ──▶ │ Root Cause      │ ──▶ │ Fix             │
│ Extraction      │     │ Classification  │     │ Analysis        │     │ Suggestion      │
│ (~100ms)        │     │ (~1s)           │     │ (~2s)           │     │ (~500ms)        │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘

Stage 1: Context Extraction

Purpose

Gather all relevant information from the error trace.

Process

  1. Identify error span

    • Find span with status: error
    • Extract error message and stack trace
  2. Collect parent chain

    • Walk up the span tree
    • Include agent contexts
    • Include phase contexts (think/decide/act)
  3. Gather sibling spans

    • Preceding spans (what happened before)
    • Following spans (if any)
  4. Extract content

    • LLM prompts and completions
    • Tool inputs and outputs
    • Agent messages
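The first three steps above can be sketched in Python as follows. This is a minimal illustration, not Risicare's implementation: the `Span` dataclass, its fields (`span_id`, `parent_id`, `status`, `start_ms`), and the helper functions are hypothetical, and ordering siblings by start time is an assumption about how "preceding" and "following" are decided. Content extraction (step 4) is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    # Hypothetical span shape; the real span schema may differ.
    span_id: str
    name: str
    status: str                      # e.g. "ok" or "error"
    start_ms: int
    parent_id: Optional[str] = None
    attributes: dict = field(default_factory=dict)


def find_error_span(spans: list[Span]) -> Optional[Span]:
    """Step 1: return the first span whose status is "error"."""
    return next((s for s in spans if s.status == "error"), None)


def parent_chain(span: Span, by_id: dict[str, Span]) -> list[Span]:
    """Step 2: walk up the span tree from the error span to the root."""
    chain = []
    current = by_id.get(span.parent_id) if span.parent_id else None
    while current is not None:
        chain.append(current)
        current = by_id.get(current.parent_id) if current.parent_id else None
    return chain


def siblings(span: Span, spans: list[Span]) -> tuple[list[Span], list[Span]]:
    """Step 3: split spans sharing the error span's parent into before/after."""
    sibs = [s for s in spans
            if s.parent_id == span.parent_id and s.span_id != span.span_id]
    preceding = sorted((s for s in sibs if s.start_ms < span.start_ms),
                       key=lambda s: s.start_ms)
    following = sorted((s for s in sibs if s.start_ms >= span.start_ms),
                       key=lambda s: s.start_ms)
    return preceding, following


def extract_context(spans: list[Span]) -> Optional[dict]:
    error = find_error_span(spans)
    if error is None:
        return None  # nothing to diagnose
    by_id = {s.span_id: s for s in spans}
    preceding, following = siblings(error, spans)
    return {
        "error_span": error,
        "parent_spans": parent_chain(error, by_id),
        "preceding_spans": preceding,
        "following_spans": following,
    }
```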

Context Window Management

MAX_SPANS = 50
MAX_TOKENS = 100_000
 
# Priority order for inclusion:
# 1. Error span (always)
# 2. Direct parents (always)
# 3. LLM spans (high priority)
# 4. Tool spans with errors (high priority)
# 5. Recent sibling spans (medium priority)
# 6. Older context (low priority, summarized)
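One way to apply this priority order is a greedy pass over candidate spans, sketched below under stated assumptions: each candidate is pre-labelled with a hypothetical `kind` and an estimated `tokens` count, mandatory spans (the error span and its parents) are always kept even if they exceed the budget, and anything that does not fit is returned for summarization rather than dropped. The actual budgeting and summarization logic is not documented on this page.

```python
from dataclasses import dataclass

MAX_SPANS = 50
MAX_TOKENS = 100_000

# Lower number = higher priority; mirrors the priority list above.
PRIORITY = {
    "error": 0,
    "parent": 1,
    "llm": 2,
    "tool_error": 3,
    "recent_sibling": 4,
    "older": 5,
}
ALWAYS_INCLUDE = {"error", "parent"}


@dataclass
class Candidate:
    # Hypothetical: each span is pre-labelled with a kind and a token estimate.
    span_id: str
    kind: str          # one of the PRIORITY keys
    tokens: int
    start_ms: int


def select_context(candidates: list[Candidate]) -> tuple[list[Candidate], list[Candidate]]:
    """Return (included, to_summarize) under the span and token budgets."""
    # Sort by priority, then most recent first within each priority level.
    ordered = sorted(candidates, key=lambda c: (PRIORITY[c.kind], -c.start_ms))
    included, to_summarize = [], []
    used_tokens = 0
    for c in ordered:
        mandatory = c.kind in ALWAYS_INCLUDE
        fits = len(included) < MAX_SPANS and used_tokens + c.tokens <= MAX_TOKENS
        if mandatory or fits:
            included.append(c)
            used_tokens += c.tokens
        else:
            # Greedy: skip spans that do not fit and keep trying smaller ones.
            to_summarize.append(c)
    return included, to_summarize
```

Because the pass is greedy, a large span that does not fit is skipped, and smaller, lower-priority spans may still be included after it.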

Output

{
  "error_span": { ... },
  "parent_spans": [ ... ],
  "context_spans": [ ... ],
  "llm_content": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "tool_io": [
    {"tool": "search", "input": "...", "output": "..."}
  ],
  "metadata": {
    "agent": "researcher",
    "phase": "act",
    "iteration": 2
  }
}

Stage 2: Taxonomy Classification

Purpose

Map the error to a specific code in the error taxonomy (154 codes across 10 modules and 35+ categories).

Heuristic-First Classification

Before invoking an LLM, Risicare runs a pattern matcher with 381 regex and keyword rules against the error message, stack trace, and span attributes. Each rule maps to a specific error code and produces a confidence score.

  • If the pattern matcher produces a confidence >= 0.6, the classification is accepted immediately (typically under 10ms).
  • If confidence is < 0.6 or no pattern matches, the error is forwarded to the LLM classifier.

This heuristic-first approach handles the majority of common errors without an LLM call, significantly reducing latency and cost.
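A minimal sketch of the heuristic-first flow, assuming each rule is a compiled regex paired with an error code and a fixed confidence. The three rules are illustrative stand-ins for the 381-rule set, codes other than `TOOL.EXECUTION.TIMEOUT` are hypothetical, and `llm_classify` is a placeholder for the gpt-4o-mini fallback. When several rules match, this sketch takes the highest-confidence hit, which is also an assumption.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

HEURISTIC_THRESHOLD = 0.6


@dataclass
class Rule:
    pattern: re.Pattern
    error_code: str
    confidence: float


# Illustrative rules only; codes other than TOOL.EXECUTION.TIMEOUT are hypothetical.
RULES = [
    Rule(re.compile(r"timed? ?out|deadline exceeded", re.I), "TOOL.EXECUTION.TIMEOUT", 0.9),
    Rule(re.compile(r"rate limit|429", re.I), "LLM.API.RATE_LIMIT", 0.85),
    Rule(re.compile(r"error", re.I), "UNKNOWN.GENERIC.ERROR", 0.3),
]


def match_rules(text: str) -> Optional[tuple[str, float]]:
    """Return the highest-confidence (error_code, confidence) among matching rules."""
    hits = [(r.error_code, r.confidence) for r in RULES if r.pattern.search(text)]
    return max(hits, key=lambda h: h[1]) if hits else None


def classify(text: str, llm_classify: Callable[[str], tuple[str, float]]) -> dict:
    hit = match_rules(text)
    if hit is not None and hit[1] >= HEURISTIC_THRESHOLD:
        code, conf = hit
        return {"error_code": code, "confidence": conf, "source": "heuristic"}
    # No match, or confidence below 0.6: fall back to the LLM classifier.
    code, conf = llm_classify(text)
    return {"error_code": code, "confidence": conf, "source": "llm"}
```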

LLM Classifier (Fallback)

Model: gpt-4o-mini, used for fast, cost-effective classification of errors that the pattern matcher cannot confidently classify.

Prompt Structure

System: You are an AI agent error classifier. Classify errors into this taxonomy:

[TAXONOMY DEFINITION - 10 modules, 35+ categories]

Rules:
- Choose the most specific code that fits
- Confidence must be 0.0-1.0
- Explain your reasoning briefly

User: [EXTRACTED CONTEXT]

Error: {error_message}
Stack trace: {stack_trace}
Agent: {agent_name}
Phase: {phase}

Output

{
  "module": "TOOL",
  "category": "EXECUTION",
  "subcategory": "TIMEOUT",
  "error_code": "TOOL.EXECUTION.TIMEOUT",
  "confidence": 0.92,
  "reasoning": "The error occurred during tool execution with a 30s timeout. The API call exceeded the configured timeout limit."
}

Confidence Thresholds

Confidence      Action
>= 0.8          High confidence: proceed directly to Stage 3
0.6 to < 0.8    Medium confidence: proceed, but flag for review
< 0.6           Low confidence: escalate to Stage 3 for deeper analysis with gpt-4o
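The routing in this table reduces to two comparisons. The sketch below assumes the boundaries are inclusive at the lower end (exactly 0.8 counts as high, exactly 0.6 as medium); the `Route` enum and `route_classification` name are hypothetical.

```python
from enum import Enum


class Route(Enum):
    PROCEED = "proceed to Stage 3"
    PROCEED_FLAGGED = "proceed to Stage 3, flagged for review"
    ESCALATE = "escalate to Stage 3 with gpt-4o"


def route_classification(confidence: float) -> Route:
    # The classifier prompt requires confidence in the range 0.0-1.0.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= 0.8:
        return Route.PROCEED
    if confidence >= 0.6:
        return Route.PROCEED_FLAGGED
    return Route.ESCALATE
```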

Fix Suggestion Threshold

In Stage 4, suggested fixes require a minimum confidence of 0.5 to be included in the diagnosis output. Fixes below this threshold are discarded.

Stage 3: Root Cause Analysis

Purpose

Determine why the error actually occurred, not just what happened.

Model

gpt-4o (deep reasoning capabilities)

Analysis Framework

  1. Proximate cause: What directly caused the error?
  2. Contributing factors: What made it more likely?
  3. Underlying issues: What systemic problems exist?
  4. Prevention: How could this be avoided?

Prompt Structure

System: You are an expert debugger for AI agents. Analyze root causes deeply.

Consider:
- Was this a one-time issue or systemic?
- What assumptions failed?
- What dependencies failed?
- What could prevent recurrence?

User:
Classification: {error_code}
Context: {extracted_context}

Output

{
  "root_cause": {
    "summary": "External API timeout due to unexpectedly large payload",
    "proximate_cause": "API call exceeded 30s timeout",
    "contributing_factors": [
      "Payload size (2.5MB) was 25x larger than typical",
      "No retry logic configured",
      "No payload size validation before API call"
    ],
    "underlying_issues": [
      "Missing input validation for tool arguments",
      "No circuit breaker for external dependencies"
    ],
    "severity": "medium",
    "frequency_estimate": "likely_recurring"
  }
}

Stage 4: Fix Suggestion

Purpose

Recommend actionable fixes ranked by confidence.

Process

  1. Knowledge base lookup

    • Search for similar error patterns
    • Find proven fixes for this error code
    • Match by embedding similarity (threshold: 0.85)
  2. Fix generation (if no match)

    • Generate fix based on root cause
    • Use fix templates for the error type
    • Validate fix configuration
  3. Ranking

    • Score by historical success rate
    • Score by root cause alignment
    • Score by implementation complexity
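The lookup and ranking steps can be sketched as below. Several details are assumptions not stated on this page: similarity is computed as cosine similarity, the three ranking signals are normalized to 0..1, and they are combined with the hypothetical `WEIGHTS` into a single score that is then treated as the fix confidence. The 0.85 similarity threshold and the 0.5 minimum fix confidence come from this page; fix generation (step 2) is not shown.

```python
import math
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.85
MIN_FIX_CONFIDENCE = 0.5
# Hypothetical weights; this page does not say how the signals are combined.
WEIGHTS = {"success_rate": 0.5, "alignment": 0.3, "simplicity": 0.2}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Assumes both embeddings have the same dimension.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class KnownFix:
    fix_type: str
    embedding: list[float]
    success_rate: float   # historical success rate, 0..1
    alignment: float      # root cause alignment, 0..1
    complexity: float     # implementation complexity, 0 (trivial) .. 1 (hard)


def lookup(query: list[float], kb: list[KnownFix]) -> list[KnownFix]:
    """Step 1: keep knowledge-base fixes whose embedding is similar enough."""
    return [f for f in kb if cosine_similarity(query, f.embedding) >= SIMILARITY_THRESHOLD]


def score(f: KnownFix) -> float:
    """Step 3: combine the three ranking signals into one 0..1 score."""
    return (WEIGHTS["success_rate"] * f.success_rate
            + WEIGHTS["alignment"] * f.alignment
            + WEIGHTS["simplicity"] * (1.0 - f.complexity))


def rank(fixes: list[KnownFix]) -> list[tuple[str, float]]:
    scored = [(f.fix_type, round(score(f), 3)) for f in fixes]
    kept = [s for s in scored if s[1] >= MIN_FIX_CONFIDENCE]  # discard fixes below 0.5
    return sorted(kept, key=lambda s: s[1], reverse=True)
```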

Fix Types

Type         Description
prompt       Modify the system prompt
parameter    Adjust LLM parameters
tool         Fix the tool configuration
retry        Add retry logic
fallback     Add a fallback strategy
guard        Add validation
routing      Change agent routing

Output

{
  "suggested_fixes": [
    {
      "fix_type": "retry",
      "confidence": 0.85,
      "description": "Add exponential backoff retry for timeout errors",
      "config": {
        "max_retries": 3,
        "initial_delay_ms": 1000,
        "exponential_base": 2.0,
        "max_delay_ms": 30000,
        "jitter": true,
        "retry_on": ["TOOL.EXECUTION.TIMEOUT"]
      },
      "evidence": "This fix resolved 78% of similar timeout errors"
    }
  ]
}
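The `retry` config above fully determines a backoff schedule. The helper below is a hypothetical sketch of how those fields could be interpreted: delay = `initial_delay_ms` × `exponential_base`^attempt, capped at `max_delay_ms`. How `jitter` is applied is not documented here; this sketch uses the common "full jitter" variant (a uniform draw between 0 and the capped delay) as an assumption.

```python
import random
from typing import Optional


def backoff_delays_ms(config: dict, rng: Optional[random.Random] = None) -> list[float]:
    """Delay before each retry attempt, derived from a retry fix config."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(config["max_retries"]):
        # Exponential growth: initial * base**attempt, capped at max_delay_ms.
        delay = min(config["initial_delay_ms"] * config["exponential_base"] ** attempt,
                    config["max_delay_ms"])
        if config.get("jitter"):
            # Assumed "full jitter": pick uniformly in [0, delay].
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays
```

With jitter disabled, the config above yields waits of 1000, 2000 and 4000 ms for its three retries; the 30000 ms cap would only take effect from a sixth retry onward (1000 × 2^5 = 32000).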

Pipeline Monitoring

Metrics

Metric                       Description
diagnosis_latency_ms         Total pipeline time (ms)
stage_1_latency_ms           Stage 1 context extraction time (ms)
stage_2_latency_ms           Stage 2 classification time (ms)
stage_3_latency_ms           Stage 3 root cause analysis time (ms)
stage_4_latency_ms           Stage 4 fix suggestion time (ms)
classification_confidence    Stage 2 classification confidence (0.0 to 1.0)
knowledge_base_hit_rate      Rate of Stage 4 knowledge base (cache) hits

Error Handling

If any stage fails:

  • Log the error with its context
  • Return a partial diagnosis
  • Mark the diagnosis as incomplete
  • Queue the diagnosis for retry (max 3 attempts)
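A minimal sketch of this failure policy, which also records the per-stage latency metrics listed under Pipeline Monitoring. The stage-callable signature, the result keys (`complete`, `failed_stage`, `retry_queued`), and the `attempt` parameter are hypothetical; a real implementation would push incomplete diagnoses onto a retry queue rather than only setting a flag.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("diagnosis")

MAX_ATTEMPTS = 3

# Each stage is (name, fn); fn receives the accumulated state and returns new fields.
Stage = tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: list[Stage], trace: dict, attempt: int = 1) -> dict:
    """Run stages in order; on failure return a partial diagnosis marked incomplete."""
    diagnosis: dict[str, Any] = {"complete": True, "metrics": {}}
    state: dict[str, Any] = {"trace": trace}
    start = time.perf_counter()
    for i, (name, fn) in enumerate(stages, start=1):
        t0 = time.perf_counter()
        try:
            result = fn(state)
        except Exception as exc:
            # Log the failure with context, keep earlier results, and stop here.
            logger.exception("stage %s failed (trace=%s, attempt=%d)",
                             name, trace.get("trace_id"), attempt)
            diagnosis.update(complete=False, failed_stage=name, error=repr(exc),
                             retry_queued=attempt < MAX_ATTEMPTS)
            break
        finally:
            # Runs on success and failure, so the failing stage is timed too.
            diagnosis["metrics"][f"stage_{i}_latency_ms"] = (time.perf_counter() - t0) * 1000
        state.update(result)
        diagnosis[name] = result
    diagnosis["metrics"]["diagnosis_latency_ms"] = (time.perf_counter() - start) * 1000
    return diagnosis
```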

Next Steps