Heal

Automatic fix generation and deployment for AI agent failures.

Risicare's self-healing pipeline automatically detects errors, diagnoses root causes, and generates fixes. Fix deployment via A/B testing is available as an opt-in feature.

Beyond observability

No other platform offers automated error diagnosis with a 154-code taxonomy, fix generation across 7 fix types, and statistical A/B deployment. While competitors stop at showing you the error, Risicare diagnoses why it happened and generates a fix.

Overview

The healing pipeline follows the DoVer methodology (Diagnosis via Observation of Verification):

  1. Generate Hypotheses - Create testable hypotheses about fixes
  2. Validate Statistically - Test fixes with A/B testing
  3. Deploy Safely - Canary release with automatic rollback

Fix Types

Risicare can generate 7 types of fixes:

| Type | What It Does | Example |
|---|---|---|
| Prompt | Modify system prompt or add few-shot examples | Add clarifying instructions |
| Parameter | Adjust LLM parameters | Lower temperature, increase max_tokens |
| Tool | Fix tool configuration | Add timeout, fix validation |
| Retry | Add retry logic | Exponential backoff on transient errors |
| Fallback | Use alternative model/strategy | Fall back to gpt-4o-mini on timeout |
| Guard | Add input/output validation | JSON schema validation |
| Routing | Change agent delegation | Route to different specialist agent |

Fix Configuration

Fixes are JSON configurations, not code:

{
  "fix_id": "fix-abc123",
  "fix_type": "retry",
  "config": {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": true,
    "retry_on": ["TimeoutError"]
  },
  "rollback_strategy": {
    "type": "immediate",
    "trigger": "error_rate > 0.1"
  }
}
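To make the declarative config concrete, here is a minimal sketch of how an SDK could interpret the retry fix above at runtime. The `with_retry` helper and its exception-matching logic are illustrative, not Risicare's actual implementation:

```python
import random
import time

# The retry config from the fix above (the "config" object).
RETRY_CONFIG = {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": True,
    "retry_on": ["TimeoutError"],
}

def with_retry(call, config=RETRY_CONFIG):
    """Invoke `call`, retrying per the declarative retry config."""
    for attempt in range(config["max_retries"] + 1):
        try:
            return call()
        except Exception as exc:
            # Only retry the error types listed in the config,
            # and never after the final attempt.
            if (type(exc).__name__ not in config["retry_on"]
                    or attempt == config["max_retries"]):
                raise
            delay_ms = min(
                config["initial_delay_ms"] * config["exponential_base"] ** attempt,
                config["max_delay_ms"],
            )
            if config["jitter"]:
                delay_ms *= random.uniform(0.5, 1.0)  # jittered backoff
            time.sleep(delay_ms / 1000)
```

Because the fix is pure data, swapping strategies (e.g. changing `exponential_base` or the `retry_on` list) requires no code change in the agent itself.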

No Code Injection

Fixes are declarative configurations applied by the SDK at runtime. Risicare never injects code into your system.

Hypothesis Testing

Before deployment, fixes are validated through hypothesis testing:

Generate Hypotheses

Diagnosis: TOOL.EXECUTION.TIMEOUT on weather_api

Hypothesis 1: Adding retry with backoff will reduce timeout errors
  Prior probability: 0.75 (based on similar patterns)

Hypothesis 2: Increasing timeout to 60s will reduce errors
  Prior probability: 0.60

Hypothesis 3: Adding fallback to cached data will maintain uptime
  Prior probability: 0.55

Statistical Validation

Each hypothesis is tested with:

  • Sample size calculation for statistical power (0.8)
  • Two-proportion z-test for significance (p < 0.05)
  • Bayesian updates to posterior probability
  • O'Brien-Fleming boundaries for early stopping

Test Results:
  Baseline error rate: 12.3%
  Treatment error rate: 2.1%
  Effect size (Cohen's h): 0.38
  P-value: 0.0023 ✓

  Decision: Hypothesis VALIDATED
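The statistics above follow from standard formulas. A sketch of the two-proportion z-test and Cohen's h using only the standard library (the sample sizes of 500 per arm are assumed for illustration, since the docs report only rates):

```python
import math

def normal_sf(z):
    """Survival function of the standard normal (1 - CDF)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, one-sided p for p1 > p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, normal_sf(z)

def cohens_h(p1, p2):
    """Effect size for two proportions (arcsine transform)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

# Illustrative counts approximating the rates above: 62/500 baseline
# errors vs 10/500 treatment errors.
z, p = two_proportion_ztest(62, 500, 10, 500)
h = cohens_h(0.123, 0.021)
```

With a large enough sample, a drop from 12.3% to 2.1% yields a p-value far below the 0.05 threshold, so the hypothesis is validated.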

Deployment Pipeline

Fix Created
     ↓
┌─────────────────┐
│ Canary (5%)     │  Minimum 100 samples
│                 │  Monitor error rate
└─────────────────┘
     ↓ (if passing)
┌─────────────────┐
│ Ramp (25%)      │  Statistical A/B test
│                 │  O'Brien-Fleming boundaries
└─────────────────┘
     ↓ (if winning)
┌─────────────────┐
│ Ramp (50%)      │  Continue testing
│                 │
└─────────────────┘
     ↓ (if winning)
┌─────────────────┐
│ Graduate (100%) │  Hold for 24 hours
│                 │  Mark as graduated
└─────────────────┘
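The stage progression above amounts to a small state machine: advance only when the minimum sample count is met and the statistical check passes, otherwise hold. A sketch (stage names and the `passing` input are illustrative, not the platform's internal representation):

```python
# Canary -> ramp -> graduate progression, per the diagram above.
STAGES = [
    {"name": "canary", "traffic": 0.05, "min_samples": 100},
    {"name": "ramp_25", "traffic": 0.25, "min_samples": 100},
    {"name": "ramp_50", "traffic": 0.50, "min_samples": 100},
    {"name": "graduate", "traffic": 1.00, "min_samples": 0},
]

def next_stage(current, samples, passing):
    """Advance one stage only if sample count and the A/B check both pass."""
    idx = next(i for i, s in enumerate(STAGES) if s["name"] == current)
    stage = STAGES[idx]
    if samples >= stage["min_samples"] and passing and idx + 1 < len(STAGES):
        return STAGES[idx + 1]["name"]
    return current  # hold; rollback is handled by the monitor, not here
```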

Automatic Rollback

Fixes are automatically rolled back if:

  • Error rate increases >10% vs baseline
  • P99 latency exceeds 2x baseline
  • Manual rollback triggered

Rollback latency target: under 500ms (Redis routing update)

Fix Runtime

The SDK includes a fix runtime that:

  1. Loads fixes from the API on startup
  2. Caches locally with periodic refresh
  3. Routes requests based on A/B assignment
  4. Applies fixes at LLM call time

# Fix runtime is automatic when using the SDK
import risicare
from openai import OpenAI  # or any supported LLM client

risicare.init()
client = OpenAI()

# Fixes are applied automatically to LLM calls
response = client.chat.completions.create(...)

Knowledge Base

Successful fixes are stored in a knowledge base:

  • Error patterns as embeddings (pgvector)
  • Fix templates with parameters
  • Cross-customer learning (federated, no raw data)
  • Similarity threshold: 0.85

When a new error occurs, the knowledge base is checked first before generating a new fix.
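A knowledge-base lookup against the 0.85 similarity threshold might look like the following sketch. The entry format is hypothetical, and a real deployment would query pgvector rather than loop in Python:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def find_known_fix(error_embedding, knowledge_base, threshold=0.85):
    """Return the best-matching stored fix at or above the threshold, else None."""
    best, best_sim = None, threshold
    for entry in knowledge_base:
        sim = cosine_similarity(error_embedding, entry["embedding"])
        if sim >= best_sim:
            best, best_sim = entry, sim
    return best
```

Only when `find_known_fix` returns `None` does the pipeline fall through to generating a fresh fix.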

Next Steps