Overview

How Risicare's self-healing pipeline works.

Risicare automatically detects errors, diagnoses root causes, and generates fixes. Fix deployment is available as an opt-in feature.

Self-Healing Pipeline

Self-healing pipeline: Error → Diagnosis → Fix Generation → Canary Deploy → A/B Testing → Graduate

Diagnosis → Hypothesis Generation → Validation → Deployment → Learning
             ↓                        ↓            ↓           ↓
         Generate fix ideas     Test each one   Canary → A/B  Store pattern

How It Works

1. Receive Diagnosis

When an error is diagnosed, the healing pipeline receives:

Error code (e.g., TOOL.EXECUTION.TIMEOUT)
Root cause analysis
Context from the error trace
Similar past errors (if any)
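
As a rough illustration, the four inputs above could be carried in a single record. This is a minimal sketch: the class name `Diagnosis`, its field names, and the sample values are assumptions made for this example, not the Risicare SDK's actual types.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical shape of the input handed to the healing pipeline."""
    error_code: str                                      # e.g. "TOOL.EXECUTION.TIMEOUT"
    root_cause: str                                      # root cause analysis summary
    trace_context: dict = field(default_factory=dict)    # context from the error trace
    similar_errors: list = field(default_factory=list)   # similar past errors, may be empty


# Illustrative values only; the root cause text is made up for the example.
diagnosis = Diagnosis(
    error_code="TOOL.EXECUTION.TIMEOUT",
    root_cause="weather_api did not respond within the configured timeout",
    trace_context={"tool": "weather_api"},
)
print(diagnosis.error_code, len(diagnosis.similar_errors))
```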

2. Generate Hypotheses

The pipeline creates testable hypotheses about what might fix the issue, each with a prior probability of success:

Diagnosis: TOOL.EXECUTION.TIMEOUT on weather_api

Hypothesis 1: Retry with backoff (0.75 prior)
Hypothesis 2: Increase timeout (0.60 prior)
Hypothesis 3: Add fallback (0.55 prior)
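
The sketch below shows one way the priors could be used: sorting hypotheses so the most promising one is tested first. The `Hypothesis` class and the ordering rule are assumptions for illustration; this page does not state how Risicare orders its tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    description: str
    prior: float  # prior probability that this fix resolves the error


# Same three hypotheses as in the example above, deliberately unsorted.
hypotheses = [
    Hypothesis("Increase timeout", 0.60),
    Hypothesis("Add fallback", 0.55),
    Hypothesis("Retry with backoff", 0.75),
]

# Assumption: the highest prior is tested first.
ranked = sorted(hypotheses, key=lambda h: h.prior, reverse=True)
for rank, h in enumerate(ranked, start=1):
    print(f"Hypothesis {rank}: {h.description} ({h.prior:.2f} prior)")
```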

3. Validate Statistically

Each hypothesis is tested:

A/B Test: Split traffic between baseline and fix
Measure: Error rate, latency, cost
Analyze: Statistical significance (p < 0.05)
Decide: Accept, reject, or continue testing
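
To make the accept, reject, or continue decision concrete, here is a minimal sketch that compares error rates with a pooled two-proportion z-test at p < 0.05. The choice of test, the function names, and the `min_samples` and `max_samples` thresholds are assumptions; the page does not specify which statistical test Risicare uses, and a real comparison would also cover latency and cost.

```python
import math


def two_proportion_p_value(errors_a, total_a, errors_b, total_b):
    """Two-sided p-value for a difference in error rates (pooled z-test)."""
    pooled = (errors_a + errors_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 1.0  # both arms all-success or all-failure: no evidence of a difference
    z = (errors_b / total_b - errors_a / total_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def decide(errors_base, total_base, errors_fix, total_fix,
           alpha=0.05, min_samples=100, max_samples=10_000):
    """Return "accept", "reject", or "continue" for one baseline-vs-fix comparison."""
    if min(total_base, total_fix) < min_samples:
        return "continue"  # too little traffic to judge
    p_value = two_proportion_p_value(errors_base, total_base, errors_fix, total_fix)
    if p_value < alpha:
        fix_is_better = errors_fix / total_fix < errors_base / total_base
        return "accept" if fix_is_better else "reject"
    # Not significant: keep collecting data until the sample budget is spent.
    return "continue" if max(total_base, total_fix) < max_samples else "reject"


print(decide(200, 1000, 100, 1000))
```

Capping the test with a sample budget keeps an inconclusive hypothesis from holding traffic indefinitely.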

4. Deploy Safely

Deployment is opt-in

Fix deployment requires auto_fix_enabled to be turned on in your project settings. By default, the pipeline generates and validates fixes but does not deploy them automatically. You can review generated fixes in the dashboard before enabling auto-deployment.

Validated fixes are deployed progressively:

Canary (5%) → Ramp (25%) → Ramp (50%) → Graduate (100%)

A rollback is triggered if any of the following occurs:

Error rate increases by more than 10%
Latency exceeds 2x baseline
Someone intervenes manually
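
The progressive rollout and its rollback conditions can be sketched as a small state function. This is an illustration under stated assumptions: the names are hypothetical, the ">10%" threshold is read as a relative increase over the baseline error rate, and a rollback is modeled as dropping the fix to 0% of traffic.

```python
STAGES = [5, 25, 50, 100]  # percent of traffic: canary, ramp, ramp, graduate


def should_roll_back(baseline, current, manual_stop=False):
    """Return True if any of the three rollback conditions holds."""
    # Assumption: ">10%" means a relative increase over the baseline error rate.
    error_regressed = current["error_rate"] > baseline["error_rate"] * 1.10
    latency_regressed = current["latency_ms"] > 2 * baseline["latency_ms"]
    return manual_stop or error_regressed or latency_regressed


def next_traffic_percent(current_percent, baseline, current, manual_stop=False):
    """Advance one stage, or drop to 0 percent on rollback."""
    if should_roll_back(baseline, current, manual_stop):
        return 0
    index = STAGES.index(current_percent)
    return STAGES[min(index + 1, len(STAGES) - 1)]


baseline = {"error_rate": 0.04, "latency_ms": 200}
healthy = {"error_rate": 0.03, "latency_ms": 210}
print(next_traffic_percent(5, baseline, healthy))
```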

5. Learn

Successful fixes become knowledge:

Store error pattern as embedding
Create fix template
Share across customers (federated)
Improve future suggestions
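
As a toy illustration of the learning step, the sketch below stores error-pattern embeddings next to fix templates and suggests the template of the nearest stored pattern. Everything here is assumed for the example: the `PatternStore` class, the cosine-similarity lookup, the 0.9 threshold, and the two-dimensional vectors, which stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PatternStore:
    """Hypothetical in-memory store of (embedding, fix template) pairs."""

    def __init__(self):
        self._patterns = []

    def add(self, embedding, fix_template):
        self._patterns.append((embedding, fix_template))

    def suggest(self, embedding, min_similarity=0.9):
        """Return the template of the most similar stored pattern, or None."""
        best = max(self._patterns,
                   key=lambda p: cosine_similarity(embedding, p[0]),
                   default=None)
        if best is None or cosine_similarity(embedding, best[0]) < min_similarity:
            return None
        return best[1]


store = PatternStore()
store.add([1.0, 0.0], {"fix_type": "retry"})
store.add([0.0, 1.0], {"fix_type": "fallback"})
print(store.suggest([0.9, 0.1]))
```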

Fix Types

Risicare generates 7 types of fixes:

Type	What It Does
Prompt	Modify system prompt
Parameter	Adjust LLM settings
Tool	Fix tool configuration
Retry	Add retry with backoff
Fallback	Use alternative strategy
Guard	Add validation
Routing	Change agent delegation

No Code Injection

Declarative Fixes

Fixes are JSON configurations, not code, so Risicare never injects code into your system. The Python SDK's Fix Runtime interprets these configurations at runtime. The JavaScript SDK does not yet include the Fix Runtime, so fixes must be applied manually there.

Example fix:

{
  "fix_id": "fix-abc123",
  "fix_type": "retry",
  "config": {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": true,
    "retry_on": []
  }
}
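
To show how a declarative retry fix might be interpreted, the sketch below computes the delay before each retry from the config above. It is not the Fix Runtime's implementation: the function name is hypothetical, and the jitter strategy (a uniform draw between 0 and the capped delay) is an assumption, since the config only says `"jitter": true`.

```python
import json
import random

fix = json.loads("""
{
  "fix_id": "fix-abc123",
  "fix_type": "retry",
  "config": {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": true,
    "retry_on": []
  }
}
""")


def backoff_delays_ms(config, rng=random.random):
    """Delay before each retry: initial * base**attempt, capped at max_delay_ms."""
    delays = []
    for attempt in range(config["max_retries"]):
        delay = config["initial_delay_ms"] * config["exponential_base"] ** attempt
        delay = min(delay, config["max_delay_ms"])
        if config["jitter"]:
            # Assumption: "full jitter", a uniform draw between 0 and the capped delay.
            delay *= rng()
        delays.append(delay)
    return delays


# With jitter disabled the schedule is deterministic.
no_jitter = dict(fix["config"], jitter=False)
print(backoff_delays_ms(no_jitter))  # [1000.0, 2000.0, 4000.0]
```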

Confidence Levels

Fixes have confidence scores:

Confidence	Meaning
> 0.8	High - auto-deploy to canary (requires auto_fix_enabled)
0.6 - 0.8	Medium - require approval
< 0.6	Low - suggest only
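
The table can be read as a simple threshold function. In this sketch the function name is hypothetical, and two details are assumptions: a score of exactly 0.8 falls in the medium band, and a high-confidence fix waits for approval while `auto_fix_enabled` is off.

```python
def action_for(confidence, auto_fix_enabled=False):
    """Map a fix's confidence score to the action in the table."""
    if confidence > 0.8:
        # Assumption: without auto_fix_enabled, a high-confidence fix still waits for review.
        return "auto-deploy to canary" if auto_fix_enabled else "require approval"
    if confidence >= 0.6:
        return "require approval"
    return "suggest only"


for score in (0.9, 0.7, 0.4):
    print(score, action_for(score, auto_fix_enabled=True))
```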

Dashboard

View healing activity:

Active Fixes: Currently deployed fixes
Testing: Fixes in A/B testing
Candidates: Suggested but not deployed
Graduated: Successfully deployed
Rolled Back: Failed fixes

Metrics

Metric	Description
Fix Rate	% of errors with deployed fixes
Success Rate	% of fixes that graduate
MTTR	Mean time to remediation
Error Reduction	% error reduction from fixes
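
For reference, the four metrics could be computed from raw counts roughly as follows. The function and parameter names are assumptions for this sketch, and the sample numbers are illustrative inputs, not measured results.

```python
from datetime import timedelta


def percent(part, whole):
    return 100.0 * part / whole if whole else 0.0


def healing_metrics(errors_total, errors_with_fix, fixes_total, fixes_graduated,
                    remediation_times, error_rate_before, error_rate_after):
    """Compute Fix Rate, Success Rate, MTTR, and Error Reduction."""
    mttr = (sum(remediation_times, timedelta()) / len(remediation_times)
            if remediation_times else None)
    return {
        "fix_rate": percent(errors_with_fix, errors_total),       # % of errors with deployed fixes
        "success_rate": percent(fixes_graduated, fixes_total),    # % of fixes that graduate
        "mttr": mttr,                                             # mean time to remediation
        "error_reduction": percent(error_rate_before - error_rate_after,
                                   error_rate_before),            # % error reduction from fixes
    }


metrics = healing_metrics(
    errors_total=200, errors_with_fix=50,
    fixes_total=10, fixes_graduated=4,
    remediation_times=[timedelta(minutes=30), timedelta(minutes=90)],
    error_rate_before=0.08, error_rate_after=0.02,
)
print(metrics)
```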

Next Steps

Hypothesis Testing: DoVer methodology details
Fix Types: All 7 fix types explained
