Overview
How Risicare's self-healing pipeline works.
Risicare automatically detects errors, diagnoses root causes, and generates fixes. Fix deployment is available as an opt-in feature.
Self-Healing Pipeline
Diagnosis → Hypothesis Generation → Validation → Deployment → Learning
- Hypothesis Generation: Generate fix ideas
- Validation: Test each one
- Deployment: Canary → A/B
- Learning: Store pattern
How It Works
1. Receive Diagnosis
When an error is diagnosed, the healing pipeline receives:
- Error code (e.g., TOOL.EXECUTION.TIMEOUT)
- Root cause analysis
- Context from the error trace
- Similar past errors (if any)
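For illustration, the four inputs listed above could be modeled as a small record. This is only a sketch: the class name `Diagnosis`, its field names, and the sample root-cause text are assumptions made for this example, not Risicare's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical shape of what the healing pipeline receives."""
    error_code: str                                      # e.g. "TOOL.EXECUTION.TIMEOUT"
    root_cause: str                                      # root cause analysis summary
    trace_context: dict = field(default_factory=dict)    # context from the error trace
    similar_errors: list = field(default_factory=list)   # past errors; may be empty


# Illustrative values only.
diagnosis = Diagnosis(
    error_code="TOOL.EXECUTION.TIMEOUT",
    root_cause="weather_api did not respond within the configured timeout",
    trace_context={"tool": "weather_api"},
)
```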
2. Generate Hypotheses
The pipeline then creates testable hypotheses about what might fix the issue, each with a prior:
Diagnosis: TOOL.EXECUTION.TIMEOUT on weather_api
Hypothesis 1: Retry with backoff (0.75 prior)
Hypothesis 2: Increase timeout (0.60 prior)
Hypothesis 3: Add fallback (0.55 prior)
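One plausible way to use these priors is to test the most promising hypothesis first. The sketch below assumes exactly that ordering rule. The `Hypothesis` class and `rank_by_prior` function are hypothetical names, and the page does not state how Risicare actually orders its tests.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    prior: float  # prior attached to the hypothesis, as in the example above


hypotheses = [
    Hypothesis("Increase timeout", 0.60),
    Hypothesis("Add fallback", 0.55),
    Hypothesis("Retry with backoff", 0.75),
]


def rank_by_prior(candidates):
    """Return hypotheses sorted from highest to lowest prior."""
    return sorted(candidates, key=lambda h: h.prior, reverse=True)


ranked = rank_by_prior(hypotheses)
```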
3. Validate Statistically
Each hypothesis is tested:
- A/B Test: Split traffic between baseline and fix
- Measure: Error rate, latency, cost
- Analyze: Statistical significance (p < 0.05)
- Decide: Accept, reject, or continue testing
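The page gives only the significance threshold (p < 0.05), not the test used. As a sketch of the accept/reject/continue decision, the example below applies a pooled two-proportion z-test to the error rate alone. The choice of test and the names `two_proportion_p_value` and `decide` are assumptions, and latency and cost are left out for brevity.

```python
import math


def two_proportion_p_value(errors_a, total_a, errors_b, total_b):
    """Two-sided p-value for a difference in error rates (pooled z-test)."""
    rate_a = errors_a / total_a
    rate_b = errors_b / total_b
    pooled = (errors_a + errors_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (rate_a - rate_b) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def decide(errors_base, n_base, errors_fix, n_fix, alpha=0.05):
    """Return 'accept', 'reject', or 'continue' for one hypothesis."""
    p_value = two_proportion_p_value(errors_base, n_base, errors_fix, n_fix)
    if p_value >= alpha:
        return "continue"  # not yet significant; keep collecting traffic
    fix_is_better = errors_fix / n_fix < errors_base / n_base
    return "accept" if fix_is_better else "reject"
```

A real system would also cap the sample size or test duration so that an inconclusive hypothesis does not run indefinitely.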
4. Deploy Safely
Deployment is opt-in
Fix deployment requires auto_fix_enabled to be turned on in your project settings. By default, the pipeline generates and validates fixes but does not deploy them automatically. You can review generated fixes in the dashboard before enabling auto-deployment.
Validated fixes are deployed progressively:
Canary (5%) → Ramp (25%) → Ramp (50%) → Graduate (100%)
Rollback happens automatically when a metric threshold is crossed, and can also be triggered manually:
- Error rate increases by more than 10%
- Latency exceeds 2x baseline
- Manual intervention
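The staged rollout and its rollback conditions can be sketched as a simple loop. This assumes the 10% threshold is a relative increase over the baseline error rate, which the page does not specify. The function names and the `observe` callback are hypothetical.

```python
STAGES = [5, 25, 50, 100]  # percent of traffic at each rollout stage


def should_roll_back(baseline, current, manual_stop=False):
    """Check the three rollback conditions against baseline metrics."""
    # Assumption: ">10%" means a relative increase over the baseline rate.
    error_regressed = current["error_rate"] > baseline["error_rate"] * 1.10
    latency_regressed = current["latency_ms"] > baseline["latency_ms"] * 2
    return manual_stop or error_regressed or latency_regressed


def run_rollout(baseline, observe):
    """Advance through the stages; return the final traffic percentage.

    `observe(percent)` is a hypothetical callback returning the metrics
    measured while the fix serves that share of traffic.
    """
    for percent in STAGES:
        if should_roll_back(baseline, observe(percent)):
            return 0  # rolled back: the fix serves no traffic
    return 100  # graduated
```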
5. Learn
Successful fixes become knowledge:
- Store error pattern as embedding
- Create fix template
- Share across customers (federated)
- Improve future suggestions
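As a rough sketch of how a stored pattern might be matched against a new error, the example below keeps embeddings alongside fix templates and looks up the closest one by cosine similarity. The `PatternStore` class, the similarity measure, and the 0.9 threshold are all assumptions for illustration. The page does not describe how Risicare stores or searches patterns.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PatternStore:
    """Hypothetical in-memory store of (embedding, fix template) pairs."""

    def __init__(self):
        self._patterns = []

    def add(self, embedding, fix_template):
        self._patterns.append((embedding, fix_template))

    def closest(self, embedding, min_similarity=0.9):
        """Return the best-matching fix template, or None if nothing is close."""
        best_template, best_score = None, min_similarity
        for stored, template in self._patterns:
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_template, best_score = template, score
        return best_template
```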
Fix Types
Risicare generates 7 types of fixes:
| Type | What It Does |
|---|---|
| Prompt | Modify system prompt |
| Parameter | Adjust LLM settings |
| Tool | Fix tool configuration |
| Retry | Add retry with backoff |
| Fallback | Use alternative strategy |
| Guard | Add validation |
| Routing | Change agent delegation |
No Code Injection
Declarative Fixes
Fixes are JSON configurations, not code. The Python SDK's Fix Runtime interprets these at runtime. Risicare never injects code into your system. The JavaScript SDK does not yet include the Fix Runtime, so fixes must be applied manually there.
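To illustrate what interpreting a declarative fix might involve, the sketch below turns a retry config into a delay schedule. The field names match the example fix on this page. The function `backoff_delays_ms` is hypothetical and is not the Fix Runtime's API. The jitter behavior (scaling each delay by a random factor in [0, 1)) is an assumption, and `retry_on` is ignored.

```python
import random


def backoff_delays_ms(config, rng=random.random):
    """Compute the wait before each retry from a declarative retry config."""
    delays = []
    for attempt in range(config["max_retries"]):
        # Exponential growth: initial_delay * base^attempt, capped at max_delay.
        delay = config["initial_delay_ms"] * config["exponential_base"] ** attempt
        delay = min(delay, config["max_delay_ms"])
        if config.get("jitter"):
            delay *= rng()  # assumed "full jitter": uniform in [0, delay)
        delays.append(delay)
    return delays


retry_config = {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": False,  # disabled here so the schedule is deterministic
}
schedule = backoff_delays_ms(retry_config)
```

Because the fix is plain data, the same config could be reviewed or edited without touching application code.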
Example fix:
{
"fix_id": "fix-abc123",
"fix_type": "retry",
"config": {
"max_retries": 3,
"initial_delay_ms": 1000,
"exponential_base": 2.0,
"max_delay_ms": 30000,
"jitter": true,
"retry_on": []
}
}
Confidence Levels
Fixes have confidence scores:
| Confidence | Meaning |
|---|---|
| > 0.8 | High - auto-deploy to canary |
| 0.6 - 0.8 | Medium - require approval |
| < 0.6 | Low - suggest only |
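The table above maps directly to a small routing function. This sketch treats the boundary values 0.6 and 0.8 as medium confidence, which is one reading of the table. The function name and return strings are hypothetical, and the `auto_fix_enabled` project setting is not modeled here.

```python
def action_for_confidence(confidence):
    """Map a fix's confidence score to the handling described in the table."""
    if confidence > 0.8:
        return "auto_deploy_canary"   # high confidence
    if confidence >= 0.6:
        return "require_approval"     # medium confidence
    return "suggest_only"             # low confidence
```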
Dashboard
View healing activity:
- Active Fixes: Currently deployed fixes
- Testing: Fixes in A/B testing
- Candidates: Suggested but not deployed
- Graduated: Successfully deployed
- Rolled Back: Failed fixes
Metrics
| Metric | Description |
|---|---|
| Fix Rate | % of errors with deployed fixes |
| Success Rate | % of fixes that graduate |
| MTTR | Mean time to remediation |
| Error Reduction | % error reduction from fixes |
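The sketch below shows one way these four metrics could be computed from raw counts. The exact definitions are assumptions: in particular, Success Rate is taken as graduated fixes over deployed fixes, and MTTR as the mean of per-incident remediation times in minutes. The function and parameter names are hypothetical.

```python
from statistics import mean


def healing_metrics(errors_total, errors_with_deployed_fix,
                    fixes_deployed, fixes_graduated,
                    remediation_minutes,
                    error_rate_before, error_rate_after):
    """Compute the dashboard metrics from raw counts (assumed definitions)."""
    return {
        "fix_rate_pct": 100 * errors_with_deployed_fix / errors_total,
        "success_rate_pct": 100 * fixes_graduated / fixes_deployed,
        "mttr_minutes": mean(remediation_minutes),
        "error_reduction_pct": 100 * (error_rate_before - error_rate_after) / error_rate_before,
    }


# Illustrative numbers only.
metrics = healing_metrics(
    errors_total=200, errors_with_deployed_fix=50,
    fixes_deployed=20, fixes_graduated=15,
    remediation_minutes=[30, 60, 90],
    error_rate_before=0.10, error_rate_after=0.04,
)
```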