Overview
How Risicare's self-healing pipeline works.
Risicare automatically generates, validates, and deploys fixes for diagnosed errors.
Self-Healing Pipeline
Diagnosis → Hypothesis Generation → Validation → Deployment → Learning
↓ ↓ ↓ ↓
Generate fix ideas Test each one Canary → A/B Store pattern
How It Works
1. Receive Diagnosis
When an error is diagnosed, the healing pipeline receives:
- Error code (e.g., TOOL.EXECUTION.TIMEOUT)
- Root cause analysis
- Context from the error trace
- Similar past errors (if any)
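The inputs listed above could be modeled as a small record. This is a minimal sketch: the class name `Diagnosis` and its field names are hypothetical and are not taken from the Risicare SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    """Hypothetical container for what the healing pipeline receives."""
    error_code: str                                      # e.g. "TOOL.EXECUTION.TIMEOUT"
    root_cause: str                                      # summary of the root cause analysis
    trace_context: dict = field(default_factory=dict)    # context from the error trace
    similar_errors: list = field(default_factory=list)   # similar past errors, if any


diagnosis = Diagnosis(
    error_code="TOOL.EXECUTION.TIMEOUT",
    root_cause="weather_api did not respond within the timeout",
    trace_context={"tool": "weather_api"},
)
```

Defaulting `similar_errors` to an empty list reflects that past matches are optional.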
2. Generate Hypotheses
The pipeline creates testable hypotheses about what might fix the issue, each with a prior score:
Diagnosis: TOOL.EXECUTION.TIMEOUT on weather_api
Hypothesis 1: Retry with backoff (0.75 prior)
Hypothesis 2: Increase timeout (0.60 prior)
Hypothesis 3: Add fallback (0.55 prior)
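One plausible use of the priors is to decide which hypothesis to test first. The sketch below assumes hypotheses are tried from highest to lowest prior, which this page does not state; `Hypothesis` and `rank_hypotheses` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    description: str
    prior: float  # prior score that the fix works, between 0 and 1


def rank_hypotheses(hypotheses):
    """Return hypotheses ordered from highest to lowest prior (assumed ordering)."""
    return sorted(hypotheses, key=lambda h: h.prior, reverse=True)


candidates = [
    Hypothesis("Increase timeout", 0.60),
    Hypothesis("Add fallback", 0.55),
    Hypothesis("Retry with backoff", 0.75),
]
ranked = rank_hypotheses(candidates)
```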
3. Validate Statistically
Each hypothesis is tested:
- A/B Test: Split traffic between baseline and fix
- Measure: Error rate, latency, cost
- Analyze: Statistical significance (p < 0.05)
- Decide: Accept, reject, or continue testing
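The analyze and decide steps can be illustrated with a pooled two-proportion z-test on error rates. This is a sketch under stated assumptions: the page does not say which statistical test Risicare uses, and `decide`, `min_samples`, and the rule for "continue" are hypothetical.

```python
import math


def two_proportion_p_value(errors_a, n_a, errors_b, n_b):
    """Two-sided p-value for a difference in error rates (pooled z-test)."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


def decide(errors_base, n_base, errors_fix, n_fix, alpha=0.05, min_samples=1000):
    """Return 'accept', 'reject', or 'continue' (assumed decision rule)."""
    if min(n_base, n_fix) < min_samples:
        return "continue"  # too little traffic in one arm to judge
    p = two_proportion_p_value(errors_base, n_base, errors_fix, n_fix)
    if p >= alpha:
        return "continue"  # difference is not statistically significant yet
    # Significant difference: accept only if the fix lowered the error rate
    return "accept" if errors_fix / n_fix < errors_base / n_base else "reject"
```

Note that checking significance repeatedly while a test is still running inflates the false-positive rate, so a production system would normally fix the sample size in advance or use a sequential testing method.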
4. Deploy Safely
Validated fixes are deployed progressively:
Canary (5%) → Ramp (25%) → Ramp (50%) → Graduate (100%)
A rollback is triggered if:
- Error rate increases by more than 10%
- Latency exceeds 2x baseline
- An operator intervenes manually
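The staged rollout and its rollback triggers can be sketched as follows. The sketch assumes the 10% threshold is a relative increase over the baseline error rate, which the page does not specify; the function names and metric keys are hypothetical.

```python
STAGES = [5, 25, 50, 100]  # percent of traffic: canary, ramp, ramp, graduate


def should_roll_back(baseline, current, manual_stop=False):
    """Check the three rollback triggers against baseline metrics."""
    # Assumption: ">10%" means more than 10% above the baseline error rate
    error_regressed = current["error_rate"] > baseline["error_rate"] * 1.10
    latency_regressed = current["latency_ms"] > baseline["latency_ms"] * 2
    return manual_stop or error_regressed or latency_regressed


def next_traffic_percent(stage_index, baseline, current, manual_stop=False):
    """Return the next traffic share, or 0 if the fix should be rolled back."""
    if should_roll_back(baseline, current, manual_stop):
        return 0
    return STAGES[min(stage_index + 1, len(STAGES) - 1)]
```

For example, a fix at the canary stage (index 0) with healthy metrics advances to 25% of traffic, while any triggered condition drops it to 0%.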
5. Learn
Successful fixes become knowledge:
- Store error pattern as embedding
- Create fix template
- Share across customers (federated)
- Improve future suggestions
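Storing error patterns as embeddings suggests a nearest-match lookup when a new error arrives. The sketch below uses cosine similarity over toy vectors; `PatternStore`, the `min_similarity` threshold, and the choice of cosine similarity are assumptions, since the page does not describe how patterns are embedded or matched.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class PatternStore:
    """Hypothetical in-memory store of (error embedding, fix template) pairs."""

    def __init__(self):
        self._patterns = []

    def add(self, embedding, fix_template):
        self._patterns.append((embedding, fix_template))

    def best_match(self, embedding, min_similarity=0.8):
        """Return the fix template of the most similar pattern, or None."""
        best = max(
            self._patterns,
            key=lambda p: cosine_similarity(embedding, p[0]),
            default=None,
        )
        if best is None or cosine_similarity(embedding, best[0]) < min_similarity:
            return None
        return best[1]
```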
Fix Types
Risicare generates 7 types of fixes:
| Type | What It Does |
|---|---|
| Prompt | Modify system prompt |
| Parameter | Adjust LLM settings |
| Tool | Fix tool configuration |
| Retry | Add retry with backoff |
| Fallback | Use alternative strategy |
| Guard | Add validation |
| Routing | Change agent delegation |
No Code Injection
Declarative Fixes
Fixes are JSON configurations, not code. The SDK interprets these at runtime. Risicare never injects code into your system.
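As a rough illustration of what interpreting a declarative fix at runtime could involve, the following Python sketch turns a retry configuration into a backoff schedule. The field names mirror the retry example in this section; the function `retry_delays_ms`, the delay formula, and the meaning of `jitter` are assumptions, not the SDK's documented behavior.

```python
import random


def retry_delays_ms(config, rng=random.random):
    """Compute the delay before each retry from a declarative retry config."""
    delays = []
    for attempt in range(config["max_retries"]):
        # Exponential backoff, capped at max_delay_ms
        delay = config["initial_delay_ms"] * config["exponential_base"] ** attempt
        delay = min(delay, config["max_delay_ms"])
        if config.get("jitter"):
            # Assumed "full jitter": scale by a random factor in [0, 1)
            delay *= rng()
        delays.append(delay)
    return delays


config = {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": False,  # disabled here so the schedule is deterministic
}
print(retry_delays_ms(config))  # [1000.0, 2000.0, 4000.0]
```

Because the configuration is plain data, the interpreter decides how each field is applied, and nothing in the fix itself is executed.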
Example fix:
{
  "fix_id": "fix-abc123",
  "fix_type": "retry",
  "config": {
    "max_retries": 3,
    "initial_delay_ms": 1000,
    "exponential_base": 2.0,
    "max_delay_ms": 30000,
    "jitter": true,
    "retry_on": []
  }
}
Confidence Levels
Fixes have confidence scores:
| Confidence | Meaning |
|---|---|
| > 0.8 | High - auto-deploy to canary |
| 0.6 - 0.8 | Medium - require approval |
| < 0.6 | Low - suggest only |
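The thresholds in the table map directly to a small routing function. This is a sketch: the function name and the returned action strings are hypothetical, and scores of exactly 0.6 and 0.8 are treated as medium, following the table's ranges.

```python
def deployment_action(confidence):
    """Map a fix confidence score to the action described in the table."""
    if confidence > 0.8:
        return "auto_deploy_canary"   # high confidence
    if confidence >= 0.6:
        return "require_approval"     # medium confidence (0.6 to 0.8 inclusive)
    return "suggest_only"             # low confidence
```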
Dashboard
View healing activity:
- Active Fixes: Currently deployed fixes
- Testing: Fixes in A/B testing
- Candidates: Suggested but not deployed
- Graduated: Successfully deployed
- Rolled Back: Failed fixes
Metrics
| Metric | Description |
|---|---|
| Fix Rate | % of errors with deployed fixes |
| Success Rate | % of fixes that graduate |
| MTTR | Mean time to remediation |
| Error Reduction | % error reduction from fixes |
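The table's metrics could be computed from simple counts, as in the sketch below. The exact definitions Risicare uses are not given on this page, so the formulas, parameter names, and the choice of minutes for MTTR are assumptions.

```python
from statistics import mean


def pct(part, whole):
    """Percentage of part in whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def healing_metrics(errors_total, errors_fixed, fixes_total, fixes_graduated,
                    remediation_minutes, error_count_before, error_count_after):
    """Compute the four dashboard metrics from raw counts (assumed definitions)."""
    return {
        "fix_rate": pct(errors_fixed, errors_total),
        "success_rate": pct(fixes_graduated, fixes_total),
        "mttr_minutes": mean(remediation_minutes) if remediation_minutes else None,
        "error_reduction": pct(error_count_before - error_count_after,
                               error_count_before),
    }
```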