Rollback
Instant rollback when fixes fail.
Risicare provides instant rollback to protect your system from bad fixes.
Automatic Rollback
Fixes are automatically rolled back when:
| Trigger | Threshold | Speed |
|---|---|---|
| Error rate increase | >10% relative | Instant |
| P99 latency increase | >2x baseline | Instant |
| A/B test fails | p < 0.05 (treatment worse) | Instant |
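The first two triggers in the table can be sketched as a simple comparison between baseline and treatment metrics. This is an illustration only, not Risicare's implementation: the `Metrics` class, the `rollback_reason` function, and the `latency_exceeded` reason string are hypothetical names, the A/B test trigger is left out, and skipping the error-rate check when the baseline error rate is zero is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    error_rate: float      # fraction of failed requests, e.g. 0.10
    p99_latency_ms: float  # 99th-percentile latency in milliseconds


def rollback_reason(baseline: Metrics, treatment: Metrics,
                    max_error_increase: float = 0.10,
                    max_latency_factor: float = 2.0) -> Optional[str]:
    """Return a reason string if a trigger fires, else None."""
    # Relative error-rate increase: (treatment - baseline) / baseline.
    # Assumption: with a zero baseline the relative increase is undefined,
    # so this sketch skips the check.
    if baseline.error_rate > 0:
        increase = (treatment.error_rate - baseline.error_rate) / baseline.error_rate
        if increase > max_error_increase:
            return "error_rate_exceeded"
    # P99 latency more than max_latency_factor times the baseline.
    if treatment.p99_latency_ms > max_latency_factor * baseline.p99_latency_ms:
        return "latency_exceeded"  # hypothetical reason name
    return None
```

With a baseline error rate of 0.10 and a treatment error rate of 0.15, the relative increase is 50%, so the first check fires.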
Rollback Speed
Target: under 500ms
How it works:
- Redis update (10ms): Update routing config
- SDK notification (optional): Push invalidation
- SDK poll (60s max): Regular refresh interval
- Effective: Next request uses baseline
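For critical rollbacks, push invalidation applies the change immediately. Without it, an SDK instance may not pick up the rollback until its next poll, up to 60 seconds later.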
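The difference between polling and push invalidation can be illustrated with a small cache model. This is a sketch under assumptions, not the SDK's actual code: `RoutingCache`, `fetch`, and `invalidate` are hypothetical names, and the clock is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Optional


class RoutingCache:
    """Hypothetical SDK-side cache of the routing config."""

    def __init__(self, fetch: Callable[[], str], poll_interval_s: float = 60.0):
        self._fetch = fetch  # reads the current routing config
        self._poll_interval_s = poll_interval_s
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def invalidate(self) -> None:
        """Push invalidation: force a refetch on the next request."""
        self._fetched_at = None

    def get(self, now: float) -> Optional[str]:
        """Return the routing target, refetching if the cache is stale."""
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._poll_interval_s)
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


store = {"route": "treatment"}
cache = RoutingCache(lambda: store["route"])
first = cache.get(now=0.0)       # fetches and caches "treatment"
store["route"] = "baseline"      # rollback written to the config store
before = cache.get(now=30.0)     # still "treatment": poll interval not elapsed
cache.invalidate()               # push invalidation arrives
after = cache.get(now=30.0)      # refetches and returns "baseline"
```

In this model a rollback without invalidation becomes visible only once `poll_interval_s` has elapsed since the last fetch, which matches the 60-second worst case listed above.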
Manual Rollback
Via Dashboard
- Navigate to Healing -> Deployments
- Find the deployment
- Click "Rollback"
- Confirm
Via API
Rollback a deployment by sending a DELETE request:
curl -X DELETE "https://app.risicare.ai/v1/deployments/{id}" \
  -H "Authorization: Bearer rsk-..."
Deployment API
Four endpoints manage the full deployment lifecycle:
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/deployments | List all deployments |
| GET | /v1/deployments/{id} | Get deployment detail |
| POST | /v1/deployments | Create a new deployment |
| DELETE | /v1/deployments/{id} | Rollback a deployment |
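As a rough Python counterpart to the curl example, the requests for these endpoints could be built with the standard library. The helper names (`build_request`, `rollback_request`, `list_request`) are hypothetical, the API key is a placeholder, and the sketch only constructs requests; response shapes are not documented here, so nothing is parsed.

```python
import urllib.request

BASE_URL = "https://app.risicare.ai"


def build_request(method: str, path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def rollback_request(deployment_id: str, api_key: str) -> urllib.request.Request:
    """DELETE /v1/deployments/{id} rolls the deployment back."""
    return build_request("DELETE", f"/v1/deployments/{deployment_id}", api_key)


def list_request(api_key: str) -> urllib.request.Request:
    """GET /v1/deployments lists all deployments."""
    return build_request("GET", "/v1/deployments", api_key)
```

A request built this way can be sent with `urllib.request.urlopen(req)`. Keeping construction separate from sending makes the method, URL, and header easy to inspect before anything touches a live deployment.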
Deployment Management
All deployment state transitions (ramping, graduating) are handled automatically by the system based on statistical tests. There are no separate pause, resume, or graduate endpoints.
Deployment States
| State | Description |
|---|---|
| pending | Deployment created, not yet started |
| active | Live and serving traffic |
| ramping | Traffic percentage increasing through stages |
| graduated | Fix reached 100% and held for 24 hours |
| rolled_back | Deployment reverted |
| failed | Unrecoverable error during deployment |
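Client code that polls a deployment may want to validate the state string it receives. A minimal sketch, assuming the state values are exactly the six listed above; treating `rolled_back` and `failed` as final is an assumption made for this example, and `DeploymentState`, `ENDED_STATES`, and `has_ended` are hypothetical names.

```python
from enum import Enum


class DeploymentState(str, Enum):
    PENDING = "pending"
    ACTIVE = "active"
    RAMPING = "ramping"
    GRADUATED = "graduated"
    ROLLED_BACK = "rolled_back"
    FAILED = "failed"


# Assumption: these two states are final, so a poller can stop watching.
ENDED_STATES = {DeploymentState.ROLLED_BACK, DeploymentState.FAILED}


def has_ended(state: str) -> bool:
    """Parse a state string and report whether the deployment has ended.

    Raises ValueError for a state that is not in the table.
    """
    return DeploymentState(state) in ENDED_STATES
```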
Rollback Events
{
"event": "rollback",
"deployment_id": "deploy-abc123",
"fix_id": "fix-xyz789",
"timestamp": "2024-01-15T10:30:00Z",
"trigger": "automatic",
"reason": "error_rate_exceeded",
"metrics": {
"baseline_error_rate": 0.10,
"treatment_error_rate": 0.15,
"increase_percentage": 50
},
"duration_ms": 234
}
Rollback History
View rollback history:
| Time | Fix | Trigger | Reason |
|---|---|---|---|
| 10:30 | fix-abc | Automatic | Error rate +50% |
| 09:15 | fix-xyz | Manual | Customer report |
| Yesterday | fix-123 | Automatic | Latency 2.5x |
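To show how an event payload like the one under Rollback Events relates to a history row, the sketch below recomputes the relative error-rate increase and formats a row. The mapping from event to row is an assumption made for illustration, `history_row` is a hypothetical helper, and it handles only error-rate events.

```python
from datetime import datetime


def history_row(event: dict) -> str:
    """Format an error-rate rollback event as a history table row."""
    ts = datetime.strptime(event["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    metrics = event["metrics"]
    baseline = metrics["baseline_error_rate"]
    treatment = metrics["treatment_error_rate"]
    # Relative increase in percent, rounded to the nearest whole number.
    pct = round((treatment - baseline) / baseline * 100)
    trigger = event["trigger"].capitalize()
    return f"| {ts:%H:%M} | {event['fix_id']} | {trigger} | Error rate +{pct}% |"


sample = {
    "event": "rollback",
    "deployment_id": "deploy-abc123",
    "fix_id": "fix-xyz789",
    "timestamp": "2024-01-15T10:30:00Z",
    "trigger": "automatic",
    "reason": "error_rate_exceeded",
    "metrics": {
        "baseline_error_rate": 0.10,
        "treatment_error_rate": 0.15,
        "increase_percentage": 50,
    },
    "duration_ms": 234,
}
row = history_row(sample)
```

For the sample event the recomputed increase agrees with the `increase_percentage` field in the payload.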
Post-Rollback Analysis
After rollback:
- Alert sent to team
- Diagnosis triggered on new errors
- Fix marked as failed
- Learning recorded for future fixes
Preventing Bad Deployments
Canary First
All fixes go through canary (5%) before wider rollout.
Gradual Ramp
5% -> 25% -> 50% -> 100%
Each stage requires passing a statistical A/B test.
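The ramp can be modeled as a walk through a fixed list of traffic percentages. This is a simplified sketch rather than the actual scheduler: `RAMP_STAGES` and `next_traffic_pct` are hypothetical names, the A/B test outcome is reduced to a boolean, and a failed test is modeled as sending all traffic back to baseline (0%).

```python
RAMP_STAGES = [5, 25, 50, 100]  # traffic percentages from the ramp above


def next_traffic_pct(current_pct: int, ab_test_passed: bool) -> int:
    """Return the treatment traffic percentage for the next stage."""
    if current_pct not in RAMP_STAGES:
        raise ValueError(f"unknown ramp stage: {current_pct}")
    if not ab_test_passed:
        return 0  # failed test: roll back, all traffic goes to baseline
    i = RAMP_STAGES.index(current_pct)
    # Advance one stage, staying at 100% once the final stage is reached.
    return RAMP_STAGES[min(i + 1, len(RAMP_STAGES) - 1)]
```

Starting from the 5% canary, three consecutive passing tests are needed to reach 100%.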
Guardrails
Secondary metrics must not degrade even if primary improves.
Rollback Configuration
Customize rollback thresholds:
{
"deployment_config": {
"rollback_thresholds": {
"error_rate_increase": 0.05,
"latency_increase_factor": 1.25
},
"rollback_delay_seconds": 0,
"require_manual_for_graduated": true
}
}
Recovery After Rollback
To retry a rolled-back fix:
- Analyze failure reason
- Modify fix configuration
- Create new fix version
- Deploy from canary
Rolled-back fixes cannot be directly re-deployed.