Rollback
Instant rollback when fixes fail.
Coming Soon
Automatic rollback is under active development and not yet available in production.
Risicare provides instant rollback to protect your system from bad fixes.
Automatic Rollback
Fixes are automatically rolled back when:
| Trigger | Threshold | Speed |
|---|---|---|
| Error rate increase | >10% relative | Instant |
| P99 latency increase | >2x baseline | Instant |
| A/B test fails | p < 0.05 (treatment worse) | Instant |
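The trigger table can be read as a small decision function. The sketch below is a minimal version that assumes baseline and treatment metrics are already aggregated per deployment. StageMetrics, rollback_reason, and the reason strings other than error_rate_exceeded are hypothetical names. The keyword parameters reuse the rollback_thresholds key names from Rollback Configuration, with the table's defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageMetrics:
    """Hypothetical snapshot of baseline vs. treatment metrics for one deployment."""
    baseline_error_rate: float
    treatment_error_rate: float
    baseline_p99_ms: float
    treatment_p99_ms: float
    ab_p_value: Optional[float] = None  # one-sided p-value for "treatment is worse"


def rollback_reason(m: StageMetrics,
                    error_rate_increase: float = 0.10,
                    latency_increase_factor: float = 2.0,
                    alpha: float = 0.05) -> Optional[str]:
    """Return the first trigger that fires, or None if the fix may stay live."""
    # Relative error-rate increase: (treatment - baseline) / baseline.
    if m.baseline_error_rate > 0:
        relative = (m.treatment_error_rate - m.baseline_error_rate) / m.baseline_error_rate
        if relative > error_rate_increase:
            return "error_rate_exceeded"
    elif m.treatment_error_rate > 0:
        # Assumption: any errors on a zero-error baseline count as an increase.
        return "error_rate_exceeded"
    # P99 latency compared as a multiple of the baseline (hypothetical reason name).
    if m.treatment_p99_ms > latency_increase_factor * m.baseline_p99_ms:
        return "latency_exceeded"
    # A/B test: significant evidence that the treatment is worse (hypothetical reason name).
    if m.ab_p_value is not None and m.ab_p_value < alpha:
        return "ab_test_failed"
    return None
```

With a baseline error rate of 0.10 and a treatment rate of 0.15, the relative increase is 50%, well above the 10% default, so the function reports error_rate_exceeded.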
Rollback Speed
Target: under 500ms
How it works:
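- Redis update (10ms): the routing config is updated so the deployment points back to the baseline.
- SDK notification (optional): a push invalidation tells SDKs to drop their cached config.
- SDK poll (60s max): without a push, SDKs pick up the change on their regular refresh interval.
- Effective: the next request after the SDK refreshes uses the baseline.
Because a poll can take up to 60 seconds, the 500ms target depends on push invalidation. For critical rollbacks, push invalidation makes the change take effect immediately instead of at the next poll.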
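To make the poll-versus-push trade-off concrete, here is a minimal sketch of an SDK-side config cache. RoutingConfigCache, fetch, and invalidate are hypothetical names, not the Risicare SDK API. The 60-second default mirrors the poll interval above, and the clock is injectable so the timing can be tested.

```python
import time
from typing import Callable, Dict, Optional


class RoutingConfigCache:
    """Hypothetical SDK-side cache of the routing config.

    The cached copy is refreshed at most every `poll_interval` seconds, or
    immediately after invalidate() is called by a push notification.
    """

    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 poll_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch                  # reads the routing config (e.g. from Redis)
        self._poll_interval = poll_interval
        self._clock = clock                  # injectable for testing
        self._config: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def invalidate(self) -> None:
        """Push invalidation: drop the cached copy so the next get() refetches."""
        self._config = None

    def get(self) -> Dict[str, str]:
        """Return the config, refetching if it is missing or older than the poll interval."""
        now = self._clock()
        if self._config is None or now - self._fetched_at >= self._poll_interval:
            self._config = self._fetch()
            self._fetched_at = now
        return self._config
```

Without invalidate(), a rollback written to the central store stays invisible until the poll interval elapses. Calling invalidate() makes the very next get() read the baseline.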
Manual Rollback
Via Dashboard
- Navigate to Healing -> Deployments
- Find the deployment
- Click "Rollback"
- Confirm
Via API
Rollback a deployment by sending a DELETE request:
```shell
curl -X DELETE "https://app.risicare.ai/api/v1/deployments/{id}" \
  -H "Authorization: Bearer rsk-..."
```
Deployment API
Four endpoints manage the full deployment lifecycle. Paths are relative to https://app.risicare.ai/api, as in the curl example:
| Method | Endpoint | Description |
|---|---|---|
| GET | /v1/deployments | List all deployments |
| GET | /v1/deployments/{id} | Get deployment detail |
| POST | /v1/deployments | Create a new deployment |
| DELETE | /v1/deployments/{id} | Rollback a deployment |
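A minimal sketch of calling these endpoints with only the Python standard library follows. The base URL is inferred from the curl example. The helper names are hypothetical, and responses are assumed to be JSON. This page does not document the POST body, so build_request simply serializes whatever dict you pass.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "https://app.risicare.ai/api"  # inferred from the curl example


def build_request(method: str, path: str, api_key: str,
                  body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) a request against the deployment API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def rollback_request(deployment_id: str, api_key: str) -> urllib.request.Request:
    """DELETE /v1/deployments/{id} rolls the deployment back."""
    return build_request("DELETE", f"/v1/deployments/{deployment_id}", api_key)


def send(req: urllib.request.Request) -> Any:
    """Send a request and decode the JSON response body, if there is one (assumption)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None
```

For example, send(rollback_request("deploy-abc123", api_key)) performs the same rollback as the curl command. build_request("GET", "/v1/deployments", api_key) covers the list endpoint.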
Deployment Management
All deployment state transitions (ramping, graduating) are handled automatically by the system based on statistical tests. There are no separate pause, resume, or graduate endpoints.
Deployment States
| State | Description |
|---|---|
| pending | Deployment created, not yet started |
| active | Live and serving traffic |
| ramping | Traffic percentage increasing through stages |
| graduated | Fix reached 100% and held for 24 hours |
| rolled_back | Deployment reverted |
| failed | Unrecoverable error during deployment |
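The table lists the states but not which of them allow a rollback. The following sketch works under explicit assumptions. It treats rollback as applying only to states that serve traffic (active, ramping, graduated). It also reads the require_manual_for_graduated key from Rollback Configuration as blocking automatic rollback of graduated deployments. DeploymentState and can_roll_back are hypothetical names.

```python
from enum import Enum


class DeploymentState(str, Enum):
    """The states from the table above, keyed by their string values."""
    PENDING = "pending"
    ACTIVE = "active"
    RAMPING = "ramping"
    GRADUATED = "graduated"
    ROLLED_BACK = "rolled_back"
    FAILED = "failed"


# Assumption: only states that serve traffic have something to roll back.
SERVING = {DeploymentState.ACTIVE, DeploymentState.RAMPING, DeploymentState.GRADUATED}


def can_roll_back(state: DeploymentState, manual: bool,
                  require_manual_for_graduated: bool = True) -> bool:
    """Decide whether a rollback request is allowed in the given state."""
    if state not in SERVING:
        return False  # pending, rolled_back, and failed deployments are not serving traffic
    if state is DeploymentState.GRADUATED and require_manual_for_graduated:
        # Assumed meaning of require_manual_for_graduated: only manual rollbacks.
        return manual
    return True
```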
Rollback Events
```json
{
  "event": "rollback",
  "deployment_id": "deploy-abc123",
  "fix_id": "fix-xyz789",
  "timestamp": "2024-01-15T10:30:00Z",
  "trigger": "automatic",
  "reason": "error_rate_exceeded",
  "metrics": {
    "baseline_error_rate": 0.10,
    "treatment_error_rate": 0.15,
    "increase_percentage": 50
  },
  "duration_ms": 234
}
```
Rollback History
View rollback history:
| Time | Fix | Trigger | Reason |
|---|---|---|---|
| 10:30 | fix-abc | Automatic | Error rate +50% |
| 09:15 | fix-xyz | Manual | Customer report |
| Yesterday | fix-123 | Automatic | Latency 2.5x |
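History rows like these can be derived from rollback event payloads such as the one under Rollback Events. The sketch below is minimal and assumes the payload shape of that sample. history_row is a hypothetical helper, and relative labels like "Yesterday" are not reproduced. It recomputes the relative error-rate increase, which matches the sample's increase_percentage of 50.

```python
from datetime import datetime
from typing import Any, Dict


def history_row(event: Dict[str, Any]) -> Dict[str, str]:
    """Convert one rollback event payload into a row shaped like the history table."""
    # fromisoformat() does not accept a trailing "Z" before Python 3.11.
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    reason = event["reason"]
    metrics = event.get("metrics", {})
    base = metrics.get("baseline_error_rate")
    treat = metrics.get("treatment_error_rate")
    if base and treat is not None:
        # Relative increase: (0.15 - 0.10) / 0.10 = 0.5, shown as +50%.
        reason = f"Error rate +{round((treat - base) / base * 100)}%"
    return {
        "time": ts.strftime("%H:%M"),
        "fix": event["fix_id"],
        "trigger": event["trigger"].capitalize(),
        "reason": reason,
    }
```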
Post-Rollback Analysis
After a rollback:
- An alert is sent to your team.
- Diagnosis is triggered on the new errors.
- The fix is marked as failed.
- What was learned is recorded to inform future fixes.
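The four follow-up steps can be sketched as a single handler that receives a rollback event. Every hook here (send_alert, trigger_diagnosis, mark_fix_failed, record_learning) is a hypothetical callback standing in for whatever your alerting and tracking systems provide. The sketch only fixes the order of the steps and what each one receives.

```python
from typing import Any, Callable, Dict


def handle_rollback(event: Dict[str, Any],
                    send_alert: Callable[[str], None],
                    trigger_diagnosis: Callable[[str], None],
                    mark_fix_failed: Callable[[str], None],
                    record_learning: Callable[[Dict[str, Any]], None]) -> None:
    """Run the post-rollback steps in the order listed above (hooks are hypothetical)."""
    deployment_id = event["deployment_id"]
    fix_id = event["fix_id"]
    send_alert(f"Deployment {deployment_id} rolled back: {event['reason']}")
    trigger_diagnosis(deployment_id)  # diagnose the errors seen during the deployment
    mark_fix_failed(fix_id)
    record_learning({"fix_id": fix_id, "reason": event["reason"],
                     "metrics": event.get("metrics", {})})
```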
Preventing Bad Deployments
Canary First
All fixes start as a canary on 5% of traffic before wider rollout.
Gradual Ramp
5% -> 25% -> 50% -> 100%
Each stage requires passing a statistical A/B test.
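The ramp can be sketched as a step function that advances to the next stage only when the A/B test does not show the treatment is worse. This page does not name the statistical test, so the sketch swaps in a one-sided two-proportion z-test on error counts. RAMP_STAGES, next_stage, and p_value_treatment_worse are hypothetical names. Treating "passing" as "not significantly worse at p < 0.05" is an assumption taken from the trigger table, and the 24-hour hold before graduation is not modeled.

```python
import math
from typing import Optional, Tuple

RAMP_STAGES = [5, 25, 50, 100]  # percent of traffic, from canary to full rollout


def p_value_treatment_worse(base_errors: int, base_total: int,
                            treat_errors: int, treat_total: int) -> float:
    """One-sided two-proportion z-test; a small p means the treatment errors more often.

    Assumes both totals are positive.
    """
    p_base = base_errors / base_total
    p_treat = treat_errors / treat_total
    pooled = (base_errors + treat_errors) / (base_total + treat_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / base_total + 1 / treat_total))
    if se == 0:
        return 1.0  # both arms have identical all-or-nothing error rates
    z = (p_treat - p_base) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper tail: P(Z > z)


def next_stage(current: int, counts: Tuple[int, int, int, int],
               alpha: float = 0.05) -> Optional[int]:
    """Return the next traffic percentage, or None to roll back.

    counts = (base_errors, base_total, treat_errors, treat_total).
    At 100% the stage stays at 100.
    """
    if p_value_treatment_worse(*counts) < alpha:
        return None  # treatment significantly worse: roll back
    i = RAMP_STAGES.index(current)
    return RAMP_STAGES[min(i + 1, len(RAMP_STAGES) - 1)]
```

With 100 errors in 1,000 baseline requests and 150 in 1,000 treatment requests, z is about 3.4 and p is well below 0.05, so next_stage returns None and the fix is rolled back.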
Guardrails
Secondary metrics must not degrade, even if the primary metric improves.
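A guardrail check can be sketched as a set of secondary metrics, each with a relative tolerance. The metric names, tolerances, and lower-is-better convention are assumptions for illustration, and degraded_guardrails and may_advance are hypothetical names. The point the sketch encodes is that may_advance returns False whenever any guardrail degrades, however good the primary result is.

```python
from typing import Dict, List


def degraded_guardrails(baseline: Dict[str, float], treatment: Dict[str, float],
                        tolerances: Dict[str, float]) -> List[str]:
    """Return the secondary metrics that got worse by more than their relative tolerance.

    Assumes lower is better for every metric listed (error rate, latency, cost, ...).
    """
    return [name for name, tol in tolerances.items()
            if treatment[name] > baseline[name] * (1 + tol)]


def may_advance(primary_improved: bool, baseline: Dict[str, float],
                treatment: Dict[str, float], tolerances: Dict[str, float]) -> bool:
    """A better primary metric never overrides a failed guardrail."""
    return primary_improved and not degraded_guardrails(baseline, treatment, tolerances)
```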
Rollback Configuration
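Customize rollback thresholds. This example tightens the default triggers (a 10% relative error-rate increase and a 2x latency increase) to 5% and 1.25x: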
```json
{
  "deployment_config": {
    "rollback_thresholds": {
      "error_rate_increase": 0.05,
      "latency_increase_factor": 1.25
    },
    "rollback_delay_seconds": 0,
    "require_manual_for_graduated": true
  }
}
```
Recovery After Rollback
To retry a rolled-back fix:
- Analyze the failure reason.
- Modify the fix configuration.
- Create a new fix version.
- Deploy the new version, starting from canary.
Rolled-back fixes cannot be re-deployed directly; a retry always goes through a new version.
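The recovery flow amounts to treating fix versions as immutable. The sketch below assumes a fix is identified by a fix_id plus a version number. FixVersion, new_version, deploy, and CANARY_PERCENT are hypothetical names, and the 5% canary comes from Preventing Bad Deployments.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

CANARY_PERCENT = 5  # every deployment starts at the canary stage


@dataclass(frozen=True)
class FixVersion:
    """Hypothetical immutable record of one version of a fix."""
    fix_id: str
    version: int
    config: Dict[str, object] = field(default_factory=dict)
    rolled_back: bool = False


def new_version(fix: FixVersion, config_changes: Dict[str, object]) -> FixVersion:
    """Create the next version with a modified config; the old version is left untouched."""
    return replace(fix, version=fix.version + 1,
                   config={**fix.config, **config_changes}, rolled_back=False)


def deploy(fix: FixVersion) -> int:
    """Start a deployment and return its initial traffic percentage."""
    if fix.rolled_back:
        raise ValueError(f"{fix.fix_id} v{fix.version} was rolled back; create a new version")
    return CANARY_PERCENT
```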