Rollback

Instant rollback when fixes fail.

Coming Soon

Automatic rollback is under active development and not yet available in production.

Risicare rolls back bad fixes quickly, either automatically or on demand, so that a failed fix does not keep affecting your system.

Automatic Rollback

A fix is rolled back automatically when any of the following triggers fires:

Trigger	Threshold	Speed
Error rate increase	>10% relative	Instant
P99 latency increase	>2x baseline	Instant
A/B test fails	p < 0.05 (treatment worse)	Instant
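
As an illustration only, the trigger table can be read as a single check over baseline and treatment metrics. This is a minimal sketch, not Risicare's implementation: the `Metrics` type, the `rollback_reason` function, and the `latency_exceeded` and `ab_test_failed` reason strings are hypothetical names (only `error_rate_exceeded` appears elsewhere on this page), and the defaults simply mirror the thresholds in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    """Hypothetical container for the metrics the triggers compare."""
    error_rate: float        # fraction of failed requests, e.g. 0.10
    p99_latency_ms: float    # 99th percentile latency in milliseconds


def rollback_reason(
    baseline: Metrics,
    treatment: Metrics,
    ab_p_value: Optional[float] = None,
    treatment_worse: bool = False,
    max_error_increase: float = 0.10,   # >10% relative increase
    max_latency_factor: float = 2.0,    # >2x baseline P99
    alpha: float = 0.05,                # A/B significance level
) -> Optional[str]:
    """Return a reason string if any trigger fires, otherwise None."""
    # Error rate: relative increase over the baseline.
    if baseline.error_rate > 0:
        relative = (treatment.error_rate - baseline.error_rate) / baseline.error_rate
        if relative > max_error_increase:
            return "error_rate_exceeded"
    elif treatment.error_rate > 0:
        # Assumption: any errors against an error-free baseline count as an increase.
        return "error_rate_exceeded"

    # P99 latency: treatment more than max_latency_factor times the baseline.
    if treatment.p99_latency_ms > max_latency_factor * baseline.p99_latency_ms:
        return "latency_exceeded"

    # A/B test: significant result with the treatment performing worse.
    if ab_p_value is not None and treatment_worse and ab_p_value < alpha:
        return "ab_test_failed"

    return None
```

With a baseline error rate of 0.10 and a treatment error rate of 0.15, the relative increase is about 50%, so the first trigger fires.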

Rollback Speed

Target: under 500ms

How it works:

Redis update (10ms): Update routing config
SDK notification (optional): Push invalidation
SDK poll (60s max): Regular refresh interval
Effective: Next request uses baseline

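Without push invalidation, an SDK picks up the change at its next poll, which can take up to 60 seconds. For critical rollbacks, push invalidation makes the change take effect on the next request.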
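
The interaction between the poll interval and push invalidation can be sketched as a small client-side cache. This is an assumption-laden illustration, not the actual SDK: `RoutingCache`, `fetch`, and `invalidate` are hypothetical names, and `fetch` stands in for whatever reads the routing config.

```python
import time
from typing import Callable, Optional


class RoutingCache:
    """Hypothetical SDK-side cache for the routing config."""

    def __init__(
        self,
        fetch: Callable[[], str],
        poll_interval_s: float = 60.0,              # regular refresh interval
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._interval = poll_interval_s
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def invalidate(self) -> None:
        """Push invalidation: force a refetch on the next request."""
        self._fetched_at = None

    def get(self) -> str:
        """Return the cached routing value, refreshing it when stale."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._interval
        if stale:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

In this sketch, a rollback written to the config store becomes visible either when the poll interval elapses or immediately after `invalidate()` is called, which is why push invalidation matters for critical rollbacks.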

Manual Rollback

Via Dashboard

Navigate to Healing -> Deployments
Find the deployment
Click "Rollback"
Confirm

Via API

Rollback a deployment by sending a DELETE request:

curl -X DELETE "https://app.risicare.ai/api/v1/deployments/{id}" \
  -H "Authorization: Bearer rsk-..."
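
The same request can be built from Python with only the standard library. This is a sketch under assumptions: the helper name `build_rollback_request` is hypothetical, the URL and header are taken from the curl example above, and the response format is not documented here, so the example only constructs the request.

```python
import urllib.parse
import urllib.request

# Base URL as used in the curl example above.
API_BASE = "https://app.risicare.ai/api/v1"


def build_rollback_request(deployment_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that rolls back a deployment."""
    # Quote the ID so unexpected characters cannot alter the URL path.
    safe_id = urllib.parse.quote(deployment_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/deployments/{safe_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and handle `urllib.error.HTTPError` for non-2xx responses.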

Deployment API

Four endpoints manage the full deployment lifecycle:

Method	Endpoint	Description
`GET`	`/v1/deployments`	List all deployments
`GET`	`/v1/deployments/{id}`	Get deployment detail
`POST`	`/v1/deployments`	Create a new deployment
`DELETE`	`/v1/deployments/{id}`	Rollback a deployment

Deployment Management

All deployment state transitions (ramping, graduating) are handled automatically by the system based on statistical tests. There are no separate pause, resume, or graduate endpoints.

Deployment States

State	Description
`pending`	Deployment created, not yet started
`active`	Live and serving traffic
`ramping`	Traffic percentage increasing through stages
`graduated`	Fix reached 100% and held for 24 hours
`rolled_back`	Deployment reverted
`failed`	Unrecoverable error during deployment

Rollback Events

{
  "event": "rollback",
  "deployment_id": "deploy-abc123",
  "fix_id": "fix-xyz789",
  "timestamp": "2024-01-15T10:30:00Z",
  "trigger": "automatic",
  "reason": "error_rate_exceeded",
  "metrics": {
    "baseline_error_rate": 0.10,
    "treatment_error_rate": 0.15,
    "increase_percentage": 50
  },
  "duration_ms": 234
}
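
A consumer of this event might check that `increase_percentage` agrees with the two error rates. The sketch below assumes the field is the relative increase in percent, which matches the sample values; `relative_increase_pct` is a hypothetical helper.

```python
import json

# Sample payload, trimmed to the fields used here.
RAW_EVENT = """
{
  "event": "rollback",
  "deployment_id": "deploy-abc123",
  "reason": "error_rate_exceeded",
  "metrics": {
    "baseline_error_rate": 0.10,
    "treatment_error_rate": 0.15,
    "increase_percentage": 50
  },
  "duration_ms": 234
}
"""


def relative_increase_pct(baseline: float, treatment: float) -> float:
    """Relative increase of treatment over baseline, in percent."""
    # Rounding absorbs floating-point noise in the subtraction.
    return round((treatment - baseline) / baseline * 100, 2)


event = json.loads(RAW_EVENT)
metrics = event["metrics"]
computed = relative_increase_pct(
    metrics["baseline_error_rate"], metrics["treatment_error_rate"]
)
consistent = computed == metrics["increase_percentage"]
```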

Rollback History

The rollback history lists each rollback with its time, fix, trigger, and reason:

Time	Fix	Trigger	Reason
10:30	fix-abc	Automatic	Error rate +50%
09:15	fix-xyz	Manual	Customer report
Yesterday	fix-123	Automatic	Latency 2.5x

Post-Rollback Analysis

After rollback:

Alert sent to team
Diagnosis triggered on new errors
Fix marked as failed
Learning recorded for future reference

Preventing Bad Deployments

Canary First

All fixes go through a canary stage (5% of traffic) before wider rollout.

Gradual Ramp

5% -> 25% -> 50% -> 100%

Each stage requires passing a statistical A/B test.
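
The ramp can be pictured as a step function over the stages above. This is only a sketch: `RAMP_STAGES` and `next_traffic_percent` are hypothetical names, and returning 0 for a failed test is an assumption standing in for a rollback to the baseline.

```python
# Traffic percentages for the treatment, in ramp order.
RAMP_STAGES = (5, 25, 50, 100)


def next_traffic_percent(current: int, ab_test_passed: bool) -> int:
    """Return the treatment traffic share after evaluating the current stage."""
    if current not in RAMP_STAGES:
        raise ValueError(f"unknown ramp stage: {current}")
    if not ab_test_passed:
        # Assumption: a failed A/B test rolls the fix back, so the
        # treatment receives no traffic.
        return 0
    index = RAMP_STAGES.index(current)
    # Advance one stage, staying at 100% once the final stage is reached.
    return RAMP_STAGES[min(index + 1, len(RAMP_STAGES) - 1)]
```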

Guardrails

Secondary metrics must not degrade even if primary improves.

Rollback Configuration

Customize rollback thresholds:

{
  "deployment_config": {
    "rollback_thresholds": {
      "error_rate_increase": 0.05,
      "latency_increase_factor": 1.25
    },
    "rollback_delay_seconds": 0,
    "require_manual_for_graduated": true
  }
}
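
One way to reason about this configuration is as overrides merged onto defaults. The sketch below is an assumption, not documented behavior: it treats the thresholds from the Automatic Rollback table (10% relative error increase, 2x latency) as the defaults, reads `error_rate_increase` as a relative fraction, and uses the hypothetical helper `load_thresholds`.

```python
import json

# Assumed defaults, taken from the Automatic Rollback trigger table.
DEFAULT_THRESHOLDS = {
    "error_rate_increase": 0.10,      # relative increase, as a fraction
    "latency_increase_factor": 2.0,   # multiple of baseline P99
}


def load_thresholds(raw_config: str) -> dict:
    """Merge custom rollback thresholds over the assumed defaults."""
    config = json.loads(raw_config).get("deployment_config", {})
    merged = {**DEFAULT_THRESHOLDS, **config.get("rollback_thresholds", {})}
    # Basic sanity checks: a threshold at or below these bounds
    # would fire on any noise at all.
    if merged["error_rate_increase"] <= 0:
        raise ValueError("error_rate_increase must be positive")
    if merged["latency_increase_factor"] <= 1:
        raise ValueError("latency_increase_factor must be greater than 1")
    return merged
```

With the configuration above, this yields a stricter error threshold (0.05) and a stricter latency factor (1.25) than the assumed defaults.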

Recovery After Rollback

To retry a rolled-back fix:

Analyze failure reason
Modify fix configuration
Create new fix version
Deploy from canary

Rolled-back fixes cannot be directly re-deployed.

Next Steps

Fix Types: Modify and retry
Diagnosis: Understand failures
