Canary

Initial 5% rollout for new fixes.

Coming Soon

Canary deployment is under active development and not yet available in production. The design below describes the planned functionality.

Canary deployment tests fixes on a small percentage of traffic before wider rollout.

How It Works

Deploy to 5% of matching traffic
Collect minimum 100 samples
Compare error rate to baseline
Decide to proceed or rollback

Ramp Stages

Each fix goes through four stages, and each stage must meet its requirement before the fix advances to the next:

5% -> 25% -> 50% -> 100%

Stage	Traffic	Requirement
Canary	5%	Minimum 100 samples, error rate no more than 10% above baseline (relative)
Ramp 1	25%	Statistical A/B test passes (p < 0.05)
Ramp 2	50%	Statistical A/B test continues passing
Graduate	100%	Hold for 24 hours, then mark as graduated
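The table does not say which statistical test the ramp stages use. As a rough sketch, assuming a two-sided two-proportion z-test on error counts (the function name and the choice of test are assumptions, not part of the planned design), the p < 0.05 check could look like this:

```python
import math


def two_proportion_p_value(errors_a: int, n_a: int, errors_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two error rates.

    Group A is the baseline, group B is the traffic receiving the fix.
    Uses the pooled two-proportion z-test (normal approximation).
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one sample")
    rate_a = errors_a / n_a
    rate_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both groups are all-success or all-failure: no evidence of a difference.
        return 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 100 errors in 1000 baseline requests, 50 in 1000 with the fix.
p_value = two_proportion_p_value(100, 1000, 50, 1000)
significant = p_value < 0.05
```

The normal approximation is only reasonable once both groups have a fair number of samples, which is one reason a minimum sample count comes before any significance test.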

Canary Configuration

{
  "deployment_id": "deploy-abc123",
  "fix_id": "fix-xyz789",
  "canary": {
    "traffic_percentage": 5,
    "minimum_samples": 100,
    "duration_hours": 1,
    "success_criteria": {
      "max_error_rate_increase": 0.1,
      "max_latency_increase_factor": 2.0
    }
  }
}
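To illustrate how a consumer might read this configuration, here is a minimal sketch that parses the JSON above into a typed object and rejects obviously invalid values. The `CanaryConfig` class, the `parse_canary_config` function, and the validation rules are hypothetical; only the field names come from the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryConfig:
    traffic_percentage: int
    minimum_samples: int
    duration_hours: float
    max_error_rate_increase: float
    max_latency_increase_factor: float


def parse_canary_config(raw: str) -> CanaryConfig:
    """Parse the "canary" object of a deployment document (hypothetical helper)."""
    canary = json.loads(raw)["canary"]
    criteria = canary["success_criteria"]
    config = CanaryConfig(
        traffic_percentage=canary["traffic_percentage"],
        minimum_samples=canary["minimum_samples"],
        duration_hours=canary["duration_hours"],
        max_error_rate_increase=criteria["max_error_rate_increase"],
        max_latency_increase_factor=criteria["max_latency_increase_factor"],
    )
    # Assumed sanity checks; the planned design may validate differently.
    if not 0 < config.traffic_percentage <= 100:
        raise ValueError("traffic_percentage must be in (0, 100]")
    if config.minimum_samples < 1:
        raise ValueError("minimum_samples must be at least 1")
    return config


EXAMPLE = """
{
  "deployment_id": "deploy-abc123",
  "fix_id": "fix-xyz789",
  "canary": {
    "traffic_percentage": 5,
    "minimum_samples": 100,
    "duration_hours": 1,
    "success_criteria": {
      "max_error_rate_increase": 0.1,
      "max_latency_increase_factor": 2.0
    }
  }
}
"""

config = parse_canary_config(EXAMPLE)
```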

Traffic Splitting

Traffic is split deterministically by session hash:

import hashlib
 
def should_apply_fix(session_id: str, percentage: int) -> bool:
    # MD5 for deterministic bucketing across restarts
    bucket = int(hashlib.md5(session_id.encode()).hexdigest()[:8], 16) % 100
    return bucket < percentage

Benefits:

Same session always gets same treatment
No session inconsistency
Reproducible debugging
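A short check of these properties, using synthetic session IDs made up for illustration. It also shows a consequence of comparing the bucket with `<`: every session in the 5% canary stays in the treated group when the percentage is raised to 25%, so ramping up never flips a session back to the baseline.

```python
import hashlib


def should_apply_fix(session_id: str, percentage: int) -> bool:
    # Same function as above, repeated so this example runs on its own.
    bucket = int(hashlib.md5(session_id.encode()).hexdigest()[:8], 16) % 100
    return bucket < percentage


sessions = [f"session-{i}" for i in range(10_000)]  # synthetic IDs

# Deterministic: repeated calls agree for every session.
assert all(should_apply_fix(s, 5) == should_apply_fix(s, 5) for s in sessions)

# Monotonic: the 5% group is a subset of the 25% group.
canary_group = {s for s in sessions if should_apply_fix(s, 5)}
ramp_group = {s for s in sessions if should_apply_fix(s, 25)}
assert canary_group <= ramp_group

share = len(canary_group) / len(sessions)
print(f"{share:.1%} of synthetic sessions fall in the 5% canary")
```

The printed share should land close to 5%, though the exact figure depends on the IDs hashed.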

Success Criteria

Error Rate

Baseline error rate: 10%
Canary error rate: 11%
Increase: 1 percentage point (10% relative)

Pass: Yes (does not exceed the 10% relative limit)
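The distinction between the absolute and the relative increase is easy to mix up, so here is the arithmetic as a small sketch. The `relative_increase` helper is a hypothetical name, not part of the planned API.

```python
def relative_increase(baseline: float, canary: float) -> float:
    """Relative change of the canary rate versus the baseline rate."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive to compute a relative increase")
    return (canary - baseline) / baseline


baseline_rate = 0.10
canary_rate = 0.11

absolute = canary_rate - baseline_rate                    # in rate units (0.01 = 1 point)
relative = relative_increase(baseline_rate, canary_rate)  # fraction of the baseline

print(f"absolute: {absolute * 100:.1f} points, relative: {relative:.0%}")
```

Because this example sits right at the 10% limit, a real implementation would need to decide explicitly how to treat values at the boundary, since floating-point rounding can push the result slightly above or below 0.1.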

Latency

Baseline P99: 500ms
Canary P99: 600ms
Increase factor: 1.2x

Pass: Yes (under 2x baseline)
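The page does not specify how P99 is computed. As one possible sketch, assuming the nearest-rank percentile definition and hypothetical helper names, the increase factor can be derived from raw latency samples like this:

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def p99_increase_factor(baseline_ms: list[float], canary_ms: list[float]) -> float:
    """Ratio of canary P99 latency to baseline P99 latency."""
    return percentile(canary_ms, 99) / percentile(baseline_ms, 99)


# Made-up samples chosen so the P99 values match the example above.
baseline_ms = [200.0] * 98 + [500.0, 700.0]
canary_ms = [210.0] * 98 + [600.0, 900.0]

factor = p99_increase_factor(baseline_ms, canary_ms)
passes = factor <= 2.0
```

With only about 100 samples, P99 is determined by the one or two slowest requests, so the latency check is noisy at the canary stage.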

Minimum Samples

Criteria are not checked until the minimum sample count is reached:

Current samples: 85
Minimum required: 100

Status: Collecting (85%)

Canary Dashboard

View canary deployments:

Metric	Baseline	Canary	Status
Error Rate	10.2%	9.8%	Good
P50 Latency	234ms	245ms	Good
P99 Latency	890ms	920ms	Good
Samples	1000	52	Collecting

Canary Events

Timeline of events:

10:00 - Canary started (5% traffic)
10:15 - 50 samples collected
10:30 - 100 samples collected
10:31 - Criteria checked: PASS
10:31 - Promoted to ramp stage (25%)

Failure Handling

If canary fails:

Automatic rollback to 0% traffic
Alert sent to team
Analysis of failure metrics
Log failure reason

10:00 - Canary started (5% traffic)
10:15 - 50 samples collected
10:30 - 100 samples collected
10:31 - Criteria checked: FAIL
        Error rate: 15% (50% increase)
10:31 - Rolled back to 0%
10:31 - Alert sent

Auto-Rollback Thresholds

Condition	Threshold
Error rate increase	>10% relative to baseline
P99 latency increase	>2x baseline
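Putting the sample minimum and the two thresholds together, the decision step might be sketched as follows. The `Metrics` class, the `evaluate_canary` function, the status strings, and the handling of an error-free baseline are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    error_rate: float      # fraction of requests that failed, e.g. 0.10
    p99_latency_ms: float
    samples: int


def evaluate_canary(
    baseline: Metrics,
    canary: Metrics,
    minimum_samples: int = 100,
    max_error_rate_increase: float = 0.1,
    max_latency_increase_factor: float = 2.0,
) -> tuple[str, list[str]]:
    """Return ("collecting" | "pass" | "fail", reasons for failure)."""
    if canary.samples < minimum_samples:
        return "collecting", []

    reasons: list[str] = []

    if baseline.error_rate > 0:
        increase = (canary.error_rate - baseline.error_rate) / baseline.error_rate
        if increase > max_error_rate_increase:
            reasons.append(f"error rate up {increase:.0%} relative to baseline")
    elif canary.error_rate > 0:
        # Assumption: any errors against an error-free baseline count as a failure.
        reasons.append("errors appeared against an error-free baseline")

    if baseline.p99_latency_ms > 0:
        factor = canary.p99_latency_ms / baseline.p99_latency_ms
        if factor > max_latency_increase_factor:
            reasons.append(f"P99 latency is {factor:.1f}x baseline")

    return ("fail" if reasons else "pass"), reasons


baseline = Metrics(error_rate=0.10, p99_latency_ms=500, samples=1000)
status, reasons = evaluate_canary(baseline, Metrics(0.15, 600, 100))
if status == "fail":
    print("rolling back to 0%:", "; ".join(reasons))
```

Returning the reasons alongside the status keeps the rollback alert and the failure log consistent with what the check actually found.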

Manual Canary

Start a deployment via the API:

curl -X POST "https://app.risicare.ai/api/v1/deployments" \
  -H "Authorization: Bearer rsk-..." \
  -H "Content-Type: application/json" \
  -d '{"fix_id": "fix-xyz789"}'
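For callers who prefer Python, the same request can be built with the standard library. This sketch only constructs the request and does not send it, since the endpoint is not yet available. The `build_deployment_request` helper is hypothetical, and the `Content-Type` header is an assumption about what a JSON endpoint would expect.

```python
import json
import urllib.request

API_URL = "https://app.risicare.ai/api/v1/deployments"


def build_deployment_request(fix_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a deployment."""
    body = json.dumps({"fix_id": fix_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",  # assumed; not stated in the docs
        },
        method="POST",
    )


request = build_deployment_request("fix-xyz789", api_key="<your API key>")
# urllib.request.urlopen(request) would send it once the API exists.
```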

Next Steps

A/B Testing: after the canary passes
Rollback: if the canary fails
