Blue/Green Deployment
debt(d7/e7/b5/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no', and the code_pattern describes symptoms like downtime, missing rollback capability, and absent smoke tests — none of which surface until deployment time or post-incident review. Tools listed (aws-codedeploy, kubernetes, capistrano) help implement the pattern but do not detect misuse of it. Misconfigurations like un-warmed environments or misrouted background workers only manifest under production load.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix describes maintaining two full production environments, updating load balancer configuration, establishing smoke test pipelines, and coordinating database migration strategies. Common mistakes reveal that background workers, caches, and migrations all need independent remediation. Fixing an improperly implemented blue-green deployment touches infrastructure, deployment scripts, CI/CD pipelines, and database migration procedures — a cross-cutting concern well beyond a single-component fix.
Closest to 'persistent productivity tax' (b5). The applies_to scope covers web and API contexts broadly, and maintaining two identical production environments imposes ongoing operational overhead on every deployment cycle. Teams must coordinate environment warm-up, migration timing, and worker routing on every release. It doesn't fully reshape the system's architecture (b7), but it does create a persistent tax on multiple workflows including deployments, migrations, and monitoring.
Closest to 'serious trap' (t7). The misconception field directly states the canonical wrong belief: developers assume blue-green eliminates all deployment risk, but data migrations in the green environment affect production data immediately and cannot be rolled back by a traffic switch alone. This directly contradicts the widely-held mental model that 'instant rollback' means full safety — a competent developer familiar with the zero-downtime pitch will naturally assume rollback undoes all effects, which is wrong in the data layer.
Also Known As
TL;DR
Explanation
Blue/Green deployment keeps two production-identical environments. The live environment (say, blue) serves all traffic while the new release is deployed to the idle environment (green) and tested. Traffic is then switched from blue to green in a single step at the load balancer or DNS layer. A load-balancer switch takes effect almost immediately; a DNS switch takes effect only as cached records expire, so some clients may keep reaching blue until the TTL runs out. If issues arise, switching back to blue provides fast rollback of the application code. The main costs are double infrastructure and shared state: both versions usually talk to the same database, so schema migrations must stay backward compatible with both versions during switchover. Blue/Green is distinct from canary releases, which shift traffic to the new version gradually, a percentage at a time.
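The switch and rollback described above can be sketched as a single pointer flip. This is a minimal sketch rather than a production setup: it assumes a hypothetical layout in which each environment has its own upstream file (`blue.conf`, `green.conf`) and the load balancer includes whichever file the `active.conf` symlink points to. The directory, the file names, and the `switch_to` and `active_color` helpers are illustrative assumptions.

```shell
# Hypothetical layout: $CONF_DIR/blue.conf and $CONF_DIR/green.conf each define
# the upstream for one environment; active.conf is a symlink to the live one.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Create placeholder upstream files so the sketch runs on its own.
printf 'upstream app { server blue-app:9000; }\n'  > "$CONF_DIR/blue.conf"
printf 'upstream app { server green-app:9000; }\n' > "$CONF_DIR/green.conf"

# Point active.conf at the given environment (blue or green).
switch_to() {
  case "$1" in
    blue|green) ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
  ln -sfn "$CONF_DIR/$1.conf" "$CONF_DIR/active.conf"
  # In a real setup the load balancer would be reloaded here,
  # for example with: nginx -t && nginx -s reload
}

# Print which environment is currently live.
active_color() {
  basename "$(readlink "$CONF_DIR/active.conf")" .conf
}

switch_to blue     # initial state: blue is live
switch_to green    # release: flip traffic to green
switch_to blue     # rollback: the same operation in reverse
```

Release and rollback are the same operation with a different argument, which is why rollback is as fast as the release itself. The flip only covers application code; it does not touch anything already written to a shared database.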
Diagram
flowchart LR
LB[Load Balancer] -->|100% traffic| BLUE[Blue<br/>v1.2 LIVE]
LB -.->|0% traffic| GREEN[Green<br/>v1.3 IDLE]
subgraph After Deploy
LB2[Load Balancer] -->|100% traffic| GREEN2[Green<br/>v1.3 LIVE]
LB2 -.->|0% fallback| BLUE2[Blue<br/>v1.2 STANDBY]
end
style BLUE fill:#1f6feb,color:#fff
style GREEN fill:#238636,color:#fff
Watch Out
Common Misconception
Why It Matters
Common Mistakes
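- Running destructive or backward-incompatible database migrations before switching traffic: the migration changes shared production data immediately, so switching back to blue does not undo it, and rollback may be impossible without data loss.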
- Not keeping both environments warm — a cold "green" environment has cache misses and slower initial responses.
- Forgetting to update background workers when switching — jobs may still be routed to the old environment.
- Treating blue-green as a substitute for testing — it reduces recovery time, it does not replace pre-deployment validation.
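The cold-environment mistake above can be reduced by priming green before the switch. A minimal sketch, assuming green is reachable at a base URL and that requesting a handful of representative paths is enough to populate its caches. The `warm_up` and `fetch_with_curl` helpers and the `WARM_FETCH` override are hypothetical names introduced for illustration; the host and paths in the usage line are placeholders.

```shell
# Fetch one URL; fail on HTTP errors or after 5 seconds.
fetch_with_curl() {
  curl --fail --silent --show-error --max-time 5 --output /dev/null "$1"
}

# Request each path on the given base URL to prime caches and connection pools.
# WARM_FETCH is a hypothetical override naming the command used to fetch a URL.
warm_up() {
  base_url="$1"
  shift
  for path in "$@"; do
    "${WARM_FETCH:-fetch_with_curl}" "${base_url}${path}" || {
      echo "warm-up failed for ${base_url}${path}" >&2
      return 1
    }
  done
}

# Usage (hypothetical host and paths):
# warm_up https://green.internal /health /api/products /api/categories
```

Stopping at the first failed request is a deliberate choice here: an environment that cannot serve its warm-up requests should not receive traffic.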
Avoid When
- Avoid blue-green when your database schema changes are not backward-compatible — both environments share state and the old app may break on a new schema.
- Do not use it for services where running two versions simultaneously causes correctness issues (e.g. exclusive queue consumers).
When To Use
- Use blue-green when you need instant rollback — switching traffic back is a single load-balancer change, not a re-deploy.
- Apply it when the cost of downtime exceeds the cost of running two identical environments simultaneously.
- Combine with smoke tests against the green environment before flipping traffic: the idle environment doubles as a production-identical staging slot that you are already paying for.
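The smoke-test advice above can be enforced by making the switch conditional on the check. A minimal sketch, assuming the health endpoint used elsewhere in this entry; `gated_switch`, `smoke_green`, and `switch_to_green` are hypothetical helpers, and the switch step is a placeholder for whatever load balancer change your setup requires.

```shell
# Run a smoke check, then a switch command; refuse to switch if the check fails.
# Both commands are passed in by name so the gate itself stays generic.
gated_switch() {
  smoke_cmd="$1"
  switch_cmd="$2"
  if "$smoke_cmd"; then
    "$switch_cmd"
  else
    echo "smoke test failed; traffic stays on the current environment" >&2
    return 1
  fi
}

# Hypothetical check against green's health endpoint.
smoke_green() {
  curl --fail --silent --max-time 5 --output /dev/null https://green.internal/health
}

# Hypothetical switch step; replace with your load balancer change.
switch_to_green() {
  echo "switching load balancer to green"
}

# Usage:
# gated_switch smoke_green switch_to_green
```

A single health check is the weakest useful gate; a real pipeline would likely run several checks that exercise the paths most likely to break.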
Code Examples
// Single environment deploy — downtime during swap:
// 1. Stop app server
// 2. Deploy new code
// 3. Restart app server <- downtime window here
// 4. Test (on live traffic)
// Blue-green deploy, no downtime:
// 1. Deploy to green (idle environment)
// 2. Test green
// 3. Switch load balancer from blue to green — instant, zero downtime
// 4. Keep blue as instant rollback
# Blue = current live, Green = new version
# 1. Deploy new version to Green environment (zero traffic)
#    Use your usual deploy tooling; Blue keeps serving while this runs.
# 2. Smoke test Green (--fail makes curl exit non-zero on an HTTP error)
curl --fail https://green.internal/health
# 3. Switch load balancer to Green
# nginx upstream config:
upstream app {
    # server blue-app:9000;   # Blue: commented out
    server green-app:9000;    # Green: active
}
# Validate the edited config, then reload nginx gracefully
nginx -t && nginx -s reload
# 4. Monitor; roll back in seconds by reversing the upstream swap and reloading again
# 5. Decommission Blue after confidence period