Infrastructure Drift
debt(d7/e5/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The term's detection_hints list terraform, aws-config, ansible, checkov, and driftctl — these are specialist tools that must be deliberately configured and scheduled; drift is not caught by default linters or compilers. Critically, the common_mistakes note that not running terraform plan in CI means drift is discovered during apply, and not alerting on ArgoCD sync failures means drift happens silently. Without proactive setup of these tools, drift accumulates invisibly in production — slightly better than d9 only because the tooling exists and can be automated.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix suggests running terraform plan or AWS Config rules on a schedule, but that only detects drift — remediating it requires auditing which manual changes were made, deciding whether to encode them in IaC or revert them, updating Terraform state, playbooks, or other IaC files across potentially multiple modules or stacks, and re-validating environments. A single console change (per the misconception) might be e3, but the common_mistakes describe patterns (shared state corruption, undocumented incident fixes) that compound into multi-file reconciliation work.
Closest to 'strong gravitational pull' (b7). Infrastructure drift applies to web and cli contexts with IaC/GitOps tags, meaning it affects the entire infrastructure lifecycle. Every deployment, incident response, and environment promotion is shaped by whether drift exists. The common_mistakes show it corrupts shared terraform state, breaks CI/CD pipelines, and creates staging-vs-prod mysteries — a persistent, cross-cutting tax. It doesn't quite reach b9 (it doesn't redefine the system architecture) but it heavily shapes how every change must be made.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The misconception field explicitly states the trap: developers believe drift is only a problem for large systems, but a single manual console change on a small setup breaks the next terraform apply. This is a well-documented gotcha that many practitioners learn the hard way after an incident. It doesn't rise to t7/t9 because it doesn't contradict a concept from a different ecosystem — it's a domain-specific misunderstanding about scale and severity.
Also Known As
Configuration drift.
TL;DR
Real infrastructure no longer matches what the IaC code declares, usually because someone changed something by hand. Detect it with scheduled terraform plan runs or GitOps reconciliation; prevent it by routing every change through CI/CD.
Explanation
Drift occurs when someone makes a change directly in the AWS console, runs a manual kubectl command, or modifies a server config by hand, bypassing the IaC codebase. The declared state (Terraform, CloudFormation) is now different from the actual state. terraform plan detects drift by refreshing state from the real infrastructure and reporting any difference from the declared configuration. GitOps tools (ArgoCD) continuously reconcile the cluster against the Git repository and can auto-correct drift. Prevention: strict access controls (only CI/CD can deploy), immutable infrastructure (replace rather than modify), and regular drift-detection runs.
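To make the declared-vs-actual gap concrete, here is a minimal Terraform sketch (resource name and AMI ID are hypothetical):

# Declared state: what Terraform will enforce.
resource "aws_instance" "app" {
  ami           = "ami-0abc1234def567890"   # hypothetical AMI ID
  instance_type = "t3.micro"
}
# If an engineer resizes this instance to t3.large in the console,
# terraform plan reports instance_type: "t3.large" -> "t3.micro",
# and the next apply reverts the manual change. That pending diff is the drift.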
Common Misconception
"Drift is only a problem for large systems." In reality, a single manual console change on a small setup is enough to break the next terraform apply.
Why It Matters
Drift accumulates invisibly: no default linter or compiler catches it, so it surfaces as unexpected changes during apply, staging-vs-prod mysteries, and incidents that recur because a manual fix was silently reverted. Left unmanaged, it makes every deployment, incident response, and environment promotion harder to reason about.
Common Mistakes
- Manual console changes during incidents without updating IaC — the fix works but drift is never documented.
- Not running terraform plan in CI before apply — drift is discovered during apply, causing unexpected changes.
- Shared terraform state without locking — concurrent applies corrupt state (see the backend sketch after this list).
- Not alerting on ArgoCD sync failures — drift happens silently without monitoring.
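The locking mistake has a standard fix: a remote backend that takes a lock on every state operation. A minimal Terraform sketch, assuming an S3 backend with a DynamoDB lock table (bucket, key, and table names are hypothetical):

terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical bucket name
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"           # lock table: concurrent applies wait instead of corrupting state
    encrypt        = true
  }
}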
Code Examples
# Manual change during incident — creates drift:
# 3am incident: DB connections exhausted
# Engineer manually increases RDS max_connections in AWS console
# Incident resolved. Change never added to Terraform.
# Next terraform apply: reverts max_connections to old value
# Incident recurs. No one knows why — drift was the cause.
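The drift in that scenario disappears once the manual fix is encoded in code. A rough Terraform sketch, assuming the setting is managed through an RDS parameter group (names, engine family, and value are hypothetical):

resource "aws_db_parameter_group" "app_db" {
  name   = "app-db-params"          # hypothetical
  family = "postgres15"             # must match the RDS engine family

  parameter {
    name         = "max_connections"
    value        = "500"            # the value the engineer set by hand
    apply_method = "pending-reboot" # max_connections is a static parameter
  }
}
# Attach it via parameter_group_name on the existing aws_db_instance;
# terraform plan is then clean and the fix survives the next apply.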
# Detect drift regularly:
# In CI (nightly):
rc=0
terraform plan -detailed-exitcode -input=false || rc=$?
# Exit codes: 0 = no changes, 1 = error, 2 = changes detected (drift)
if [ "$rc" -eq 2 ]; then
  echo "Drift detected"   # alert on-call here (pager / chat webhook)
elif [ "$rc" -eq 1 ]; then
  exit 1                  # plan itself failed
fi
# Prevent manual changes:
# IAM policy: deny console changes to production
# Only CI/CD role can apply Terraform
# ArgoCD auto-sync corrects K8s drift immediately
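One way to enforce the "only CI/CD can apply" rule is a deny policy that blocks mutating calls from every principal except the pipeline role. A rough Terraform sketch (actions, account ID, and role name are hypothetical; many teams implement the same rule as an organization-level SCP instead):

resource "aws_iam_policy" "deny_manual_prod_writes" {
  name   = "deny-manual-prod-writes"   # hypothetical
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyWritesOutsideCICD"
      Effect   = "Deny"
      Action   = ["ec2:Modify*", "ec2:Delete*", "rds:Modify*", "rds:Delete*"]   # illustrative subset
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:PrincipalArn" = "arn:aws:iam::123456789012:role/terraform-ci-apply"   # hypothetical CI role
        }
      }
    }]
  })
}
# Attach this to human users and groups in production accounts; the CI role
# is exempted by the condition, so only the pipeline can change infrastructure.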