{
    "slug": "cloud_backup_disaster_recovery",
    "term": "Cloud Backup & Disaster Recovery",
    "category": "cloud",
    "difficulty": "intermediate",
    "short": "Automated backups with tested restores, defined RPO/RTO targets, and cross-region replication for catastrophic failure recovery.",
    "long": "Backup and disaster recovery (DR) in the cloud means more than enabling automated snapshots — it means defining and testing recovery objectives. RPO (Recovery Point Objective) is the maximum acceptable data loss measured in time; RTO (Recovery Time Objective) is the maximum acceptable downtime. A 5-minute RPO requires continuous replication or frequent snapshots; a 1-hour RTO requires warm standby infrastructure, not cold backups.\n\nDR strategies range in cost and complexity: Backup-and-restore (cheapest, RTO hours-days), Pilot Light (core services running in DR region, scaled up on failover), Warm Standby (scaled-down full stack, scaled up on failover), and Multi-Site Active-Active (full capacity in both regions, instant failover). Choose based on business cost-of-downtime, not engineering preference.\n\nFor PHP applications: RDS automated backups with point-in-time recovery, cross-region read replicas, S3 versioning with cross-region replication, and Infrastructure-as-Code (Terraform/CloudFormation) to recreate environments. Database snapshots alone are insufficient — you also need application code (in version control), container images (in ECR with cross-region replication), secrets (in Secrets Manager with replication), and DNS failover (Route 53 health checks).\n\nThe most common failure: backups that have never been restored. A backup you have not tested is a hope, not a recovery plan. Schedule quarterly DR drills where you actually restore to a separate environment and validate application functionality. Document the runbook step-by-step so any on-call engineer can execute it at 3am under pressure.",
    "aliases": [
        "DR",
        "disaster recovery",
        "RPO RTO",
        "cloud backups"
    ],
    "tags": [
        "cloud",
        "disaster-recovery",
        "backup",
        "resilience",
        "business-continuity"
    ],
    "misconception": "Enabling automated snapshots is sufficient disaster recovery — in reality, untested backups, single-region storage, and missing infrastructure definitions all cause real recoveries to fail.",
    "why_it_matters": "A regional AWS outage or accidental database deletion can destroy a business overnight if backups are untested, in the same region, or missing the application context needed to restore service.",
    "common_mistakes": [
        "Backups stored only in the same region as production",
        "Never testing restore procedures until a real disaster strikes",
        "Backing up the database but not the application config, secrets, or container images",
        "Setting unrealistic RTO/RPO targets without budgeting for warm standby infrastructure",
        "No documented runbook so only one senior engineer knows how to recover"
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "cloud_region_selection",
        "cloud_storage_s3",
        "cloud_databases",
        "infrastructure_as_code_tools"
    ],
    "prerequisites": [
        "aws_fundamentals",
        "cloud_region_selection",
        "cloud_storage_s3",
        "infrastructure_as_code_tools"
    ],
    "refs": [
        "https://docs.aws.amazon.com/whitepapers/latest/disaster-recovery-workloads-on-aws/disaster-recovery-workloads-on-aws.html",
        "https://aws.amazon.com/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/",
        "https://cloud.google.com/architecture/dr-scenarios-planning-guide"
    ],
    "bad_code": "# RDS in us-east-1 only\n# Automated backups enabled, retention 7 days\n# No cross-region copy, no restore testing\n# Application secrets only in us-east-1 Secrets Manager\n# Runbook: \"Ask Dave, he set it up\"",
    "good_code": "# RDS with cross-region automated backups\naws rds modify-db-instance --db-instance-identifier prod \\\n  --backup-retention-period 30 \\\n  --apply-immediately\n\n# Cross-region snapshot copy via EventBridge rule\naws rds copy-db-snapshot \\\n  --source-db-snapshot-identifier arn:aws:rds:us-east-1:... \\\n  --target-db-snapshot-identifier prod-dr-snapshot \\\n  --source-region us-east-1 --region us-west-2\n\n# S3 cross-region replication\naws s3api put-bucket-replication --bucket prod-assets \\\n  --replication-configuration file://replication.json\n\n# Quarterly: restore to staging account, run smoke tests\n# Documented runbook in runbooks/dr-failover.md",
    "quick_fix": "Enable cross-region backup replication, define written RPO/RTO targets, and schedule a quarterly restore drill where you actually recover into a fresh environment and verify the app works",
    "severity": "high",
    "effort": "high",
    "created": "2026-05-28",
    "updated": "2026-05-28",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/cloud_backup_disaster_recovery",
        "html_url": "https://codeclaritylab.com/glossary/cloud_backup_disaster_recovery",
        "json_url": "https://codeclaritylab.com/glossary/cloud_backup_disaster_recovery.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Cloud Backup & Disaster Recovery](https://codeclaritylab.com/glossary/cloud_backup_disaster_recovery) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/cloud_backup_disaster_recovery"
            }
        }
    }
}