{
    "slug": "change_data_capture",
    "term": "Change Data Capture (CDC)",
    "category": "database",
    "difficulty": "advanced",
    "short": "A pattern for tracking and streaming every insert, update, and delete from a database — by reading the database's internal transaction log rather than polling tables — enabling real-time event-driven integrations without impacting query performance.",
    "long": "CDC reads the database's write-ahead log (WAL in PostgreSQL, binlog in MySQL) to capture every change as a structured event without adding triggers or polling queries. Tools like Debezium consume these logs and publish change events to Kafka or other message queues. Downstream consumers — search indexes, caches, analytics, microservices — receive changes in near real-time. CDC solves the 'dual write' problem: instead of writing to both a database and a message queue transactionally (hard), you write only to the database and let CDC propagate changes. This guarantees consistency between the database and downstream systems.",
    "aliases": [
        "CDC",
        "Debezium",
        "database streaming",
        "binlog",
        "outbox pattern alternative"
    ],
    "tags": [
        "database",
        "event-driven",
        "streaming",
        "kafka",
        "replication"
    ],
    "misconception": "CDC requires changes to the application code or database schema. It does not — CDC reads the transaction log, which the database writes regardless. Existing applications see no change; CDC is entirely transparent to them.",
    "why_it_matters": "CDC enables architectural patterns that polling cannot: real-time cache invalidation (when a product changes in MySQL, invalidate Redis immediately), search index updates without database triggers, audit logs without application-level logging, and cross-service event propagation without dual writes. The outbox pattern is an alternative when full CDC infrastructure is too heavy.",
    "common_mistakes": [
        "Polling for changes with 'SELECT * WHERE updated_at > last_check' — misses deletes, requires an index on updated_at, and has a race window for concurrent updates.",
        "Not configuring WAL level correctly for CDC — PostgreSQL requires wal_level=logical; MySQL requires binlog_format=ROW; check before deploying Debezium.",
        "Ignoring consumer lag — CDC events queue up in Kafka; if consumers fall behind, downstream systems see stale data for hours.",
        "Not handling schema changes — adding or renaming a column breaks CDC consumers; coordinate schema migrations with consumer updates."
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "write_ahead_log",
        "outbox_pattern",
        "kafka_basics",
        "event_driven",
        "db_replication_types"
    ],
    "prerequisites": [],
    "refs": [
        "https://debezium.io/documentation/",
        "https://www.confluent.io/learn/change-data-capture/"
    ],
    "bad_code": "<?php\n// ❌ Dual write — database update + cache/search update without atomicity\npublic function updateProduct(int $id, array $data): void\n{\n    $this->db->update('products', $data, ['id' => $id]);\n    $this->redis->del(\"product:$id\");     // What if this fails?\n    $this->elasticsearch->index($data);   // Database updated, search not\n    $this->eventBus->publish('product.updated', $data); // Partial state\n    // Any failure here leaves systems inconsistent\n}",
    "good_code": "<?php\n// ✅ Outbox pattern — write event to DB in same transaction\npublic function updateProduct(int $id, array $data): void\n{\n    $this->db->beginTransaction();\n    try {\n        $this->db->update('products', $data, ['id' => $id]);\n        // Event stored atomically with the business data change\n        $this->db->insert('outbox_events', [\n            'type'       => 'product.updated',\n            'payload'    => json_encode(['id' => $id, ...$data]),\n            'created_at' => date('Y-m-d H:i:s'),\n        ]);\n        $this->db->commit();\n    } catch (Throwable $e) {\n        $this->db->rollBack();\n        throw $e;\n    }\n    // Separate worker reads outbox and publishes events\n    // CDC tool (Debezium) reads the outbox table change via WAL\n}",
    "quick_fix": "For simple CDC in PHP, use the outbox pattern (write events to a database table in the same transaction) rather than full CDC infrastructure — it provides similar guarantees with less operational complexity.",
    "effort": "high",
    "created": "2026-03-23",
    "updated": "2026-03-23",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/change_data_capture",
        "html_url": "https://codeclaritylab.com/glossary/change_data_capture",
        "json_url": "https://codeclaritylab.com/glossary/change_data_capture.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Change Data Capture (CDC)](https://codeclaritylab.com/glossary/change_data_capture) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/change_data_capture"
            }
        }
    }
}