Change Data Capture (CDC)
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints.tools list is empty, so no automated tooling is specified. Misconfigurations (wrong WAL level, missing binlog format, consumer lag) are silent until runtime — the application continues operating normally while downstream systems silently receive stale or missing data. This is not caught by compilers, linters, or standard SAST tools, and typically surfaces only through monitoring or user-reported inconsistencies.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix suggests falling back to the outbox pattern for simpler cases, but that itself requires schema changes and transaction-level coordination. Full CDC infrastructure (Debezium, Kafka, WAL configuration, schema evolution handling) spans multiple systems — database config, broker setup, consumer services, and schema migration coordination. Remediating common mistakes like consumer lag or schema breakage requires coordinated changes across multiple components and teams.
Closest to 'strong gravitational pull' (b7). CDC is an architectural pattern that, once adopted, shapes how all downstream data flows are designed — every schema migration must coordinate with CDC consumers, every new service must decide whether to consume the CDC stream, and operational concerns (consumer lag, replication slots, log retention) become persistent productivity taxes across the whole system. The tags (kafka, streaming, event-driven, replication) confirm cross-cutting architectural reach.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The canonical misconception is that CDC requires application or schema changes — in fact it reads the transaction log transparently. While this is a genuine surprise, it is a positive one (less invasive than expected). The more dangerous traps are the common_mistakes: needing specific WAL/binlog configuration before deploying, schema changes breaking consumers, and consumer lag causing silent staleness — these are well-documented but non-obvious, placing this at t5.
Also Known As
TL;DR
Explanation
CDC reads the database's transaction log (the write-ahead log, or WAL, in PostgreSQL; the binlog in MySQL) to capture every committed change as a structured event, without adding triggers or polling queries. Tools like Debezium consume these logs and publish change events to Kafka or other message queues. Downstream consumers (search indexes, caches, analytics, microservices) receive changes in near real-time. CDC solves the 'dual write' problem: instead of writing to both a database and a message queue transactionally (hard), you write only to the database and let CDC propagate changes. Because every event is derived from a committed transaction, downstream systems converge on the database's state. The consistency is eventual rather than immediate, and delivery is typically at-least-once, so consumers should apply events idempotently.
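To make the consumer side concrete, here is a minimal sketch of applying change events to a read model. It assumes a Debezium-style envelope with `op`, `before`, and `after` fields; the `applyChangeEvent` function, the `id` key column, and the sample rows are hypothetical and exist only for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Apply one change event to an in-memory read model keyed by primary key.
 * Assumes a Debezium-style envelope: 'op' is 'c' (create), 'u' (update),
 * 'd' (delete) or 'r' (snapshot read); 'before'/'after' hold row images.
 * The 'id' key column is an assumption made for this sketch.
 *
 * @param array<int, array<string, mixed>> $readModel
 * @param array<string, mixed>             $event
 * @return array<int, array<string, mixed>>
 */
function applyChangeEvent(array $readModel, array $event): array
{
    switch ($event['op'] ?? null) {
        case 'c':
        case 'u':
        case 'r':
            // Upsert the full row image, so replaying the same event is harmless.
            $row = $event['after'];
            $readModel[$row['id']] = $row;
            break;
        case 'd':
            // Deletes carry the old key in 'before' and null in 'after'.
            unset($readModel[$event['before']['id']]);
            break;
        default:
            // Any other operation type is rejected in this sketch.
            throw new InvalidArgumentException(
                'Unknown op: ' . var_export($event['op'] ?? null, true)
            );
    }
    return $readModel;
}

// Hypothetical events for a products table.
$events = [
    ['op' => 'c', 'before' => null, 'after' => ['id' => 1, 'name' => 'Lamp', 'price' => 20]],
    ['op' => 'u', 'before' => ['id' => 1, 'name' => 'Lamp', 'price' => 20],
                  'after'  => ['id' => 1, 'name' => 'Lamp', 'price' => 25]],
    ['op' => 'c', 'before' => null, 'after' => ['id' => 2, 'name' => 'Desk', 'price' => 90]],
    ['op' => 'd', 'before' => ['id' => 2, 'name' => 'Desk', 'price' => 90], 'after' => null],
];

$readModel = [];
foreach ($events as $event) {
    $readModel = applyChangeEvent($readModel, $event);
}
```

Writing the handler as an upsert or delete by key is one way to tolerate duplicate deliveries: applying the same event twice leaves the read model unchanged.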
Common Misconception
Why It Matters
Common Mistakes
- Polling for changes with 'SELECT * FROM products WHERE updated_at > :last_check': misses deletes and intermediate states, requires an index on updated_at, and can skip rows whose transaction commits after the poll ran but whose updated_at is older than the next last_check.
- Not configuring the transaction log correctly for CDC: PostgreSQL requires wal_level=logical; MySQL requires binlog_format=ROW; check both before deploying Debezium.
- Ignoring consumer lag: CDC events queue up in Kafka, and if consumers fall behind, downstream systems serve stale data without raising any error. Monitor and alert on lag.
- Not handling schema changes: renaming or dropping a column, or changing its type, can break CDC consumers, and even an added column breaks consumers that validate strictly. Coordinate schema migrations with consumer updates.
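The configuration requirement in the list above can be checked before deployment. The following is a minimal sketch: `cdcPreflightProblems` is a hypothetical helper, and it assumes the setting values have already been read from the server (for example with `SHOW wal_level` on PostgreSQL or `SHOW VARIABLES LIKE 'binlog_format'` on MySQL).

```php
<?php
declare(strict_types=1);

/**
 * Return a list of problems with the log settings needed for log-based CDC.
 * $settings holds values already read from the server; this function does
 * not connect to a database itself.
 *
 * @param array<string, string> $settings
 * @return list<string>
 */
function cdcPreflightProblems(string $engine, array $settings): array
{
    // Required values as stated in the list above.
    $required = [
        'postgresql' => ['wal_level' => 'logical'],
        'mysql'      => ['binlog_format' => 'ROW'],
    ];
    if (!isset($required[$engine])) {
        throw new InvalidArgumentException("Unsupported engine: $engine");
    }

    $problems = [];
    foreach ($required[$engine] as $name => $expected) {
        $actual = $settings[$name] ?? '(unset)';
        // Compare case-insensitively; servers may report either case.
        if (strcasecmp($actual, $expected) !== 0) {
            $problems[] = "$name is $actual, expected $expected";
        }
    }
    return $problems;
}
```

Running a check like this in a deployment pipeline turns a silent runtime failure into an explicit one. An empty result only means these two settings are correct; it does not cover permissions, replication slots, or log retention.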
Code Examples
<?php
// ❌ Dual write: database update plus cache/search/event updates, with no atomicity
public function updateProduct(int $id, array $data): void
{
    $this->db->update('products', $data, ['id' => $id]);
    $this->redis->del("product:$id");                   // What if this fails?
    $this->elasticsearch->index($data);                 // Database updated, search index not
    $this->eventBus->publish('product.updated', $data); // Partial state
    // A failure at any step after the first leaves the systems inconsistent
}
<?php
// ✅ Outbox pattern: write the event to the DB in the same transaction
public function updateProduct(int $id, array $data): void
{
    $this->db->beginTransaction();
    try {
        $this->db->update('products', $data, ['id' => $id]);
        // Event stored atomically with the business data change
        $this->db->insert('outbox_events', [
            'type'       => 'product.updated',
            // 'id' goes last so the method argument wins over any 'id' key in $data;
            // JSON_THROW_ON_ERROR makes an encoding failure roll the transaction back
            'payload'    => json_encode([...$data, 'id' => $id], JSON_THROW_ON_ERROR),
            'created_at' => gmdate('Y-m-d H:i:s'), // UTC timestamp
        ]);
        $this->db->commit();
    } catch (Throwable $e) {
        $this->db->rollBack();
        throw $e;
    }
    // Delivery happens outside this method, by one of two mechanisms:
    // a separate worker that reads outbox_events and publishes the events, or
    // a CDC tool (e.g. Debezium) that reads the outbox inserts from the WAL
}
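The separate worker mentioned in the comments could look roughly like the sketch below. It is not a definitive implementation: `relayOutbox` is a hypothetical function, and it assumes `outbox_events` has two columns the example above does not show, an auto-incrementing `id` and a nullable `published_at`. The `$publish` callable stands in for whatever message broker client is in use.

```php
<?php
declare(strict_types=1);

/**
 * Publish pending outbox rows in insertion order, then mark them as sent.
 * Delivery is at-least-once: a crash after $publish() but before the UPDATE
 * re-sends the row on the next run, so consumers should be idempotent.
 * Assumes columns id (auto-increment) and published_at (nullable).
 *
 * @param callable(string, string): void $publish receives (type, payload)
 * @return int number of rows published in this run
 */
function relayOutbox(PDO $pdo, callable $publish, int $batchSize = 100): int
{
    $select = $pdo->prepare(
        'SELECT id, type, payload FROM outbox_events
         WHERE published_at IS NULL ORDER BY id LIMIT :batch'
    );
    // Bind as an integer so LIMIT also works with emulated prepares.
    $select->bindValue(':batch', $batchSize, PDO::PARAM_INT);
    $select->execute();
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);

    $mark = $pdo->prepare(
        'UPDATE outbox_events SET published_at = :now WHERE id = :id'
    );
    foreach ($rows as $row) {
        // Publish first, mark second: a failure in between causes a resend,
        // never a lost event.
        $publish($row['type'], $row['payload']);
        $mark->execute([':now' => gmdate('Y-m-d H:i:s'), ':id' => $row['id']]);
    }
    return count($rows);
}
```

A polling relay like this is the simpler option when running Debezium is not justified. Replacing it with log-based CDC on the outbox table removes the polling query and the `published_at` bookkeeping, at the cost of the extra infrastructure.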