Write-Ahead Log (WAL)
debt(d9/e7/b7/t7)
Closest to 'silent in production until users hit it' (d9). The most dangerous misuse — fsync=off in Docker-based PostgreSQL setups — produces no warnings or errors during normal operation. Database corruption only manifests after a crash or OOM kill, often in production under load. No tools are listed in detection_hints, and there is no compile-time or linter signal for misconfigured WAL settings.
Closest to 'cross-cutting refactor across the codebase' (e7). While the quick_fix for fsync or synchronous_commit is a single config line, the broader remediation picture is architectural: not archiving WAL requires setting up archiving infrastructure and re-establishing recovery baselines; unconsumed logical replication slots causing disk fill require auditing all consumers; replication lag from WAL volume requires infrastructure-level changes to replicas, network, or write patterns. These span infrastructure, ops runbooks, application write patterns, and monitoring — well beyond a single-file fix.
Closest to 'strong gravitational pull' (b7). WAL configuration choices (synchronous replication vs async, fsync, archiving strategy, logical replication slots) shape the entire database durability and replication architecture. Every decision about backup strategy, HA topology, CDC pipelines, and recovery objectives is influenced by how WAL is configured. While not quite a full system rewrite to change, it exerts strong gravity over every future change involving persistence or replication.
Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field states that developers believe WAL is PostgreSQL-specific, when write-ahead logging is in fact used by most database engines. More dangerously, the common_mistakes show that fsync=off is often preconfigured in Docker-based PostgreSQL setups: a setting that appears safe (and may even be documented as a performance tip) but can corrupt data after an unclean shutdown. This contradicts the reasonable assumption that a ready-made Docker setup is safe for development use, and the 'obvious' performance optimization is actually catastrophic.
Also Known As
TL;DR
Explanation
The Write-Ahead Log (WAL) is the primary mechanism for crash recovery and replication in most relational databases. Before modifying any data pages, the database writes a log record describing the change to a sequential, append-only file. On crash recovery, the database reads the WAL from the last checkpoint forward and replays the logged changes; the effects of transactions that never committed are discarded. This 'write log first, then data' rule (the WAL protocol) ensures durability: once a transaction is acknowledged, its log record is on disk even if the data pages are still in memory. PostgreSQL exposes WAL directly: pg_wal/ contains the log files, WAL streaming is the mechanism for replication, and logical decoding reads WAL for change data capture.
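The 'write log first, then data' rule can be illustrated with a toy key-value store. This is a minimal sketch, not how PostgreSQL implements WAL: the `ToyWAL` class, the `recover` function, and the JSON record format are all hypothetical names invented for illustration, and there are no checkpoints or data pages, only an in-memory dict standing in for them.

```python
import json
import os
import tempfile


class ToyWAL:
    """Toy key-value store that logs every change before applying it."""

    def __init__(self, path):
        self.path = path
        self.data = {}  # stands in for the data pages
        self._log = open(path, "a", encoding="utf-8")

    def _append(self, record):
        # Write-ahead rule: the record must reach disk before we continue.
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())

    def commit(self, txid, changes):
        for key, value in changes.items():
            self._append({"tx": txid, "op": "set", "key": key, "value": value})
        self._append({"tx": txid, "op": "commit"})
        # Only after the commit record is durable do we touch the "data pages".
        self.data.update(changes)

    def close(self):
        self._log.close()


def recover(path):
    """Rebuild state by replaying the log; drop transactions with no commit record."""
    pending, data = {}, {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            try:
                rec = json.loads(line)
            except json.JSONDecodeError:
                break  # a torn final record means the crash happened mid-write
            if rec["op"] == "set":
                pending.setdefault(rec["tx"], {})[rec["key"]] = rec["value"]
            elif rec["op"] == "commit":
                data.update(pending.pop(rec["tx"], {}))
    return data


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.wal")
        store = ToyWAL(path)
        store.commit(1, {"a": 1})
        # Simulate a crash: transaction 2 logged a change but never committed.
        store._append({"tx": 2, "op": "set", "key": "b", "value": 2})
        store.close()
        return recover(path)
```

Calling `demo()` replays the log and keeps only the committed transaction, which mirrors the recovery behaviour described above at a much smaller scale.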
Common Misconception
WAL is often assumed to be PostgreSQL-specific. Write-ahead logging is a general technique: MySQL's InnoDB redo log and SQLite's WAL mode apply the same idea.
Why It Matters
Common Mistakes
- Setting fsync=off in production for performance: this setting is common in Docker-based PostgreSQL setups tuned for development or CI speed, and it can corrupt the database on power loss or a system crash.
- Not archiving WAL for point-in-time recovery — without WAL archiving, you can only restore to your last base backup, not to any moment in between.
- Underestimating WAL volume during high write periods: replication lag grows when WAL is generated faster than replicas can apply it; monitor pg_stat_replication for lag and pg_replication_slots for retained WAL.
- Using logical replication slots without monitoring — unconsumed logical replication slots cause WAL to accumulate indefinitely, filling disk.
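The slot-monitoring advice above comes down to comparing two log sequence numbers (LSNs). PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, and the server-side function `pg_wal_lsn_diff` already does this subtraction in SQL. The sketch below does the same arithmetic client-side, assuming you have already fetched `pg_current_wal_lsn()` and each slot's `restart_lsn` from `pg_replication_slots`; the helper names and the threshold are hypothetical.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server must keep for a slot at restart_lsn."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(restart_lsn)


def slots_over_threshold(current_lsn, slots, limit_bytes):
    """Return names of slots retaining more than limit_bytes of WAL.

    `slots` maps slot name -> restart_lsn (hypothetical input shape).
    """
    return sorted(
        name
        for name, restart_lsn in slots.items()
        if retained_wal_bytes(current_lsn, restart_lsn) > limit_bytes
    )
```

For example, `slots_over_threshold("0/3000000", {"cdc": "0/1000000", "replica": "0/2FFFFFF"}, 16 * 1024 * 1024)` flags only the `cdc` slot, which is holding back 32 MiB of WAL.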
Avoid When
- Do not disable WAL/redo logging in production databases — it is the primary durability guarantee.
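- Avoid letting WAL accumulate without bound; tune checkpoint frequency (for example, max_wal_size and checkpoint_timeout), because the more WAL there is to replay since the last checkpoint, the longer crash recovery takes.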
When To Use
- WAL is the mechanism behind database crash recovery and replication — understanding it helps debug replication lag and disk I/O spikes.
- Enable WAL mode in SQLite for better concurrent read performance.
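For the SQLite case, WAL mode is switched on with the `journal_mode` pragma. A minimal sketch using Python's standard `sqlite3` module follows; the `enable_wal` helper and the `app.db` file name are illustrative. WAL mode needs a file-backed database, so the demo uses a temporary directory rather than an in-memory database.

```python
import os
import sqlite3
import tempfile


def enable_wal(db_path):
    """Switch a SQLite database file to WAL mode and return the resulting mode."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        conn.close()
    return mode


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return enable_wal(os.path.join(tmp, "app.db"))
```

The setting is persistent: once a database file is in WAL mode, later connections open it in WAL mode as well, so checking the returned value once at startup is usually enough.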
Code Examples
# ❌ PostgreSQL settings that sacrifice durability for speed
# Dangerous in production: data loss or corruption on crash
fsync = off # Skips forcing writes to disk; an OS crash or power loss can corrupt the database
synchronous_commit = off # Commits return before WAL is flushed; recent commits can be lost (no corruption)
full_page_writes = off # Risks torn page corruption after crash
# ✅ PostgreSQL WAL settings — durability vs performance
# Production defaults (safe):
fsync = on # Ensure WAL is on disk before ack
synchronous_commit = on # Default — safest
full_page_writes = on # Protects against torn page writes
wal_level = replica # Required for streaming replication
archive_mode = on # Enable WAL archiving for PITR
archive_command = 'test ! -f /mnt/wal-archive/%f && cp %p /mnt/wal-archive/%f' # Refuse to overwrite an already archived segment
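# For read replicas (PostgreSQL 12+):
# set primary_conninfo in postgresql.conf and create standby.signal in the data directory
# (recovery.conf was removed in PostgreSQL 12; older versions set primary_conninfo there)
# Replica streams WAL from primary and applies it continuously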