Database Normalisation
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). While mysql-workbench, pgadmin, and db-schema-linter are listed as tools, detection_hints.automated is 'no' — identifying normalisation violations like comma-separated values, partial dependencies, or transitive dependencies requires manual schema review or runtime observation of data anomalies. No automated tool reliably catches these patterns across all cases.
Closest to 'cross-cutting refactor across the codebase' (e7). Fixing normalisation issues requires restructuring database tables, creating new tables with foreign keys, migrating existing data, and updating all application queries that touch affected tables. The quick_fix describes conceptual steps (eliminate repeating groups, remove partial/transitive dependencies), but implementing these changes spans multiple files and often requires coordinated deployment with data migrations.
Closest to 'strong gravitational pull' (b7). Database schema design shapes every query, every ORM mapping, and every data access pattern in the application. Applies_to indicates web and cli contexts — essentially all data-touching code. A normalisation decision made early becomes load-bearing; changing it later requires reworking models, queries, and potentially business logic across the codebase.
Closest to 'notable trap' (t5). The misconception explicitly states 'Higher normal forms are always better' — a documented gotcha that most devs eventually learn. Experienced developers know to normalise OLTP tables but denormalise for reporting. The common_mistakes list confirms this: over-normalising analytics tables and conflating normalisation with performance are well-known pitfalls, but they're learnable rather than deeply counterintuitive.
Also Known As
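TL;DR
Store each fact in one place so updates cannot leave stale copies behind. Aim for 3NF, and denormalise only where measurement shows a need.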
Explanation
Normalisation organises tables so that each fact is stored once, eliminating the update, insertion and deletion anomalies that duplicated data causes. First Normal Form (1NF): atomic values, no repeating groups. Second Normal Form (2NF): 1NF, and no non-key column depends on only part of a composite key (no partial dependencies). Third Normal Form (3NF): 2NF, and no non-key column depends on another non-key column (no transitive dependencies). Boyce-Codd Normal Form (BCNF): every determinant is a candidate key. In practice, 3NF is sufficient for most applications. Denormalisation, an intentional violation for read performance, is acceptable when backed by a measured need.
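The partial and transitive dependencies behind 2NF and 3NF can be checked mechanically against sample rows. The sketch below is a minimal illustration; the helper `fd_holds` and the `order_items` and `customers` data are hypothetical. Sample data can only disprove a dependency: one that happens to hold in a sample still has to be confirmed against the business rules.

```python
from collections.abc import Hashable, Sequence

Row = dict[str, Hashable]

def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if the functional dependency lhs -> rhs holds in `rows`:
    each combination of lhs values maps to exactly one combination of rhs values."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant, different dependent values
    return True

# Hypothetical order lines keyed by (order_id, product_id).
order_items = [
    {"order_id": 1, "product_id": 10, "product_name": "Widget", "quantity": 2},
    {"order_id": 1, "product_id": 11, "product_name": "Gadget", "quantity": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Widget", "quantity": 5},
]
# product_name is determined by product_id alone, only part of the key: a 2NF violation.
print(fd_holds(order_items, ["product_id"], ["product_name"]))  # True
# quantity needs the whole composite key.
print(fd_holds(order_items, ["product_id"], ["quantity"]))      # False

# Hypothetical customers keyed by id, where zip_code determines city.
customers = [
    {"id": 1, "zip_code": "10001", "city": "New York"},
    {"id": 2, "zip_code": "10001", "city": "New York"},
    {"id": 3, "zip_code": "94105", "city": "San Francisco"},
]
# id -> zip_code -> city is a transitive dependency: a 3NF violation.
print(fd_holds(customers, ["zip_code"], ["city"]))  # True
```

Fixing either violation means decomposing the table: `product_name` moves to a table keyed by `product_id`, and `city` to a table keyed by `zip_code`.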
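Common Misconception
"Higher normal forms are always better." In practice 3NF is enough for most applications, and reporting or analytics tables are often deliberately denormalised for fast aggregations.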
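Why It Matters
Duplicated data lets copies drift apart, so an edit to one row silently leaves others stale. The schema also shapes every query, ORM mapping and data access path, so a normalisation decision made early becomes load-bearing and is expensive to change later.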
Common Mistakes
- Storing comma-separated values in a single column: violates 1NF, forces string parsing to query individual values, and stops the database from enforcing foreign keys on them.
- Repeating customer name and address in every order row instead of a foreign key to a customers table.
- Fully normalising reporting/analytics tables that need fast aggregations: calculated columns or materialised views usually serve them better.
- Treating normalisation as a performance decision: normalise first, then denormalise specific tables only where measurement shows a need.
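The comma-separated-values mistake is easy to reproduce. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `posts_csv`, `posts`, `tags` and `post_tags` tables and the two query helpers are hypothetical. A substring match on the packed column wrongly returns the post tagged `nosql` when searching for `sql`, while the junction table answers exactly.

```python
import sqlite3

# Hypothetical schema: the same three posts stored two ways.
SCHEMA = """
    -- Violates 1NF: several tags packed into one column
    CREATE TABLE posts_csv (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
    INSERT INTO posts_csv VALUES
        (1, 'Intro', 'sql,db'), (2, 'Rust tips', 'rust'), (3, 'NoSQL', 'nosql,db');

    -- 1NF: one row per (post, tag) pair in a junction table
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE post_tags (
        post_id INTEGER NOT NULL REFERENCES posts(id),
        tag_id  INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (post_id, tag_id)
    );
    INSERT INTO posts VALUES (1, 'Intro'), (2, 'Rust tips'), (3, 'NoSQL');
    INSERT INTO tags VALUES (1, 'sql'), (2, 'db'), (3, 'rust'), (4, 'nosql');
    INSERT INTO post_tags VALUES (1, 1), (1, 2), (2, 3), (3, 4), (3, 2);
"""

def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn

def naive_posts_tagged(conn: sqlite3.Connection, tag: str) -> list[int]:
    # Substring search on the packed column also matches tags that merely contain `tag`.
    rows = conn.execute(
        "SELECT id FROM posts_csv WHERE tags LIKE ? ORDER BY id", (f"%{tag}%",)
    )
    return [r[0] for r in rows]

def posts_tagged(conn: sqlite3.Connection, tag: str) -> list[int]:
    # Exact match through the junction table.
    rows = conn.execute(
        """SELECT p.id FROM posts p
           JOIN post_tags pt ON pt.post_id = p.id
           JOIN tags t ON t.id = pt.tag_id
           WHERE t.name = ? ORDER BY p.id""",
        (tag,),
    )
    return [r[0] for r in rows]

conn = connect()
print(naive_posts_tagged(conn, "sql"))  # [1, 3]: post 3 is only tagged 'nosql'
print(posts_tagged(conn, "sql"))        # [1]
```

The junction table also lets the database reject a tag that does not exist, which a free-text column cannot do.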
Code Examples
-- Unnormalised: customer data repeated in every order row
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_name VARCHAR(100),   -- Repeated for every order
    customer_email VARCHAR(100),  -- Update in one row, others stale
    customer_address TEXT,
    product VARCHAR(100),
    quantity INT
);

-- 3NF: customer and product data stored once, referenced by FK
-- (an alternative to the table above, not meant to run in the same schema)
CREATE TABLE customers (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100) UNIQUE,
    address TEXT
);
CREATE TABLE products (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customers(id),
    product_id INT NOT NULL REFERENCES products(id),
    quantity INT
);
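To see the update anomaly that the comment in the unnormalised table warns about, this sketch changes an email in one row of a flat orders table and in the single customer row of a normalised pair. It assumes Python's built-in `sqlite3` with an in-memory database; the table names, the function `count_email_versions`, and the plain `product` text column (used instead of a `products` table for brevity) are hypothetical.

```python
import sqlite3

def count_email_versions() -> tuple[int, int]:
    """Return how many distinct emails customer 'Ada' has after one update,
    first in the flat table, then in the normalised schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE orders_flat (
            id INTEGER PRIMARY KEY,
            customer_name TEXT,
            customer_email TEXT,
            product TEXT,
            quantity INTEGER
        );
        INSERT INTO orders_flat VALUES
            (1, 'Ada', 'ada@old.example', 'Widget', 2),
            (2, 'Ada', 'ada@old.example', 'Gadget', 1);

        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            product TEXT,
            quantity INTEGER
        );
        INSERT INTO customers VALUES (1, 'Ada', 'ada@old.example');
        INSERT INTO orders VALUES (1, 1, 'Widget', 2), (2, 1, 'Gadget', 1);
    """)
    # A careless update touches one order row; the other keeps the old email.
    conn.execute("UPDATE orders_flat SET customer_email = 'ada@new.example' WHERE id = 1")
    # In the normalised schema the email lives in exactly one row.
    conn.execute("UPDATE customers SET email = 'ada@new.example' WHERE id = 1")

    flat = conn.execute(
        "SELECT COUNT(DISTINCT customer_email) FROM orders_flat WHERE customer_name = 'Ada'"
    ).fetchone()[0]
    normalised = conn.execute(
        """SELECT COUNT(DISTINCT c.email) FROM orders o
           JOIN customers c ON c.id = o.customer_id
           WHERE c.name = 'Ada'"""
    ).fetchone()[0]
    return flat, normalised

print(count_email_versions())  # (2, 1): the flat table now disagrees with itself
```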