Database Normalisation
debt(d7/e7/b7/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no' — tools like mysql-workbench and pgadmin can visualize schema but don't automatically flag normalisation violations. Identifying duplicate data across tables, missing FK relationships, or JSON columns storing related values requires manual schema review or data quality audits. No compiler or linter catches this.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix says 'normalise to 3NF' but achieving this after denormalised data exists requires schema migrations, splitting tables, establishing FK relationships, and updating all application code that reads/writes the affected tables. Common_mistakes like 'repeating customer name in every order row' means fixing requires data migration across potentially millions of rows plus application-wide query changes.
Closest to 'strong gravitational pull' (b7). Database schema normalisation is a foundational architectural decision that shapes every query, every ORM model, and every data access pattern. The applies_to shows it affects all contexts (web, cli). Once you have denormalised data spread across the system, every new feature must work around or perpetuate the inconsistency. It's not quite b9 (rewrite-required) but close — you can incrementally normalise with effort.
Closest to 'notable trap' (t5). The misconception field explicitly states developers think normalisation is 'always about database normal forms' when it also means canonical representation (E.164 phones, lowercase emails, UTC dates). Common_mistakes include over-normalising causing JOIN-heavy queries — the trap is both under-normalising (data anomalies) and over-normalising (performance problems). Most experienced devs eventually learn the balance, but it's a documented gotcha.
Also Known As
TL;DR
Explanation
Normalisation progressively eliminates redundancy: 1NF (atomic column values — no repeating groups or arrays in cells), 2NF (eliminate partial dependencies — every non-key column depends on the whole primary key, not just part of it), 3NF (eliminate transitive dependencies — non-key columns depend only on the primary key, not on other non-key columns). BCNF and 4NF handle edge cases. Over-normalisation leads to excessive JOINs degrading read performance — denormalise deliberately when queries justify it, documented as an explicit trade-off. For PHP applications using ORMs, normalisation affects relationship mapping (hasMany, belongsToMany) and eager-loading strategies.
Common Misconception
Why It Matters
Common Mistakes
- Storing calculated values that can be derived from existing columns — they go out of sync.
- Repeating customer name and address in every order row — a customer name change requires updating thousands of rows.
- Over-normalising to the point where simple queries require many JOINs — pragmatic denormalisation is sometimes correct.
- Not understanding 1NF, 2NF, 3NF — normalising to 3NF solves most practical anomalies.
Code Examples
-- 1NF violation — repeating groups in one column
CREATE TABLE orders (
id INT,
tags VARCHAR(255) -- 'urgent,fragile,priority' -- comma-list is not atomic
);
-- 3NF design: separate tables, no transitive dependencies
CREATE TABLE orders (id INT PRIMARY KEY, customer_id INT, total DECIMAL);
CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(100), city_id INT);
CREATE TABLE cities (id INT PRIMARY KEY, name VARCHAR(100), country_id INT);
-- Tags as many-to-many (junction table)
CREATE TABLE tags (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE order_tags (order_id INT, tag_id INT, PRIMARY KEY (order_id, tag_id));
-- Denormalise only with evidence: materialised views or computed columns
-- for read-heavy aggregates, not as the default