Inverted Index
debt(d9/e5/b7/t9)
Closest to 'silent in production until users hit it' (d9). No detection_hints tools are listed. The misuse — using LIKE '%term%' instead of a full-text index — is syntactically valid SQL that executes without error or warning. It only reveals itself as a problem when table size grows and users experience slow search; there is no compiler, linter, or standard SAST tool that flags this pattern in typical PHP workflows.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes adding a FULLTEXT INDEX or migrating to PostgreSQL tsvector/Elasticsearch. This isn't a one-line swap: it requires schema migration, application query rewrites, and potentially introducing and operating a new service (Elasticsearch). It also requires re-indexing existing data and updating bulk-insert workflows to maintain the index — touching multiple layers of the application.
Closest to 'strong gravitational pull' (b7). The choice of search strategy — LIKE queries vs. full-text index vs. Elasticsearch — shapes how data is stored, how inserts are handled, how synonym and stop-word configuration is managed, and how relevance ranking is implemented. As noted in why_it_matters, every search feature in the application is affected. Changing strategy later requires schema changes, data re-indexing, and query rewrites across all search code paths.
Closest to 'catastrophic trap — the obvious way is always wrong' (t9). The misconception field states this directly: LIKE '%keyword%' appears to be a reasonable search solution and is the intuitive first approach for most developers, yet it always performs a full table scan regardless of any other indexes on the table. The 'obvious' implementation is never correct for any meaningful data size, making this a canonical t9 trap.
Also Known As
TL;DR
Explanation
An inverted index is the inverse of a document store: instead of document → words, it stores word → [document IDs]. When a query arrives, the search engine looks up each query term in the index, retrieves the document lists, and intersects them to find documents containing all terms. Building the index requires tokenising and normalising text (lowercasing, stemming, stop-word removal), then recording each term's positions within each document. The positional information enables phrase search and proximity ranking. Inverted indexes are stored on disk in sorted order, compressed, and split into segments that are merged periodically — this is the fundamental architecture of Lucene (which powers Elasticsearch and Solr), and of MySQL and PostgreSQL full-text search. Understanding this structure explains why full-text search requires a dedicated index column and why adding new documents requires index rebuilding.
Common Misconception
Why It Matters
Common Mistakes
- Using LIKE '%term%' for search on large tables — always performs a full table scan, no index is used.
- Forgetting to rebuild or update the full-text index after bulk inserts — new documents are invisible to search until indexed.
- Expecting the same index to handle both exact-match lookups and full-text search — they use different data structures and are optimised for different query patterns.
- Not configuring stop words — common words like 'the', 'is', 'at' inflate the index and reduce ranking accuracy if included.
Code Examples
// Full table scan — unusable at scale
$results = $pdo->query(
"SELECT * FROM articles WHERE body LIKE '%{$term}%'"
)->fetchAll();
// Reads every row, no index, SQL injection risk
// MySQL FULLTEXT index — uses inverted index
// Schema: ALTER TABLE articles ADD FULLTEXT(title, body);
$stmt = $pdo->prepare(
'SELECT *, MATCH(title, body) AGAINST(:q IN BOOLEAN MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST(:q2 IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 20'
);
$stmt->execute([':q' => $term, ':q2' => $term]);
$results = $stmt->fetchAll();