← Back to glossary

Inverted Index

Search Intermediate

debt(d9/e5/b7/t9)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). No detection_hints tools are listed. The misuse — using LIKE '%term%' instead of a full-text index — is syntactically valid SQL that executes without error or warning. It only reveals itself as a problem when table size grows and users experience slow search; there is no compiler, linter, or standard SAST tool that flags this pattern in typical PHP workflows.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes adding a FULLTEXT INDEX or migrating to PostgreSQL tsvector/Elasticsearch. This isn't a one-line swap: it requires schema migration, application query rewrites, and potentially introducing and operating a new service (Elasticsearch). It also requires re-indexing existing data and updating bulk-insert workflows to maintain the index — touching multiple layers of the application.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). The choice of search strategy — LIKE queries vs. full-text index vs. Elasticsearch — shapes how data is stored, how inserts are handled, how synonym and stop-word configuration is managed, and how relevance ranking is implemented. As noted in why_it_matters, every search feature in the application is affected. Changing strategy later requires schema changes, data re-indexing, and query rewrites across all search code paths.

t9 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'catastrophic trap — the obvious way is always wrong' (t9). The misconception field states this directly: LIKE '%keyword%' appears to be a reasonable search solution and is the intuitive first approach for most developers, yet it always performs a full table scan regardless of any other indexes on the table. The 'obvious' implementation is never correct for any meaningful data size, making this a canonical t9 trap.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-08 · reviewed by human

Also Known As

inverted file posting list search index full-text index

TL;DR

A data structure that maps each unique word to the list of documents containing it — the foundation of all full-text search engines, enabling fast lookup of documents matching a query term.

Explanation

An inverted index is the inverse of a document store: instead of document → words, it stores word → [document IDs]. When a query arrives, the search engine looks up each query term in the index, retrieves the document lists, and intersects them to find documents containing all terms. Building the index requires tokenising and normalising text (lowercasing, stemming, stop-word removal), then recording each term's positions within each document. The positional information enables phrase search and proximity ranking. Inverted indexes are stored on disk in sorted order, compressed, and split into segments that are merged periodically — this is the fundamental architecture of Lucene (which powers Elasticsearch and Solr), and of MySQL and PostgreSQL full-text search. Understanding this structure explains why full-text search requires a dedicated index column and why adding new documents requires index rebuilding.

Common Misconception

✗ LIKE queries are a reasonable alternative to a proper full-text index for search. LIKE '%keyword%' performs a full table scan — every row is read regardless of index. On a table with 100,000 rows, a LIKE query reads all 100,000 rows every time. An inverted index looks up the keyword in O(1) and returns only matching document IDs. The performance difference is orders of magnitude; LIKE is not a search strategy, it is a filter.

Why It Matters

Every search feature in a PHP application that performs more than simple equality lookups needs an inverted index somewhere — either in MySQL/PostgreSQL full-text indexes, a dedicated search engine like Elasticsearch, or a lightweight solution like SQLite FTS5. Understanding that the index must be built and maintained separately from the document data explains why search results go stale after inserts without re-indexing, why synonym handling requires index-time configuration, and why relevance ranking depends on term frequency statistics computed at index-build time rather than query time.

Common Mistakes

Using LIKE '%term%' for search on large tables — always performs a full table scan, no index is used.
Forgetting to rebuild or update the full-text index after bulk inserts — new documents are invisible to search until indexed.
Expecting the same index to handle both exact-match lookups and full-text search — they use different data structures and are optimised for different query patterns.
Not configuring stop words — common words like 'the', 'is', 'at' inflate the index and reduce ranking accuracy if included.

Code Examples

✗ Vulnerable

// Full table scan — unusable at scale
$results = $pdo->query(
    "SELECT * FROM articles WHERE body LIKE '%{$term}%'"
)->fetchAll();
// Reads every row, no index, SQL injection risk

✓ Fixed

// MySQL FULLTEXT index — uses inverted index
// Schema: ALTER TABLE articles ADD FULLTEXT(title, body);

$stmt = $pdo->prepare(
    'SELECT *, MATCH(title, body) AGAINST(:q IN BOOLEAN MODE) AS score
     FROM articles
     WHERE MATCH(title, body) AGAINST(:q2 IN BOOLEAN MODE)
     ORDER BY score DESC
     LIMIT 20'
);
$stmt->execute([':q' => $term, ':q2' => $term]);
$results = $stmt->fetchAll();

References

https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html