← CodeClarityLab Home
Browse by Category
+ added · updated 7d
← Back to glossary

Inverted Index

search Intermediate

Also Known As

inverted file posting list search index full-text index

TL;DR

A data structure that maps each unique word to the list of documents containing it — the foundation of all full-text search engines, enabling fast lookup of documents matching a query term.

Explanation

An inverted index is the inverse of a document store: instead of document → words, it stores word → [document IDs]. When a query arrives, the search engine looks up each query term in the index, retrieves the document lists, and intersects them to find documents containing all terms. Building the index requires tokenising and normalising text (lowercasing, stemming, stop-word removal), then recording each term's positions within each document. The positional information enables phrase search and proximity ranking. Inverted indexes are stored on disk in sorted order, compressed, and split into segments that are merged periodically — this is the fundamental architecture of Lucene (which powers Elasticsearch and Solr), and of MySQL and PostgreSQL full-text search. Understanding this structure explains why full-text search requires a dedicated index column and why adding new documents requires index rebuilding.

Common Misconception

LIKE queries are a reasonable alternative to a proper full-text index for search. LIKE '%keyword%' performs a full table scan — every row is read regardless of index. On a table with 100,000 rows, a LIKE query reads all 100,000 rows every time. An inverted index looks up the keyword in O(1) and returns only matching document IDs. The performance difference is orders of magnitude; LIKE is not a search strategy, it is a filter.

Why It Matters

Every search feature in a PHP application that performs more than simple equality lookups needs an inverted index somewhere — either in MySQL/PostgreSQL full-text indexes, a dedicated search engine like Elasticsearch, or a lightweight solution like SQLite FTS5. Understanding that the index must be built and maintained separately from the document data explains why search results go stale after inserts without re-indexing, why synonym handling requires index-time configuration, and why relevance ranking depends on term frequency statistics computed at index-build time rather than query time.

Common Mistakes

  • Using LIKE '%term%' for search on large tables — always performs a full table scan, no index is used.
  • Forgetting to rebuild or update the full-text index after bulk inserts — new documents are invisible to search until indexed.
  • Expecting the same index to handle both exact-match lookups and full-text search — they use different data structures and are optimised for different query patterns.
  • Not configuring stop words — common words like 'the', 'is', 'at' inflate the index and reduce ranking accuracy if included.

Code Examples

✗ Vulnerable
// Full table scan — unusable at scale
$results = $pdo->query(
    "SELECT * FROM articles WHERE body LIKE '%{$term}%'"
)->fetchAll();
// Reads every row, no index, SQL injection risk
✓ Fixed
// MySQL FULLTEXT index — uses inverted index
// Schema: ALTER TABLE articles ADD FULLTEXT(title, body);

$stmt = $pdo->prepare(
    'SELECT *, MATCH(title, body) AGAINST(:q IN BOOLEAN MODE) AS score
     FROM articles
     WHERE MATCH(title, body) AGAINST(:q2 IN BOOLEAN MODE)
     ORDER BY score DESC
     LIMIT 20'
);
$stmt->execute([':q' => $term, ':q2' => $term]);
$results = $stmt->fetchAll();

Added 23 Mar 2026
Views 39
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
0 pings W 0 pings T 3 pings F 0 pings S 0 pings S 1 ping M 0 pings T 0 pings W 1 ping T 2 pings F 0 pings S 1 ping S 0 pings M 0 pings T 0 pings W 0 pings T 2 pings F 0 pings S 0 pings S 0 pings M 0 pings T 1 ping W 0 pings T 2 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 0 pings T
No pings yet today
No pings yesterday
Amazonbot 12 Perplexity 11 Ahrefs 3 Google 2 ChatGPT 1 Meta AI 1 SEMrush 1
crawler 30 crawler_json 1
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Add FULLTEXT INDEX on searchable columns in MySQL, use to_tsvector/to_tsquery in PostgreSQL, or use Elasticsearch for complex relevance requirements

✓ schema.org compliant