Search Relevance — TF-IDF & BM25
debt(d9/e5/b5/t7)
Closest to 'silent in production until users hit it' (d9): bad relevance ranking still returns results, but users see irrelevant hits. None of the tools in detection_hints (meilisearch, elasticsearch, algolia) flags poor ranking automatically; only user behaviour and CTR metrics reveal it.
Closest to 'touches multiple files / significant refactor in one component' (e5): per quick_fix, tuning ranking means configuring boosts, stemming, and stop words, then measuring CTR. That is more than a one-line swap, but it stays contained in the search subsystem.
Closest to 'persistent productivity tax' (b5): search ranking applies across web/api contexts and shapes how content is indexed and queried. Tuning is ongoing work that affects every feature that depends on search.
Closest to 'serious trap' (t7): the misconception field states that developers assume more occurrences means more relevant, which contradicts BM25's saturation and length normalisation. The intuitive mental model (linear term frequency) is wrong.
Also Known As
TL;DR
Explanation
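TF-IDF (Term Frequency-Inverse Document Frequency) multiplies two factors: TF measures how often a term appears in a document, and IDF down-weights terms that are common across the whole corpus. Their product is the relevance score of that term for that document. BM25 (Best Match 25) refines TF-IDF with a saturation function (each additional occurrence of a term adds less to the score) and length normalisation (longer documents don't get an unfair boost just for containing more words). Elasticsearch uses BM25 by default and gives full control over scoring. Meilisearch and Typesense, both common choices for PHP projects, do not score with BM25: they rank with their own rule-based schemes (matched words, typo count, proximity, attribute order, exactness) that work out of the box with little tuning.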
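To make the saturation and length-normalisation terms concrete, here is a minimal PHP sketch of BM25 scoring for a single document. It is an illustration, not any engine's actual implementation: the function name `bm25Score`, its parameters, and the toy corpus statistics are assumptions made for this example. It uses the commonly cited defaults k1 = 1.2 and b = 0.75 and the Lucene-style IDF form ln(1 + (N - n + 0.5) / (n + 0.5)).

```php
<?php
declare(strict_types=1);

/**
 * BM25 score of one tokenised document for a tokenised query (illustrative sketch).
 *
 * @param string[]          $queryTerms query tokens
 * @param string[]          $docTerms   document tokens
 * @param array<string,int> $docFreq    term => number of documents containing it
 * @param int               $docCount   total number of documents in the corpus
 * @param float             $avgDocLen  average document length in tokens
 */
function bm25Score(
    array $queryTerms,
    array $docTerms,
    array $docFreq,
    int $docCount,
    float $avgDocLen,
    float $k1 = 1.2,
    float $b = 0.75
): float {
    $termCounts = array_count_values($docTerms);
    $docLen = count($docTerms);
    $score = 0.0;

    foreach (array_unique($queryTerms) as $term) {
        $tf = $termCounts[$term] ?? 0;
        if ($tf === 0) {
            continue; // a term absent from the document contributes nothing
        }
        $n = $docFreq[$term] ?? 0;
        // Rare terms get a high IDF, common terms a low one.
        $idf = log(1 + ($docCount - $n + 0.5) / ($n + 0.5));
        // Length normalisation: documents longer than average are penalised.
        $norm = $k1 * (1 - $b + $b * $docLen / $avgDocLen);
        // Saturation: this fraction approaches (k1 + 1) as tf grows.
        $score += $idf * ($tf * ($k1 + 1)) / ($tf + $norm);
    }

    return $score;
}

// Toy corpus statistics (assumed): 1000 documents, 'php' appears in 100 of them,
// and the average document is 100 tokens long.
$docFreq = ['php' => 100];

$once         = array_merge(array_fill(0, 1, 'php'), array_fill(0, 99, 'filler'));
$tenTimes     = array_merge(array_fill(0, 10, 'php'), array_fill(0, 90, 'filler'));
$tenTimesLong = array_merge(array_fill(0, 10, 'php'), array_fill(0, 990, 'filler'));

$scoreOnce         = bm25Score(['php'], $once, $docFreq, 1000, 100.0);
$scoreTenTimes     = bm25Score(['php'], $tenTimes, $docFreq, 1000, 100.0);
$scoreTenTimesLong = bm25Score(['php'], $tenTimesLong, $docFreq, 1000, 100.0);

printf("1 occurrence, 100 tokens:    %.3f\n", $scoreOnce);
printf("10 occurrences, 100 tokens:  %.3f\n", $scoreTenTimes);
printf("10 occurrences, 1000 tokens: %.3f\n", $scoreTenTimesLong);
```

With these toy numbers, ten occurrences score less than twice as much as a single occurrence, and the same ten occurrences in a document ten times the average length score lower still.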
Common Misconception
Why It Matters
Common Mistakes
- LIKE '%keyword%' for full-text search: returns every match with no relevance ranking.
- Not boosting title matches over body: a term in the title should outweigh the same term in the body.
- Ignoring stop words: 'the', 'a', 'in' have near-zero IDF, so indexing them wastes space without improving ranking.
- No stemming: 'running' and 'runs' are treated as different terms without a stemmer.
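The title-boost and stop-word points can be sketched in a few lines of PHP. This is a deliberately simplified illustration that counts matching query terms per field and multiplies by a field weight; it is not how any particular engine scores. The helper names `tokenize` and `fieldWeightedMatches`, the stop-word list, and the boost values are all assumptions for this example. Stemming is left out because it needs a proper stemmer library.

```php
<?php
declare(strict_types=1);

// A tiny, assumed stop-word list; real engines ship per-language lists.
const STOP_WORDS = ['the', 'a', 'an', 'in', 'of', 'to', 'and'];

/** Lowercase, split on non-alphanumerics, and drop stop words. */
function tokenize(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return array_values(array_diff($tokens, STOP_WORDS));
}

/**
 * Count distinct query terms matched in each field, weighted by that field's boost.
 *
 * @param array<string,string> $doc    field name => text
 * @param array<string,float>  $boosts field name => weight
 */
function fieldWeightedMatches(string $query, array $doc, array $boosts): float
{
    $queryTerms = array_unique(tokenize($query));
    $score = 0.0;
    foreach ($boosts as $field => $boost) {
        $fieldTerms = tokenize($doc[$field] ?? '');
        $score += $boost * count(array_intersect($queryTerms, $fieldTerms));
    }
    return $score;
}

$boosts = ['title' => 3.0, 'body' => 1.0]; // title matches count three times as much

$docA = ['title' => 'PHP array functions', 'body' => 'A tour of sorting helpers.'];
$docB = ['title' => 'Release notes', 'body' => 'The PHP array API changed in this release.'];

$scoreA = fieldWeightedMatches('the php array', $docA, $boosts);
$scoreB = fieldWeightedMatches('the php array', $docB, $boosts);

printf("docA: %.1f, docB: %.1f\n", $scoreA, $scoreB);
```

Both documents contain 'php' and 'array', but the one with the terms in its title ranks higher, and 'the' in the query has no effect on either score. In practice this is configured in the engine instead of hand-written, for example through per-field boosts in Elasticsearch or the order of searchable attributes in Meilisearch.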
Code Examples
-- No relevance ranking: every match is treated as equal.
SELECT * FROM docs WHERE body LIKE '%php array%';
-- May return 1000 rows, with no ordering by relevance.
-- The user must scan all results.
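-- PostgreSQL full-text search with ranking.
-- Assumes docs.tsv is a tsvector column; ts_rank is PostgreSQL's own ranking function, not BM25.
SELECT title, ts_rank(tsv, query) AS rank
FROM docs,
     to_tsquery('english', 'php & array') AS query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;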
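// Or Meilisearch (typo-tolerant, rule-based ranking, PHP SDK).
// $index is assumed to be an index handle from the SDK client, e.g. $client->index('docs').
$results = $index->search('php array', [
    'limit' => 20,
    'attributesToHighlight' => ['title', 'body'],
]);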