
Search Relevance — TF-IDF & BM25

Search Advanced
debt(d9/e5/b5/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9): bad relevance ranking still returns results, so nothing fails visibly; users simply see irrelevant hits. None of the tools in the detection hints (meilisearch, elasticsearch, algolia) flags poor ranking automatically; only user behaviour and CTR metrics reveal it.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5): per the quick fix, tuning ranking means configuring boosts, stemming, and stop words, then measuring CTR. That is more than a one-line swap, but it stays contained to the search subsystem.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5), search ranking applies across web/api contexts and shapes how content is indexed and queried; tuning is ongoing work that affects many features depending on search.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7): the common misconception is that more occurrences mean more relevance, which contradicts BM25's saturation and length normalisation. The intuitive mental model (linear term frequency) is wrong.


Also Known As

TF-IDF, BM25, search ranking, relevance scoring

TL;DR

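Ranking algorithms that score documents by how relevant they are to a query. TF-IDF weighs how often a term appears in a document against how rare the term is across the corpus; BM25 adds term-frequency saturation and document-length normalisation to surface the best matches.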

Explanation

TF-IDF (Term Frequency-Inverse Document Frequency): TF measures how often a term appears in a document; IDF down-weights terms that are common across the whole corpus. The product of the two is the relevance score. BM25 (Best Match 25) improves on TF-IDF with a saturation function (each additional occurrence matters less) and length normalisation (longer documents don't get an unfair boost). Elasticsearch uses BM25 by default. For PHP: Elasticsearch gives full control over scoring; Meilisearch and Typesense are quicker to set up but rank with their own built-in ranking rules rather than BM25.
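
A minimal Python sketch of the BM25 scoring described above, assuming whitespace-tokenised documents and the commonly used defaults k1 = 1.2 and b = 0.75. The function name `bm25_scores`, the sample corpus, and the Lucene-style IDF variant are illustrative assumptions, not any particular engine's exact implementation.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Lucene-style IDF; the 1 inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long documents get a larger denominator.
            norm = k1 * (1 - b + b * len(d) / avg_len)
            # Saturating term frequency, weighted by term rarity.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

# Hypothetical three-document corpus.
docs = [
    "php array functions reference".split(),
    "php tutorial for beginners".split(),
    "cooking with garlic".split(),
]
scores = bm25_scores(["php", "array"], docs)
```

Here the first document matches both query terms, the second matches only the more common term `php`, and the third matches nothing, so the scores come out in that order with the last one at zero.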

Common Misconception

Misconception: more occurrences of a search term always mean a more relevant document. In fact, BM25's saturation function means the 10th occurrence of a word adds almost no relevance, which blunts keyword stuffing as a way to game rankings.
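
To make the saturation effect concrete, here is a small Python sketch of BM25's term-frequency factor on its own, with length normalisation switched off (b = 0) and k1 = 1.2 assumed. The helper name `tf_component` is hypothetical.

```python
def tf_component(tf, k1=1.2):
    """BM25 term-frequency factor with length normalisation disabled (b = 0)."""
    return tf * (k1 + 1) / (tf + k1)

# Marginal gain contributed by the 1st, 2nd, ..., 10th occurrence of a term.
gains = [tf_component(n) - tf_component(n - 1) for n in range(1, 11)]

# The first occurrence contributes 1.0; the tenth adds only about 0.02.
# However often the term repeats, the factor never reaches k1 + 1 = 2.2.
```

Under linear term frequency every occurrence would add the same amount; here each one adds less than the one before it.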

Why It Matters

A PHP documentation search that ranks a 10,000-word tutorial above a precise 200-word reference page just because the tutorial mentions the keyword more often is wrong; BM25's length normalisation counteracts this.
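
The scenario above can be sketched in Python. The keyword counts (20 in the tutorial, 3 in the reference page) and the 1,000-word average document length are made-up numbers for illustration, and `tf_norm` is a hypothetical helper; IDF is left out because it is identical for both documents.

```python
def tf_norm(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 term-frequency factor including document-length normalisation."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

AVG_LEN = 1000  # assumed average document length in words

# With length normalisation (b = 0.75) the short reference page wins.
tutorial = tf_norm(tf=20, doc_len=10_000, avg_len=AVG_LEN)
reference = tf_norm(tf=3, doc_len=200, avg_len=AVG_LEN)

# With normalisation switched off (b = 0) the long tutorial wins on raw count.
tutorial_raw = tf_norm(tf=20, doc_len=10_000, avg_len=AVG_LEN, b=0)
reference_raw = tf_norm(tf=3, doc_len=200, avg_len=AVG_LEN, b=0)
```

Setting b closer to 0 weakens the length penalty and closer to 1 strengthens it, so b is one of the knobs to revisit when long pages crowd out short ones.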

Common Mistakes

  • LIKE '%keyword%' for full-text search: returns all matches with no relevance ranking.
  • Not boosting title matches over body: title occurrences should outweigh body occurrences.
  • Ignoring stop words: 'the', 'a', 'in' have near-zero IDF; not filtering them wastes index space.
  • No stemming: 'running' and 'runs' are treated as different terms without a stemmer (irregular forms such as 'ran' need lemmatisation or a synonym list).
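
A rough Python sketch of how the last three points fit together: an analysis step that drops stop words and stems tokens, followed by a score that weights title matches above body matches. The stop-word list, the toy `crude_stem` function, the 3x title boost, and the raw-count scoring are all simplifying assumptions; a real system would use a proper stemmer (Porter or Snowball) and the engine's own field boosts.

```python
import re

STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "for"}

def crude_stem(token):
    """Toy suffix stripper for illustration only, not a real stemmer."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo doubling of the final letter: "runn" -> "run".
        if stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text):
    """Lowercase, tokenise, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def weighted_matches(query, title, body, title_boost=3.0):
    """Count query-term matches, weighting title hits above body hits."""
    terms = set(analyze(query))
    title_tokens, body_tokens = analyze(title), analyze(body)
    return sum(
        title_boost * title_tokens.count(t) + body_tokens.count(t)
        for t in terms
    )

score_a = weighted_matches("php arrays", "PHP array functions", "Reference for sorting.")
score_b = weighted_matches("php arrays", "Getting started", "PHP basics. An array holds values.")
```

Both documents match `php` and `array` once each, but the first matches them in the title, so it scores 6.0 against 2.0 for the second. The query term `arrays` only matches because it is stemmed the same way as the indexed text.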

Code Examples

✗ Vulnerable
-- No relevance ranking: every match is treated as equal
SELECT * FROM docs WHERE body LIKE '%php array%';
-- Matches only the literal phrase 'php array' and may return 1000 rows
-- with no ordering by relevance, so the user must scan all results
✓ Fixed
-- PostgreSQL full-text search with ranking
-- (tsv is assumed to be a tsvector column; ts_rank is frequency-based, not BM25):
SELECT title, ts_rank(tsv, query) AS rank
FROM docs,
     to_tsquery('english', 'php & array') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;

// Or Meilisearch (built-in ranking rules, typo-tolerant, PHP SDK):
$results = $index->search('php array', [
    'limit' => 20,
    'attributesToHighlight' => ['title', 'body'],
]);

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: High
⚡ Quick Fix
Configure search result ranking explicitly — boost recent items, exact matches, and popular items over partial matches; measure click-through rate to validate your ranking formula
📦 Applies To
any web api
🔗 Prerequisites
🔍 Detection Hints
Default ranking without tuning; outdated content ranking above recent; title matches not boosted over body matches
Auto-detectable: ✗ No · Tools: meilisearch, elasticsearch, algolia
⚠ Related Problems
🤖 AI Agent
Confidence: Low · False Positives: Medium · ✗ Manual fix · Fix: High · Context: File · Tests: Update
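
Since click-through rate is the main signal for validating ranking changes (see the Quick Fix), here is a minimal Python sketch of computing CTR per result position. The log format, a list of `(position, clicked)` pairs, and the function name `ctr_by_position` are hypothetical; real analytics pipelines will differ.

```python
from collections import defaultdict

def ctr_by_position(impressions):
    """Click-through rate per result position (1 = top result).

    impressions: iterable of (position, clicked) pairs, one per result shown.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked in impressions:
        shown[position] += 1
        clicks[position] += int(clicked)
    return {pos: clicks[pos] / shown[pos] for pos in sorted(shown)}

# Hypothetical click log for one query, collected over several searches.
log = [(1, True), (1, False), (2, False), (2, False), (3, True), (3, True)]
ctr = ctr_by_position(log)
```

For this made-up log the result is `{1: 0.5, 2: 0.0, 3: 1.0}`. A lower position out-clicking the top one hints that the ranking places the best hit too low; comparing these numbers before and after a tuning change is one way to check whether the change helped.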