
Search Relevance — TF-IDF & BM25

Search Advanced

debt(d9/e5/b5/t7)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9): bad relevance ranking still returns results, so nothing fails; users simply see irrelevant hits. None of the tools in detection_hints (meilisearch, elasticsearch, algolia) flags poor ranking automatically; only user behaviour and CTR metrics reveal it.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5): per quick_fix, tuning ranking requires configuring boosts, stemming, and stop words, then measuring CTR. That is more than a one-line swap but stays contained to the search subsystem.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5), search ranking applies across web/api contexts and shapes how content is indexed and queried; tuning is ongoing work that affects many features depending on search.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7): per the misconception, developers assume more occurrences means more relevant, which contradicts BM25's saturation and length normalisation. The intuitive mental model (linear term frequency) is wrong.

About DEBT scoring → scored by claude-opus-4-7 · 2026-05-11 · reviewed by human

Also Known As

TF-IDF, BM25, search ranking, relevance scoring

TL;DR

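Ranking algorithms that score documents by how relevant they are to a query. TF-IDF weighs how often a term appears in a document against how common that term is across the corpus; BM25 adds term-frequency saturation and document-length normalisation to surface the best matches.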

Explanation

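TF-IDF (Term Frequency-Inverse Document Frequency): TF measures how often a term appears in a document; IDF down-weights terms that appear in many documents. Their product, summed over the query terms, gives the relevance score. BM25 (Best Match 25) improves on TF-IDF with a saturation function (each additional occurrence matters less) and length normalisation (longer documents don't get an unfair boost just for containing more words). Elasticsearch uses BM25 by default and exposes full scoring control. For PHP: Meilisearch and Typesense both have PHP clients, but they rank with their own rule-based schemes (typo count, word proximity, attribute weight) rather than classic BM25, so check each engine's ranking documentation before assuming BM25 behaviour.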
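The difference between the two scoring functions can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual implementation: the tokenizer, the toy corpus, and the function names are assumptions, `k1 = 1.2` and `b = 0.75` are commonly used defaults, and the same smoothed IDF is used for both scores so that only the treatment of term frequency differs.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer; real engines also stem and drop stop words.
    return text.lower().split()

def idf(term, docs):
    # Smoothed IDF: rare terms score high, very common terms approach zero.
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))

def tf_idf(query, doc, docs):
    # Raw term frequency times IDF, summed over the query terms.
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in query)

def bm25(query, doc, docs, k1=1.2, b=0.75):
    # BM25: term frequency saturates (k1) and is normalised by document length (b).
    counts = Counter(doc)
    avg_len = sum(len(d) for d in docs) / len(docs)
    norm = k1 * (1 - b + b * len(doc) / avg_len)
    score = 0.0
    for t in query:
        tf = counts[t]
        score += idf(t, docs) * tf * (k1 + 1) / (tf + norm)
    return score

# Toy corpus (hypothetical): a short reference page, a keyword-heavy tutorial,
# and an unrelated document.
corpus = [tokenize(t) for t in (
    "php array functions reference",
    "php php php php php php tutorial with many many extra words about php",
    "python list comprehension guide",
)]
query = tokenize("php array")

for doc in corpus:
    print(round(tf_idf(query, doc, corpus), 3), round(bm25(query, doc, corpus), 3))
```

On this toy corpus the raw TF-IDF score favours the keyword-heavy second document, while BM25 favours the short page that matches both query terms.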

Common Misconception

✗ More occurrences of a search term always means more relevant. In fact, BM25's saturation function means the 10th occurrence of a word adds almost nothing to the score, which limits how far keyword stuffing can game rankings.
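To make the saturation effect concrete, the sketch below prints how much each extra occurrence adds to the term-frequency part of BM25. It assumes `k1 = 1.2` and switches length normalisation off (`b = 0`) so that only saturation is visible; the function name is illustrative.

```python
def bm25_tf_component(tf, k1=1.2):
    # Term-frequency part of BM25 with length normalisation switched off (b = 0).
    # It can never exceed k1 + 1, no matter how often the term repeats.
    return tf * (k1 + 1) / (tf + k1)

# Marginal gain of the 1st, 2nd, ..., 10th occurrence of a term.
gains = [bm25_tf_component(tf) - bm25_tf_component(tf - 1) for tf in range(1, 11)]

for occurrence, gain in enumerate(gains, start=1):
    print(f"occurrence {occurrence:2d}: +{gain:.3f}")
```

Each gain is smaller than the one before it, and the component stays below `k1 + 1` however many times the term is repeated.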

Why It Matters

A PHP documentation search that ranks a 10,000-word tutorial above a precise 200-word reference page just because the tutorial mentions the keyword more is wrong — BM25 length normalisation prevents this.
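The scenario above can be checked with a small calculation. The numbers are assumptions chosen for illustration: a 10,000-word tutorial containing the keyword 20 times, a 200-word reference page containing it 5 times, and a hypothetical average document length of 1,000 words. The IDF factor is omitted because it is the same for both documents.

```python
def bm25_term_weight(tf, doc_len, avg_len, k1=1.2, b=0.75):
    # Per-term BM25 weight without the IDF factor.
    # b controls how strongly document length is normalised (0 = not at all).
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm)

AVG_LEN = 1000  # hypothetical average document length in words

# With length normalisation (b = 0.75) the short reference page wins.
tutorial = bm25_term_weight(tf=20, doc_len=10_000, avg_len=AVG_LEN)
reference = bm25_term_weight(tf=5, doc_len=200, avg_len=AVG_LEN)
print(f"b=0.75  tutorial={tutorial:.3f}  reference={reference:.3f}")

# Without it (b = 0) the long tutorial wins on raw occurrence count.
tutorial_raw = bm25_term_weight(tf=20, doc_len=10_000, avg_len=AVG_LEN, b=0.0)
reference_raw = bm25_term_weight(tf=5, doc_len=200, avg_len=AVG_LEN, b=0.0)
print(f"b=0.00  tutorial={tutorial_raw:.3f}  reference={reference_raw:.3f}")
```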

Common Mistakes

LIKE '%keyword%' for full-text search: returns every match with no relevance ranking, and the leading wildcard prevents a normal B-tree index from being used.
Not boosting title matches over body: a title occurrence should outweigh a body occurrence.
Ignoring stop words: 'the', 'a', 'in' have near-zero IDF, so indexing them wastes space and adds noise to matching.
No stemming: 'running' and 'runs' are indexed as different terms without a stemmer; irregular forms such as 'ran' need lemmatisation or a synonym rule on top.
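The last three mistakes can be illustrated with a toy analysis pipeline. Everything here is a simplified assumption: the stop-word list is a small subset, `naive_stem` is a crude suffix stripper standing in for a real Snowball or Porter stemmer, and `field_score` counts hits instead of computing a per-field BM25 score.

```python
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and"}  # illustrative subset

def naive_stem(token):
    # Toy suffix stripper for illustration only; use a real stemmer in practice.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    # Lowercase, split on whitespace, drop stop words, then stem.
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

def field_score(query, title, body, title_boost=3.0):
    # Count matching terms per field and weight title hits more heavily.
    terms = set(analyze(query))
    title_hits = sum(1 for t in analyze(title) if t in terms)
    body_hits = sum(1 for t in analyze(body) if t in terms)
    return title_boost * title_hits + body_hits

print(analyze("Sorting the arrays"))
print(field_score("sort arrays", "Sorting arrays in PHP", "intro text"))
print(field_score("sort arrays", "Intro", "sorting arrays in php"))
```

The same analyzer has to run on both the indexed text and the query; otherwise stemmed index terms never match unstemmed query terms.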

Code Examples

✗ Vulnerable

-- No relevance ranking: all matches are equal
SELECT * FROM docs WHERE body LIKE '%php array%';
-- Returns every matching row (say 1000) with no ordering by relevance
-- User must scan all results

✓ Fixed

-- PostgreSQL full-text search with ranking
-- (assumes docs.tsv is a tsvector column, e.g. built with to_tsvector('english', body)):
SELECT title, ts_rank(tsv, query) AS rank
FROM docs,
     to_tsquery('english', 'php & array') AS query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;

// Or Meilisearch (rule-based ranking, typo-tolerant, PHP SDK);
// $index is assumed to come from $client->index('docs'):
$results = $index->search('php array', [
    'limit' => 20,
    'attributesToHighlight' => ['title', 'body'],
]);

Tags

search, algorithms, elasticsearch

Added 16 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,506 terms, single voice. About this reference →


⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: High

⚡ Quick Fix

Configure search result ranking explicitly — boost recent items, exact matches, and popular items over partial matches; measure click-through rate to validate your ranking formula
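One way to measure click-through rate is per result position. The sketch below assumes a hypothetical search log of `(position, clicked)` pairs; the log format and the function name are illustrative and not tied to any particular analytics tool.

```python
from collections import defaultdict

def ctr_by_position(impressions):
    # impressions: iterable of (position, clicked) pairs from a hypothetical search log.
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for position, was_clicked in impressions:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    # Click-through rate = clicks / impressions, per result position.
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}

log = [(1, True), (1, False), (2, False), (2, False), (3, True), (3, True)]
print(ctr_by_position(log))
```

A healthy ranking usually shows CTR falling as position increases; a lower position out-clicking the top result, as in this sample log, suggests the ranking formula needs another look.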

📦 Applies To

any web api

🔗 Prerequisites

Full-Text Search, Meilisearch & Typesense, Elasticsearch

🔍 Detection Hints

Default ranking without tuning; outdated content ranking above recent; title matches not boosted over body matches

Auto-detectable: ✗ No meilisearch elasticsearch algolia

⚠ Related Problems

Full-Text Search, Elasticsearch, Meilisearch & Typesense

🤖 AI Agent

Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: File Tests: Update

References

https://en.wikipedia.org/wiki/Okapi_BM25