Search Relevance — TF-IDF & BM25
Also Known As
TF-IDF
BM25
search ranking
relevance scoring
TL;DR
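Ranking algorithms that score documents by how relevant they are to a query. TF-IDF weighs how often a term appears in a document against how common it is across all documents. BM25 adds term-frequency saturation and document-length normalisation to surface the best matches.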
Explanation
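TF-IDF (Term Frequency-Inverse Document Frequency): TF measures how often a term appears in a document. IDF down-weights terms that appear in many documents across the collection, so rare terms count for more. Multiplying the two gives a term's weight in a document, and a document's score for a query is the sum of the weights of its query terms. BM25 (Best Matching 25, from the Okapi system) improves on TF-IDF in two ways. A saturation function makes each additional occurrence of a term add less than the one before. Length normalisation stops long documents from scoring higher merely because they contain more words.
Elasticsearch uses BM25 as its default similarity, as do other Lucene-based engines such as Solr and OpenSearch. For PHP: Meilisearch and Typesense give good relevance out of the box, but they rank with their own rule-based criteria (matched words, typos, proximity, exactness) rather than BM25. Elasticsearch exposes full control over scoring.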
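For reference, the textbook forms of both scores are given below. Here N is the number of documents and df(t) is the number of documents containing term t. f(t, d) is the count of t in document d, |d| is the length of d, and avgdl is the average document length. The BM25 IDF shown is the smoothed variant Lucene uses; other implementations differ slightly.

```latex
\mathrm{tfidf}(t, d) = f(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}

\mathrm{BM25}(d, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
  \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b \, \frac{|d|}{\mathrm{avgdl}}\right)},
\qquad
\mathrm{IDF}(t) = \ln\left(1 + \frac{N - \mathrm{df}(t) + 0.5}{\mathrm{df}(t) + 0.5}\right)
```

k1 sets how quickly term frequency saturates; the fraction can never exceed k1 + 1. b in [0, 1] sets how strongly length is normalised, and b = 0 disables normalisation. Elasticsearch defaults to k1 = 1.2 and b = 0.75.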
Common Misconception
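✗ More occurrences of a search term always mean more relevance. In BM25, term frequency saturates. With the common default k1 = 1.2 and a document of average length, the 10th occurrence of a word adds only about 2% of what the first occurrence contributed. This dampens keyword stuffing rather than eliminating it.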
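A quick way to see the saturation is to print the marginal contribution of each extra occurrence. This sketch isolates the BM25 term-frequency factor for a document of exactly average length, so the length term drops out, with k1 = 1.2. The function name is illustrative.

```php
<?php
// BM25 term-frequency factor for a document whose length equals the average
// length (the length-normalisation term then equals 1), with k1 = 1.2.
function bm25TermFrequency(int $tf, float $k1 = 1.2): float
{
    return $tf * ($k1 + 1) / ($tf + $k1);
}

// Marginal gain contributed by each additional occurrence of the term.
for ($tf = 1; $tf <= 10; $tf++) {
    $gain = bm25TermFrequency($tf) - bm25TermFrequency($tf - 1);
    printf("occurrence %2d adds %.3f\n", $tf, $gain);
}
```

The first occurrence contributes 1.000 and the tenth about 0.023. No number of repetitions can push the factor past k1 + 1 = 2.2.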
Why It Matters
Suppose a PHP documentation search ranks a 10,000-word tutorial above a precise 200-word reference page just because the tutorial mentions the keyword more often. That search gives users the wrong answer. BM25's length normalisation and term-frequency saturation counteract this bias; raw TF-IDF does not.
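The tutorial-versus-reference case can be reproduced with a toy corpus. This is a minimal sketch, not any engine's implementation. The three documents are hypothetical and reduced to their token counts. TF-IDF uses raw counts with log(N / df). BM25 uses Lucene's smoothed IDF with Elasticsearch's default k1 = 1.2 and b = 0.75.

```php
<?php
// Hypothetical corpus reduced to what the scores need: each document's
// length in tokens and its per-term occurrence counts.
$docs = [
    'reference' => ['length' => 200,   'tf' => ['array' => 4]],
    'tutorial'  => ['length' => 10000, 'tf' => ['array' => 12]],
    'changelog' => ['length' => 800,   'tf' => []],
];

// Number of documents that contain $term at least once.
function documentFrequency(array $docs, string $term): int
{
    return count(array_filter($docs, fn(array $d): bool => ($d['tf'][$term] ?? 0) > 0));
}

// Classic TF-IDF: raw count times log(N / df), with no length normalisation.
function tfIdf(array $docs, string $docId, array $query): float
{
    $n = count($docs);
    $score = 0.0;
    foreach ($query as $term) {
        $df = documentFrequency($docs, $term);
        if ($df === 0) {
            continue; // a term absent from the corpus contributes nothing
        }
        $score += ($docs[$docId]['tf'][$term] ?? 0) * log($n / $df);
    }
    return $score;
}

// BM25 with Lucene-style smoothed IDF and Elasticsearch's default parameters.
function bm25(array $docs, string $docId, array $query, float $k1 = 1.2, float $b = 0.75): float
{
    $n = count($docs);
    $avgdl = array_sum(array_column($docs, 'length')) / $n;
    $lengthNorm = 1 - $b + $b * $docs[$docId]['length'] / $avgdl;
    $score = 0.0;
    foreach ($query as $term) {
        $df = documentFrequency($docs, $term);
        $tf = $docs[$docId]['tf'][$term] ?? 0;
        $idf = log(1 + ($n - $df + 0.5) / ($df + 0.5));
        $score += $idf * $tf * ($k1 + 1) / ($tf + $k1 * $lengthNorm);
    }
    return $score;
}

$query = ['array'];
foreach (array_keys($docs) as $id) {
    printf("%-10s tf-idf=%.3f  bm25=%.3f\n", $id, tfIdf($docs, $id, $query), bm25($docs, $id, $query));
}
```

Raw TF-IDF ranks the long tutorial first (about 4.87 against 1.62) simply because it repeats the term more often. BM25 ranks the short reference page first once length and saturation are taken into account.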
Common Mistakes
- LIKE '%keyword%' for full-text search: it returns every substring match with no relevance ranking, and the leading wildcard rules out an ordinary B-tree index.
- Not boosting title matches over body — title occurrences should outweigh body occurrences.
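- Ignoring stop words: 'the', 'a' and 'in' have near-zero IDF, so they add almost nothing to ranking while bloating the index (though dropping them breaks phrase queries that contain them).
- No stemming: without a stemmer, 'running' and 'runs' are treated as different terms. Irregular forms such as 'ran' need lemmatisation rather than stemming.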
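Stop-word removal and stemming both belong to an analysis step applied identically to documents and queries before indexing. The sketch below is deliberately crude. The stop-word list and the three suffix rules are illustrative assumptions, not the Porter or Snowball stemmers a real engine would use; for example, it leaves 'running' as 'runn'.

```php
<?php
// Illustrative stop-word list (an assumption, far shorter than real lists).
const STOP_WORDS = ['the', 'a', 'an', 'in', 'of', 'to', 'and', 'is'];

// Naive suffix stripping; a stand-in for a real stemmer such as Porter's.
function naiveStem(string $token): string
{
    foreach (['ing', 'es', 's'] as $suffix) {
        $n = strlen($suffix);
        // Only strip when a stem of at least three characters remains.
        if (strlen($token) - $n >= 3 && substr($token, -$n) === $suffix) {
            return substr($token, 0, -$n);
        }
    }
    return $token;
}

// Lowercase, split on non-alphanumerics, drop stop words, then stem.
function analyze(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $tokens = array_filter($tokens, fn(string $t): bool => !in_array($t, STOP_WORDS, true));
    return array_values(array_map('naiveStem', $tokens));
}

print_r(analyze('Sorting the arrays in PHP'));
```

'sorts' and 'sorting' both reduce to 'sort', so a query for either matches documents containing the other.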
Code Examples
✗ Vulnerable
-- No relevance ranking: every match is treated as equal.
SELECT * FROM docs WHERE body LIKE '%php array%';
-- Matches only the literal substring 'php array', in no meaningful order,
-- so the user must scan every result; the leading % also rules out a B-tree index.
✓ Fixed
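-- PostgreSQL full-text search with ranking.
-- Assumes docs.tsv is a GIN-indexed tsvector column that weights title above body, e.g.
--   setweight(to_tsvector('english', title), 'A') ||
--   setweight(to_tsvector('english', body), 'B')
-- ts_rank uses only per-document statistics (no IDF), so it is not BM25.
SELECT title, ts_rank(tsv, query) AS rank
FROM docs,
     to_tsquery('english', 'php & array') AS query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;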
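// Or Meilisearch via its PHP SDK (typo-tolerant; ranks with its own rules such as
// matched words, typos, proximity and exactness, not BM25):
use Meilisearch\Client;
$client = new Client('http://127.0.0.1:7700', 'masterKey'); // placeholder host, key and index name
$results = $client->index('docs')->search('php array', [
    'limit' => 20,
    'attributesToHighlight' => ['title', 'body'],
]);
$hits = $results->getHits(); // ranked hits; highlighted copies are under '_formatted'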
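In Elasticsearch, title boosting is usually expressed in the query itself. This sketch only builds the request array for a `multi_match` query with the title field boosted three times. The index name `docs`, the boost factor and the host in the commented-out client call are assumptions.

```php
<?php
// Builds (but does not send) an Elasticsearch search request in which title
// matches are boosted 3x relative to body matches.
function buildSearchParams(string $query, int $size = 20): array
{
    return [
        'index' => 'docs', // assumed index name
        'body'  => [
            'size'  => $size,
            'query' => [
                'multi_match' => [
                    'query'  => $query,
                    'fields' => ['title^3', 'body'], // ^3 multiplies the title field's score by 3
                ],
            ],
        ],
    ];
}

$params = buildSearchParams('php array');
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;

// Sending it with the official client (elasticsearch/elasticsearch 8.x):
// $client = Elastic\Elasticsearch\ClientBuilder::create()
//     ->setHosts(['http://localhost:9200'])
//     ->build();
// $response = $client->search($params);
```

Keeping the request in a plain array makes the boost easy to unit-test and tune without a running cluster.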
Added
16 Mar 2026
Edited
22 Mar 2026
⚡ DEV INTEL
Tools & Severity
🟡 Medium
⚙ Fix effort: High
⚡ Quick Fix
Configure search result ranking explicitly: boost exact matches over partial ones, and factor in recency and popularity where they matter. Measure click-through rate to validate the ranking formula.
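One way to act on this is to blend the engine's text score with extra signals. The function below is a hypothetical sketch. The weights, the 30-day half-life and the parameter names are illustrative values that would need tuning against click-through data, not recommended defaults.

```php
<?php
// Hypothetical ranking blend: text relevance plus recency, popularity and an
// exact-title bonus. All weights are illustrative and need tuning.
function blendedScore(float $textScore, int $ageDays, int $clicks, bool $exactTitleMatch): float
{
    $recency = 0.5 ** ($ageDays / 30);          // decays with a 30-day half-life
    $popularity = log(1 + $clicks);             // diminishing returns on clicks
    $exactBonus = $exactTitleMatch ? 1.0 : 0.0; // flat bonus for an exact title match
    return $textScore + 0.5 * $recency + 0.1 * $popularity + $exactBonus;
}

// A fresh, popular page can overtake a two-year-old page with a higher text score.
printf("%.3f vs %.3f\n", blendedScore(2.0, 3, 500, false), blendedScore(2.3, 730, 5, false));
```

Additive blending keeps each signal's influence easy to read off, which helps when comparing variants in an A/B test.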
📦 Applies To
any
web
api
🔍 Detection Hints
Default ranking without tuning; outdated content ranking above recent; title matches not boosted over body matches
Auto-detectable: ✗ No
meilisearch
elasticsearch
algolia
🤖 AI Agent
Confidence: Low
False Positives: Medium
✗ Manual fix
Fix: High
Context: File
Tests: Update