Search Relevance — TF-IDF & BM25

search Advanced

Also Known As

TF-IDF, BM25, search ranking, relevance scoring

TL;DR

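Ranking algorithms that score documents by how relevant they are to a query. TF-IDF weighs how often a term appears in a document against how common that term is across the corpus; BM25 adds term-frequency saturation and document-length normalisation to surface the best matches.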

Explanation

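TF-IDF (Term Frequency-Inverse Document Frequency): TF measures how often a term appears in a document; IDF down-weights terms that appear in many documents. Their product, summed over the query terms, gives the relevance score. BM25 (Best Match 25) refines TF-IDF with a saturation function (each additional occurrence of a term adds less than the one before) and length normalisation (long documents do not score higher merely because they contain more words). Elasticsearch uses BM25 as its default similarity and gives full control over scoring. For PHP: Meilisearch and Typesense both offer PHP clients and relevance ranking out of the box, but they rank with their own rule-based schemes (word matches, typos, proximity, attribute order) rather than BM25.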
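
To make the two formulas concrete, here is a minimal sketch in Python rather than PHP, chosen only for brevity. It implements one common formulation of each: raw-count TF-IDF and BM25 with the usual defaults k1 = 1.2 and b = 0.75. The tokenizer, the toy corpus, and the function names are illustrative assumptions, not the internals of any particular engine.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase/whitespace tokenizer; real engines also stem and drop stop words.
    return text.lower().split()

def idf(term, docs):
    # BM25-style IDF (the "1 + ..." variant, which never goes negative).
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))

def tfidf_score(query, doc, docs):
    # Raw-count TF-IDF: term count times IDF, summed over the query terms.
    counts = Counter(doc)
    return sum(counts[t] * idf(t, docs) for t in query)

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    # BM25: term frequency saturates (k1) and is normalised by document length (b).
    counts = Counter(doc)
    avg_len = sum(len(d) for d in docs) / len(docs)
    norm = k1 * (1 - b + b * len(doc) / avg_len)
    return sum(
        idf(t, docs) * counts[t] * (k1 + 1) / (counts[t] + norm)
        for t in query
    )

# Toy corpus (hypothetical): a short reference page, a keyword-heavy tutorial, an unrelated page.
corpus = [
    "php array functions reference",
    "php tutorial php basics php arrays php loops and much more about php",
    "python list comprehension guide",
]
docs = [tokenize(d) for d in corpus]
query = tokenize("php array")

for text, doc in zip(corpus, docs):
    print(round(tfidf_score(query, doc, docs), 3),
          round(bm25_score(query, doc, docs), 3),
          text)
```

On this toy corpus, raw-count TF-IDF ranks the keyword-heavy tutorial first, while BM25 prefers the short reference page that matches both query terms.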

Common Misconception

More occurrences of a search term always mean a more relevant document. In fact, BM25's saturation function means the 10th occurrence of a word adds almost no relevance, which limits how far keyword stuffing can game rankings.
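
The saturation effect can be seen by printing BM25's term-frequency factor for 1 to 10 occurrences. This is a sketch that assumes the common default k1 = 1.2 and switches length normalisation off (b = 0) to isolate saturation; `tf_component` is a hypothetical helper name.

```python
def tf_component(tf, k1=1.2):
    # BM25 term-frequency factor with length normalisation disabled (b = 0).
    # It approaches, but never reaches, the ceiling k1 + 1.
    return tf * (k1 + 1) / (tf + k1)

previous = 0.0
for tf in range(1, 11):
    current = tf_component(tf)
    print(f"tf={tf:2d}  factor={current:.3f}  gain={current - previous:.3f}")
    previous = current
```

With these settings the first occurrence contributes 1.0 to the factor, while the tenth adds only about 0.02.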

Why It Matters

A PHP documentation search that ranks a 10,000-word tutorial above a precise 200-word reference page just because the tutorial mentions the keyword more often is ranking badly; BM25's length normalisation counteracts this.
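
A rough numeric illustration of that scenario, assuming the common defaults k1 = 1.2 and b = 0.75. The term counts (3 and 12) and the average document length of 1,000 words are made-up figures, and `bm25_tf` is a hypothetical helper name.

```python
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    # BM25 term-frequency factor including length normalisation.
    # b = 0 disables normalisation; b = 1 applies it fully.
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm)

AVG_LEN = 1_000  # hypothetical average document length in words

# 200-word reference page mentioning the keyword 3 times.
reference = bm25_tf(tf=3, doc_len=200, avg_len=AVG_LEN)
# 10,000-word tutorial mentioning the keyword 12 times.
tutorial = bm25_tf(tf=12, doc_len=10_000, avg_len=AVG_LEN)

print(f"reference page: {reference:.3f}")
print(f"tutorial:       {tutorial:.3f}")
```

Under these assumptions the short reference page receives the higher factor despite having a quarter of the keyword occurrences; with `b=0` the order flips back in favour of the tutorial.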

Common Mistakes

  • LIKE '%keyword%' for full-text search: returns every match with no relevance ranking.
  • Not boosting title matches over body: a query term in the title is usually a stronger relevance signal than the same term in the body.
  • Ignoring stop words: 'the', 'a', 'in' have near-zero IDF, so indexing them costs space and adds little ranking signal.
  • No stemming: 'running' and 'runs' are treated as different terms without a stemmer (irregular forms such as 'ran' need lemmatisation).
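
Three of these fixes (stop-word filtering, stemming, and title boosting) can be sketched in a few lines of Python. Everything here is an illustrative assumption: the stop-word list is tiny, the suffix stripper is deliberately crude (a real system would use something like a Snowball stemmer), and the boost of 3.0 is an arbitrary starting value to be tuned.

```python
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and"}

def stem(token):
    # Crude suffix stripping for illustration only; not a real stemmer.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def analyze(text):
    # Lowercase, split on whitespace, drop stop words, then stem.
    return [stem(t) for t in text.lower().split() if t not in STOP_WORDS]

def field_score(query, title, body, title_boost=3.0):
    # Count matching terms per field and weight title hits more heavily.
    # A real engine would compute BM25 per field instead of raw counts.
    terms = set(analyze(query))
    title_hits = sum(1 for t in analyze(title) if t in terms)
    body_hits = sum(1 for t in analyze(body) if t in terms)
    return title_boost * title_hits + body_hits

print(analyze("Sorting the arrays"))
print(field_score("sort array", "Sorting arrays in PHP", "An overview of the language"))
print(field_score("sort array", "PHP basics", "sorting arrays and more sorts"))
```

The same analyzer must run on both the query and the indexed text; otherwise stemmed index terms will not match unstemmed query terms.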

Code Examples

✗ Vulnerable
-- No relevance ranking: every match is treated as equal.
-- The pattern also matches only the exact phrase "php array".
SELECT * FROM docs WHERE body LIKE '%php array%';
-- May return 1000 rows with no ordering by relevance,
-- so the user must scan all of them.
✓ Fixed
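-- PostgreSQL full-text search with ranking.
-- Assumes docs.tsv is a tsvector column; ts_rank is frequency-based, not BM25.
SELECT title, ts_rank(tsv, query) AS rank
FROM docs,
     to_tsquery('english', 'php & array') query
WHERE tsv @@ query
ORDER BY rank DESC
LIMIT 20;

// Or Meilisearch (typo-tolerant, rule-based ranking, PHP SDK).
// The host, API key, and index name below are placeholders.
$client = new \Meilisearch\Client('http://localhost:7700', 'masterKey');
$index = $client->index('docs');
$results = $index->search('php array', [
    'limit' => 20,
    'attributesToHighlight' => ['title', 'body'],
]);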

Added 16 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: High
⚡ Quick Fix
Configure search result ranking explicitly: boost recent items, exact matches, and popular items over partial matches, then measure click-through rate to validate your ranking formula.
📦 Applies To
any web api
🔗 Prerequisites
🔍 Detection Hints
Default ranking without tuning; outdated content ranking above recent; title matches not boosted over body matches
Auto-detectable: ✗ No meilisearch elasticsearch algolia
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: Medium ✗ Manual fix Fix: High Context: File Tests: Update
