
BM25 Ranking

Search Intermediate

debt(d9/e3/b5/t5)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). Misconfigured BM25 parameters (wrong k1/b values, using FTS4 instead of FTS5, comparing scores cross-query) produce no errors or warnings — the search engine runs and returns results. Only degraded search relevance experienced by real users reveals the problem, and even then it requires deliberate measurement with a relevance evaluation dataset to confirm. No detection_hints.tools are specified; no tooling in the search ecosystem flags suboptimal BM25 parameter choices automatically.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix confirms Elasticsearch uses BM25 by default (no config needed) and parameter tuning is a matter of adjusting k1 and b values. Switching SQLite from FTS4 to FTS5 is a schema migration touching the table definition and index creation, slightly more than a one-liner but still localised to one component. The remediation pattern is clear and bounded — hence e3 rather than e1.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). BM25 is the relevance backbone of every search feature in the application. Choosing wrong parameter defaults or the wrong FTS version affects every search query across the system. However, it doesn't reshape every unrelated change the way an ORM or auth system would — it stays localised to the search subsystem. Teams building search features persistently feel the tax of needing a relevance evaluation dataset and measurement discipline, placing this at b5.

t5 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'notable trap — a documented gotcha most devs eventually learn' (t5). The misconception field explicitly states developers believe BM25 and TF-IDF are interchangeable and produce equivalent results. Additionally, common_mistakes highlight that BM25 scores are not comparable across queries (a non-obvious behaviour), and that defaults are assumed optimal when they are not. These are well-documented gotchas in search engineering that most developers learn after encountering degraded relevance, fitting t5.

DEBT scoring: scored by claude-sonnet-4-6 · 2026-05-06 · reviewed by human

Also Known As

BM25 · Okapi BM25 · Best Match 25 · BM25F · bm25 ranking

TL;DR

Best Match 25 — the industry-standard relevance ranking algorithm used by Elasticsearch, Lucene, and SQLite FTS5, refining TF-IDF with better document length normalisation and a term frequency saturation parameter.

Explanation

BM25 (Okapi BM25) improves on TF-IDF by adding two tuning parameters: k1 controls term frequency saturation (how much each additional occurrence of a term increases the score; typically 1.2–2.0), and b controls length normalisation (how strongly document length affects scoring, from 0 for none to 1 for full; 0.75 is the usual default). The key insight over TF-IDF: in BM25, each additional occurrence of a term yields diminishing returns. A term appearing 100 times in a document does not score 100× higher than a term appearing once; in the classic formulation its contribution approaches a ceiling of (k1 + 1) times the term's IDF weight. This makes BM25 less susceptible to term-stuffing and more accurate on documents of varying lengths. BM25 is the default similarity in Elasticsearch 5+ and in Lucene and Solr since 6.0, and SQLite FTS5 ships it as the built-in bm25() function. PostgreSQL's built-in ts_rank and ts_rank_cd are not BM25: they score on term frequency (and, for ts_rank_cd, cover density) and have no IDF component.
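The saturation behaviour described above can be sketched in a few lines of PHP. This is a minimal illustration of the classic Okapi BM25 per-term formula, not any engine's actual implementation: the function name bm25TermScore, the corpus figures, and the Lucene-style IDF are all assumptions chosen for the example.

```php
<?php
/**
 * Score contribution of one query term for one document (classic Okapi BM25).
 * Hypothetical helper for illustration only.
 *
 * @param int   $tf        occurrences of the term in the document
 * @param int   $docLen    document length in tokens
 * @param float $avgDocLen average document length in the corpus
 * @param int   $docCount  number of documents in the corpus
 * @param int   $docFreq   number of documents containing the term
 */
function bm25TermScore(
    int $tf,
    int $docLen,
    float $avgDocLen,
    int $docCount,
    int $docFreq,
    float $k1 = 1.2,
    float $b = 0.75
): float {
    // Lucene-style IDF; the +1 inside the log keeps it non-negative.
    $idf = log(1 + ($docCount - $docFreq + 0.5) / ($docFreq + 0.5));
    // Length normalisation: longer-than-average documents inflate the denominator.
    $norm = $k1 * (1 - $b + $b * $docLen / $avgDocLen);

    return $idf * ($tf * ($k1 + 1)) / ($tf + $norm);
}

// Invented corpus: 1000 documents, 50 contain the term, document is of average length.
$once    = bm25TermScore(1, 100, 100.0, 1000, 50);
$hundred = bm25TermScore(100, 100, 100.0, 1000, 50);

printf("tf=1: %.3f  tf=100: %.3f  ratio: %.2f\n", $once, $hundred, $hundred / $once);
```

With the document at average length, the tf=100 score comes out at roughly 2.17 times the tf=1 score, just under k1 + 1 = 2.2, which is the limit that ratio approaches as tf grows.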

Watch Out

⚠ BM25 parameters (k1 and b) are not universal: tuning that works for short product titles will often hurt ranking quality on long technical documents (the parameters change the order of results, not which documents match), and the default b=0.75 assumes English-like length distributions. Text in other languages, code snippets, or corpora with highly variable lengths may need re-tuning or a custom BM25 variant.
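In Elasticsearch, per-field re-tuning is done by declaring a custom similarity in the index settings and referencing it from the field mapping. The sketch below only builds the request array; the index name, field name, similarity name, and the k1/b values are hypothetical placeholders to be validated against a relevance evaluation set, and the typeless mapping shape assumes Elasticsearch 7 or later.

```php
<?php
// Hypothetical names and values throughout; nothing here is a recommendation.
$params = [
    'index' => 'product_titles',
    'body'  => [
        'settings' => [
            'index' => [
                'similarity' => [
                    // Custom BM25 similarity: short titles, so length matters less.
                    'titles_bm25' => [
                        'type' => 'BM25',
                        'k1'   => 1.2,
                        'b'    => 0.3,
                    ],
                ],
            ],
        ],
        'mappings' => [
            'properties' => [
                // The field opts in to the custom similarity by name.
                'title' => ['type' => 'text', 'similarity' => 'titles_bm25'],
            ],
        ],
    ],
];

// With an elasticsearch-php client in $es, this would be sent as:
// $es->indices()->create($params);
echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Keeping the similarity per field rather than index-wide means a long body field can stay on the defaults while the title field is tuned separately.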

Common Misconception

✗ BM25 and TF-IDF produce the same results and are interchangeable. BM25 generally outperforms TF-IDF on real-world document collections, particularly for short queries against long documents. Term frequency saturation, together with length normalisation, prevents long documents from dominating results purely through higher raw term counts. On modern search engines, TF-IDF is largely a historical reference point; BM25 is the practical baseline.
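A toy comparison makes the difference concrete. The sketch below ranks two invented documents with textbook raw-count TF-IDF and with classic BM25; all figures and document names are made up, and real TF-IDF implementations often add their own sublinear scaling or length norms, so treat this as an illustration of the mechanism rather than a benchmark.

```php
<?php
// Invented corpus statistics for a single query term.
$docCount = 1000;    // documents in the corpus
$docFreq  = 50;      // documents containing the term
$avgLen   = 500.0;   // average document length in tokens

$docs = [
    'short_focused' => ['tf' => 3,  'len' => 50],
    'long_rambling' => ['tf' => 10, 'len' => 2000],
];

$idf = log(1 + ($docCount - $docFreq + 0.5) / ($docFreq + 0.5));
$k1  = 1.2;
$b   = 0.75;

$tfidf = [];
$bm25  = [];
foreach ($docs as $id => $d) {
    // Raw-count TF-IDF: score grows linearly with tf and ignores length.
    $tfidf[$id] = $d['tf'] * $idf;

    // BM25: tf saturates and long documents are penalised via $norm.
    $norm      = $k1 * (1 - $b + $b * $d['len'] / $avgLen);
    $bm25[$id] = $idf * $d['tf'] * ($k1 + 1) / ($d['tf'] + $norm);
}

// Sort descending by score, keeping document ids as keys.
arsort($tfidf);
arsort($bm25);

echo 'TF-IDF order: ', implode(', ', array_keys($tfidf)), PHP_EOL;
echo 'BM25 order:   ', implode(', ', array_keys($bm25)), PHP_EOL;
```

Under raw TF-IDF the long document wins on term count alone; under BM25 the short, focused document ranks first.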

Why It Matters

BM25 is the default relevance algorithm in most major lexical search engines, and understanding it prevents cargo-cult configuration. When tuning Elasticsearch for a PHP application, the k1 and b parameters directly control search quality: lowering b reduces length normalisation bias for collections with consistent document lengths, while raising k1 rewards documents where the query term appears repeatedly. Knowing what these parameters do is the difference between systematic relevance tuning and random experimentation.

Common Mistakes

Assuming BM25 scores are directly comparable across different queries, indices, or document collections: scores are relative, not absolute, because they depend on the corpus statistics and on the terms in the query.
Tuning k1 and b parameters without understanding the trade-off: lowering b aggressively to ignore document length can hurt ranking quality if documents legitimately vary in relevant content density.
Confusing BM25 with vector embeddings or semantic search — BM25 is lexical and keyword-based, not semantic, so it fails on synonyms and conceptual relevance without query expansion.
Setting k1 too high (> 2.5) to get more term-frequency sensitivity, then wondering why keyword-stuffed documents rank too high: a higher k1 delays saturation, so it weakens the very cap that makes BM25 resistant to term stuffing.
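Two of the mistakes above, comparing scores across corpora and over-raising k1, can be checked numerically. The sketch below uses a hypothetical termScore helper implementing the classic per-term BM25 formula; the corpus sizes, document frequencies, and term counts are invented for illustration.

```php
<?php
// Hypothetical helper: classic per-term BM25 with a Lucene-style IDF.
// $lenRatio is document length divided by average document length.
function termScore(int $tf, float $lenRatio, int $docCount, int $docFreq, float $k1, float $b = 0.75): float
{
    $idf = log(1 + ($docCount - $docFreq + 0.5) / ($docFreq + 0.5));

    return $idf * $tf * ($k1 + 1) / ($tf + $k1 * (1 - $b + $b * $lenRatio));
}

// 1) Same document, same term, two indices: only the corpus statistics differ.
$inSmallIndex = termScore(3, 1.0, 1000, 50, 1.2);          // term is rare here
$inLargeIndex = termScore(3, 1.0, 1000000, 400000, 1.2);   // term is common here
printf("small index: %.3f  large index: %.3f\n", $inSmallIndex, $inLargeIndex);

// 2) Advantage of a keyword-stuffed document (tf=50) over a normal one (tf=2)
//    as k1 grows. A larger ratio means stuffing pays off more.
$advantage = [];
foreach ([1.2, 2.5, 5.0] as $k1) {
    $advantage[(string) $k1] = termScore(50, 1.0, 1000, 50, $k1) / termScore(2, 1.0, 1000, 50, $k1);
}
foreach ($advantage as $k1 => $ratio) {
    printf("k1=%s  stuffed/normal ratio: %.2f\n", $k1, $ratio);
}
```

The identical document scores differently in each index, so a fixed score threshold does not transfer between them, and the stuffed document's advantage grows steadily as k1 rises.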

Code Examples

✗ Vulnerable

// ❌ Hand-rolling relevance scoring with raw LIKE — no IDF weighting
function search(string $query, PDO $db): array {
    $words = explode(' ', $query);
    $sql = "SELECT *, 0 AS score FROM documents WHERE ";
    $conditions = [];
    foreach ($words as $word) {
        $conditions[] = "content LIKE '%$word%'";
    }
    $sql .= implode(' OR ', $conditions);
    // Counts nothing, ranks nothing, vulnerable to SQL injection
    return $db->query($sql)->fetchAll();
}

✓ Fixed

// ✅ Use Elasticsearch (BM25 by default since v5) or PostgreSQL FTS
// Elasticsearch — BM25 automatic, no config needed
$results = $es->search([
    'index' => 'articles',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => $userQuery,
                'fields' => ['title^3', 'body'], // ^3 boosts title matches
            ]
        ]
    ]
]);

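// PostgreSQL FTS: ranked full-text search, but ts_rank is NOT BM25.
// It scores on term frequency within the document and has no IDF component.
// The tsquery is built once in FROM so :q is bound only once.
$stmt = $pdo->prepare("
    SELECT articles.*, ts_rank(search_vector, query) AS rank
    FROM articles, plainto_tsquery('english', :q) AS query
    WHERE search_vector @@ query
    ORDER BY rank DESC
    LIMIT 20
");
$stmt->execute([':q' => $userQuery]);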

References

https://www.elastic.co/guide/en/elasticsearch/reference/current/similarity.html