Search Relevance Tuning
debt(d9/e3/b3/t7)
Closest to 'silent in production until users hit it' (d9). detection_hints.automated is 'no'; the code_pattern (equal field weighting, no boosts/weights) is detectable as a pattern but the actual relevance impact is silent — users land on page two, conversion drops, and nothing in meilisearch/typesense/elasticsearch flags it. Poor ranking only surfaces through user behaviour or query logs in production.
Closest to 'simple parameterised fix' (e3). quick_fix is changing one ranking signal (field weight or boost) at a time and measuring; these are query-time parameters, not code refactors. The common misconception entry confirms that re-indexing is usually not needed: the fix is a parameterised tuning loop within the search query construction.
Closest to 'localised tax' (b3). applies_to is search contexts (web/api/library); the tuning lives in the query layer of the search component. It doesn't reshape the whole codebase, but stacked boosts and untracked signals (per common_mistakes) impose an ongoing tax on the search subsystem specifically.
Closest to 'serious trap' (t7). The misconception — believing relevance is fixed by the engine's default scoring and requires re-indexing — directly contradicts how ranking actually works (most adjustments are query-time). A developer guesses wrong about the entire remediation path, plus common_mistakes show non-obvious regressions (tuning one query silently regresses others).
Also Known As
TL;DR
Explanation
Search relevance tuning is the practice of shaping query-time ranking without changing the underlying index or content. The index produces a base relevance score (usually BM25), but raw lexical scores rarely match user intent. Tuning layers signals on top of that score: per-field weights (title matches count more than body matches), query-time boosts (boost recent documents, in-stock products, or canonical pages), and custom scoring functions (decay by age, multiply by popularity, add a function score for distance).
Most engines expose these as query parameters, so you can iterate on ranking quickly: Elasticsearch has function_score, field boosts (title^3), and rescore windows; Typesense has query_by_weights and sort_by expressions. Meilisearch is the partial exception: its ranking rules and sortable attributes are index settings, with the sort order itself chosen per query.
Good tuning is data-driven. You start from real query logs, identify queries where the wanted result ranks low, and adjust signals while watching offline metrics (NDCG, MRR, precision@k) and online metrics (click-through rate, zero-result rate, conversion). The key discipline is changing one signal at a time and measuring, because boosts interact non-linearly: a strong recency boost can bury the single perfect old document. Over-tuning to a handful of pet queries causes regressions elsewhere, so a held-out judgement set or A/B test is essential. Tuning is also continuous: query patterns, catalogue, and user expectations drift, so relevance is a process, not a one-off configuration.
Distinguish tuning from indexing. Tokenisation and stemming are analysis concerns that normally require re-indexing to change, and synonyms are usually managed alongside them (although some engines can apply synonyms at search time). Weights, boosts, function scoring, and re-ranking are query-time concerns. This term covers the query-time half.
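The offline metrics named above can be computed from graded relevance judgements. The following is a minimal sketch, not a reference implementation: the function names are illustrative, and the ideal ordering is derived only from the grades of the returned results rather than from every judged document for the query, which is a simplification.

```php
<?php
declare(strict_types=1);

/**
 * Discounted cumulative gain over the first $k results.
 * $gains[$i] is the judged grade (0 = irrelevant) of the result at rank $i + 1.
 */
function dcgAtK(array $gains, int $k): float
{
    $dcg = 0.0;
    foreach (array_slice($gains, 0, $k) as $i => $gain) {
        $dcg += $gain / log($i + 2, 2); // rank 1 is divided by log2(2) = 1
    }
    return $dcg;
}

/** NDCG@k: DCG divided by the DCG of the same grades in ideal (descending) order. */
function ndcgAtK(array $gains, int $k): float
{
    $ideal = $gains;
    rsort($ideal);
    $idealDcg = dcgAtK($ideal, $k);
    return $idealDcg > 0.0 ? dcgAtK($gains, $k) / $idealDcg : 0.0;
}

/** Reciprocal rank of the first relevant result; average it over queries for MRR. */
function reciprocalRank(array $gains): float
{
    foreach ($gains as $i => $gain) {
        if ($gain > 0) {
            return 1.0 / ($i + 1);
        }
    }
    return 0.0;
}

// Hypothetical judgements: the wanted document (grade 3) is at rank 3
// before tuning and at rank 1 after.
$before = ndcgAtK([0, 1, 3], 10);
$after  = ndcgAtK([3, 1, 0], 10);
```

Comparing `$before` and `$after` for every query in a judged set, rather than for one query, is what keeps a change honest.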
Common Misconception
Why It Matters
Common Mistakes
- Tuning blindly without query logs or relevance judgements, so improvements for one query silently regress others.
- Stacking many boosts at once, making it impossible to tell which signal caused a ranking change.
- Using an extreme recency or popularity boost that overwhelms text relevance and buries the best match.
- Confusing index-time concerns (synonyms, stemming) with query-time tuning and re-indexing when a boost would do.
- Optimising only offline NDCG without validating against real user behaviour via A/B testing.
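The first mistake in this list (an improvement for one query silently regressing others) can be guarded against by comparing per-query scores rather than only the average. A minimal sketch, assuming you already have a metric such as NDCG@10 per query for both the baseline and the candidate configuration; the function name, the tolerance, and the scores are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Returns the queries whose score dropped by more than $tolerance
 * between the baseline and the candidate configuration.
 *
 * @param array<string, float> $baseline  query => metric (e.g. NDCG@10)
 * @param array<string, float> $candidate query => metric for the same queries
 * @return array<string, float> query => score delta (always negative)
 */
function findRegressions(array $baseline, array $candidate, float $tolerance = 0.02): array
{
    $regressions = [];
    foreach ($baseline as $query => $before) {
        if (!array_key_exists($query, $candidate)) {
            continue; // not measured under the candidate; skip rather than guess
        }
        $delta = $candidate[$query] - $before;
        if ($delta < -$tolerance) {
            $regressions[$query] = $delta;
        }
    }
    return $regressions;
}

// Hypothetical scores, for illustration only.
$baseline  = ['bm25' => 0.60, 'stemming' => 0.80, 'ndcg' => 0.70];
$candidate = ['bm25' => 0.90, 'stemming' => 0.55, 'ndcg' => 0.71];

$regressions = findRegressions($baseline, $candidate);
$meanDelta   = (array_sum($candidate) - array_sum($baseline)) / count($baseline);
```

In this made-up data the mean improves while the query "stemming" regresses, which is the case an average alone would hide.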
Avoid When
- The corpus is tiny and any reasonable ranking already satisfies users — tuning adds maintenance with no payoff.
- You have no query logs or relevance judgements yet, so changes cannot be measured and risk silent regressions.
- The real problem is missing recall (documents not matching at all), which is an indexing or analysis fix, not query-time tuning.
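Before tuning a failing query, it helps to check whether it is a ranking problem or a recall problem. A small sketch under stated assumptions: `diagnoseQuery` is a hypothetical helper, and the result list is assumed to be fetched deep enough (for example the top few hundred ids) to tell "ranked low" from "not matched at all".

```php
<?php
declare(strict_types=1);

/**
 * Classifies a failing query by where the wanted document appears.
 *
 * @param list<string> $resultIds document ids in ranked order
 * @return string 'ok', 'ranking', or 'recall'
 */
function diagnoseQuery(array $resultIds, string $wantedId, int $k = 10): string
{
    $position = array_search($wantedId, $resultIds, true);
    if ($position === false) {
        return 'recall'; // never matched: look at analysis, synonyms, indexing
    }
    // Matched: either already in the top $k, or present but ranked too low.
    return $position < $k ? 'ok' : 'ranking';
}
```

Only queries classified as 'ranking' are candidates for weights and boosts; 'recall' cases belong to the indexing or analysis side.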
When To Use
- Users report the right result exists but ranks too low for common queries.
- Business signals like recency, popularity, or in-stock status should influence ordering.
- You have judged query sets or A/B infrastructure to measure ranking changes objectively.
- Exact title matches need to outrank incidental body matches across the catalogue.
Code Examples
// Untuned query: no explicit weights, no sort signals.
$results = $client->collections['glossary']->documents->search([
    'q'        => $query,
    'query_by' => 'term,short,long,category',
]);
// Nothing states that an exact match on term should beat an incidental hit in long.
// Without query_by_weights, Typesense falls back to the order of the query_by list
// for field priority, so the ranking intent is implicit and easy to break.
// No recency, no popularity: ordering among similar matches is left to engine defaults.

// Query-time tuning: weighted fields + sort signals.
// Typesense example: term (the title) matters most, then short, then long.
// Assumes the collection schema has a numeric popularity field.
$results = $client->collections['glossary']->documents->search([
    'q'                      => $query,
    'query_by'               => 'term,short,long',
    'query_by_weights'       => '6,3,1', // field weights at query time
    'sort_by'                => '_text_match:desc,popularity:desc', // tie-break by popularity
    'prioritize_exact_match' => true, // Typesense's default, made explicit
]);
// Iterate: adjust weights, re-run against a judged query set,
// compare NDCG@10, keep the change only if it improves.
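The same idea can be sketched for Elasticsearch using a function_score query body built as a PHP array. This is an illustration, not a recommended configuration: the function name, the field names (title, body, published_at, popularity), and every numeric value are assumptions. The constant weight of 1 plus max_boost keeps the multiplier between 1 and 2, so the recency and popularity signals can lift a document but cannot zero out or swamp the text score.

```php
<?php
declare(strict_types=1);

/**
 * Builds a search request body: title weighted above body, with a capped
 * recency and popularity multiplier applied on top of the text score.
 */
function buildTunedQuery(string $userQuery, int $titleBoost = 3): array
{
    return [
        'query' => [
            'function_score' => [
                'query' => [
                    'multi_match' => [
                        'query'  => $userQuery,
                        'fields' => ["title^{$titleBoost}", 'body'],
                    ],
                ],
                'functions' => [
                    // Constant floor so an old, unpopular document keeps its text score.
                    ['filter' => ['match_all' => new \stdClass()], 'weight' => 1],
                    // Recency: contribution halves for documents 30 days old.
                    [
                        'gauss'  => ['published_at' => ['origin' => 'now', 'scale' => '30d', 'decay' => 0.5]],
                        'weight' => 0.5,
                    ],
                    // Popularity, dampened with log1p so outliers do not dominate.
                    ['field_value_factor' => [
                        'field' => 'popularity', 'modifier' => 'log1p', 'factor' => 0.1, 'missing' => 0,
                    ]],
                ],
                'score_mode' => 'sum',      // add the three function values together
                'boost_mode' => 'multiply', // then multiply the text score by that sum
                'max_boost'  => 2.0,        // cap the combined function value
            ],
        ],
    ];
}

$body = buildTunedQuery('relevance tuning');
$json = json_encode($body, JSON_PRETTY_PRINT);
```

The resulting `$body` can be passed as the request body to whichever Elasticsearch client is in use; as with the Typesense example, change one value at a time and re-measure.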