Search Relevance Tuning

search PHP 7.4+ Advanced

debt(d9/e3/b3/t7)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). detection_hints.automated is 'no'. The code_pattern (equal field weighting, no boosts or weights) can be spotted in code, but its relevance impact is silent: users land on page two, conversion drops, and nothing in Meilisearch, Typesense, or Elasticsearch flags it. Poor ranking only surfaces through user behaviour or query logs in production.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3). The quick_fix is to change one ranking signal (a field weight or boost) at a time and measure; these are query-time parameters, not code refactors. As the Common Misconception entry notes, re-indexing is usually unnecessary: the fix is a parameterised tuning loop within search query construction.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3). applies_to is search contexts (web/api/library); the tuning lives in the query layer of the search component. It doesn't reshape the whole codebase, but stacked boosts and untracked signals (per common_mistakes) impose an ongoing tax on the search subsystem specifically.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception, that relevance is fixed by the engine's default scoring and can only be changed by re-indexing, directly contradicts how ranking actually works (most adjustments are query-time). A developer who believes it guesses wrong about the entire remediation path, and common_mistakes adds non-obvious regressions (tuning one query silently regresses others).

About DEBT scoring → scored by claude-opus-4-8 · 2026-06-04 · reviewed by human

Also Known As

relevance tuning ranking tuning query boosting score tuning

TL;DR

Adjusting query-time ranking — field weights, boosts, and scoring functions — so the most useful documents rank highest for real queries.

Explanation

Search relevance tuning is the practice of shaping query-time ranking without changing the underlying index or content. The engine produces a base relevance score (commonly BM25), but raw lexical scores rarely match user intent. Tuning layers signals on top of that score: per-field weights (title matches count more than body matches), query-time boosts (recent documents, in-stock products, canonical pages), and custom scoring functions (decay by age, multiply by popularity, factor in geographic distance).

Most engines expose these controls at query time, so you can iterate on ranking quickly. Elasticsearch has function_score, field boosts (title^3), and rescore windows; Typesense has query_by_weights and sort_by expressions; Meilisearch has ranking rules and sortable attributes, although its ranking rules are index settings and only the sort order is chosen per query.

Good tuning is data-driven. You start from real query logs, identify queries where the wanted result ranks low, and adjust signals while watching offline metrics (NDCG, MRR, precision@k) and online metrics (click-through rate, zero-result rate, conversion). The key discipline is changing one signal at a time and measuring, because boosts interact non-linearly: a strong recency boost can bury the single perfect old document. Over-tuning to a handful of pet queries causes regressions elsewhere, so a held-out judgement set or A/B test is essential. Tuning is also continuous: query patterns, catalogue, and user expectations drift, so relevance is a process, not a one-off configuration.

Distinguish tuning from indexing. Synonyms, stemming, and tokenisation are usually analysis or index-time concerns; weights, boosts, function scoring, and re-ranking are query-time concerns. This term covers the query-time half.
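
To make the layering concrete, here is a minimal sketch of how field weights, a recency decay, and a popularity factor might combine on top of per-field text scores. It is not any engine's real formula: the function tunedScore, the document shape (text_scores, age_days, views), and all constants are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical scorer: layers field weights, a recency decay, and a
 * popularity factor on top of per-field text scores.
 */
function tunedScore(array $doc, array $fieldWeights, float $halfLifeDays, float $recencyStrength): float
{
    // Weighted sum of per-field text scores (stand-ins for per-field BM25).
    $text = 0.0;
    foreach ($fieldWeights as $field => $weight) {
        $text += $weight * ($doc['text_scores'][$field] ?? 0.0);
    }

    // Exponential decay: 1.0 for a brand-new document, 0.5 at one half-life.
    $decay = 0.5 ** ($doc['age_days'] / $halfLifeDays);

    // Blend so recency can remove at most $recencyStrength of the score.
    $recency = (1.0 - $recencyStrength) + $recencyStrength * $decay;

    // Dampen popularity with a logarithm so huge view counts do not dominate.
    $popularity = 1.0 + log(1.0 + $doc['views']) / 10.0;

    return $text * $recency * $popularity;
}

// An old document with an exact title match vs. a new one with a weak body match.
$oldPerfect    = ['text_scores' => ['title' => 1.0, 'body' => 0.5], 'age_days' => 720, 'views' => 100];
$newIncidental = ['text_scores' => ['title' => 0.0, 'body' => 0.6], 'age_days' => 1,   'views' => 100];
$weights       = ['title' => 3, 'body' => 1];

// Moderate recency (strength 0.3): the exact title match still ranks first.
$moderateKeepsOld = tunedScore($oldPerfect, $weights, 30.0, 0.3) > tunedScore($newIncidental, $weights, 30.0, 0.3);

// Extreme recency (strength 1.0): the old document is buried.
$extremeBuriesOld = tunedScore($oldPerfect, $weights, 30.0, 1.0) < tunedScore($newIncidental, $weights, 30.0, 1.0);

var_dump($moderateKeepsOld, $extremeBuriesOld);
```

The bounded blend is one possible design choice: capping how much of the score recency can remove keeps text relevance in control, whereas an unbounded multiplier lets a single signal reorder everything.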

Common Misconception

✗ Relevance is fixed by the search engine's default scoring and you cannot influence it without re-indexing — in reality most ranking adjustments are query-time parameters (boosts, field weights, function scores) that need no index changes.
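
As an illustration of ranking changes that live entirely in the request, the sketch below builds an Elasticsearch function_score body as a plain PHP array. The field names (title, body, published_at, popularity), the helper buildTunedQuery, and every numeric value are assumptions, and sending the body with a client is left out.

```php
<?php
declare(strict_types=1);

/**
 * Builds an Elasticsearch request body that tunes ranking at query time.
 * Field names and constants are hypothetical.
 */
function buildTunedQuery(string $userQuery): array
{
    return [
        'query' => [
            'function_score' => [
                // Base lexical query: a title match is boosted 3x over a body match.
                'query' => [
                    'multi_match' => [
                        'query'  => $userQuery,
                        'fields' => ['title^3', 'body'],
                    ],
                ],
                'functions' => [
                    // Recency: the factor falls to 0.5 for documents 30 days old.
                    ['gauss' => ['published_at' => ['origin' => 'now', 'scale' => '30d', 'decay' => 0.5]]],
                    // Popularity, dampened with log1p; a missing value counts as 1.
                    ['field_value_factor' => ['field' => 'popularity', 'modifier' => 'log1p', 'missing' => 1]],
                ],
                'score_mode' => 'sum',      // add the two function values together
                'boost_mode' => 'multiply', // then multiply them into the text score
                'max_boost'  => 5,          // cap the combined function value
            ],
        ],
    ];
}

$body = buildTunedQuery('relevance tuning');
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Changing a boost, the decay scale, or max_boost here alters only the request body; the indexed documents stay as they are.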

Why It Matters

Poor ranking sends users to page two or to a competitor even when the right document is indexed; tuning weights and boosts can lift click-through and conversion with no content changes.

Common Mistakes

Tuning blindly without query logs or relevance judgements, so improvements for one query silently regress others.
Stacking many boosts at once, making it impossible to tell which signal caused a ranking change.
Using an extreme recency or popularity boost that overwhelms text relevance and buries the best match.
Confusing index-time concerns (synonyms, stemming) with query-time tuning and re-indexing when a boost would do.
Optimising only offline NDCG without validating against real user behaviour via A/B testing.
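
Several of these mistakes come down to measurement. A minimal sketch of an offline check, assuming graded relevance labels (0 to 3) listed in ranked order for each query: it computes NDCG@k and reports queries that got worse under a candidate ranking. The function names, the sample queries, and the judgements are invented for illustration, and the ideal ranking is simplified to the judged results actually returned.

```php
<?php
declare(strict_types=1);

/** DCG@k over graded relevance labels listed in ranked order. */
function dcgAtK(array $grades, int $k): float
{
    $dcg = 0.0;
    foreach (array_slice($grades, 0, $k) as $i => $grade) {
        // Gain 2^grade - 1, discounted by log2(position + 1); position 1 is undiscounted.
        $dcg += (2 ** $grade - 1) / log($i + 2, 2);
    }
    return $dcg;
}

/** NDCG@k: DCG of the actual order divided by DCG of the best possible order. */
function ndcgAtK(array $grades, int $k): float
{
    $ideal = $grades;
    rsort($ideal); // simplification: ideal order of the returned results only
    $idealDcg = dcgAtK($ideal, $k);
    return $idealDcg > 0.0 ? dcgAtK($grades, $k) / $idealDcg : 0.0;
}

/** Lists queries whose NDCG@k dropped by more than $tolerance under the candidate. */
function regressions(array $baseline, array $candidate, int $k, float $tolerance = 0.01): array
{
    $regressed = [];
    foreach ($baseline as $query => $grades) {
        $before = ndcgAtK($grades, $k);
        $after  = ndcgAtK($candidate[$query] ?? [], $k);
        if ($before - $after > $tolerance) {
            $regressed[$query] = round($after - $before, 3);
        }
    }
    return $regressed;
}

// Hypothetical judgements: grades of the top results before and after a weight change.
$baseline = [
    'php array sort' => [3, 2, 0, 1],
    'bm25'           => [0, 3, 1, 0],
];
$candidate = [
    'php array sort' => [2, 3, 0, 1],
    'bm25'           => [3, 0, 1, 0],
];

$report = regressions($baseline, $candidate, 10);
print_r($report);
```

In this sample the mean NDCG across both queries rises, yet 'php array sort' still appears in the report. An average alone would hide that regression, which is why a per-query view is worth keeping next to the aggregate.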

Avoid When

The corpus is tiny and any reasonable ranking already satisfies users — tuning adds maintenance with no payoff.
You have no query logs or relevance judgements yet, so changes cannot be measured and risk silent regressions.
The real problem is missing recall (documents not matching at all), which is an indexing or analysis fix, not query-time tuning.

When To Use

Users report the right result exists but ranks too low for common queries.
Business signals like recency, popularity, or in-stock status should influence ordering.
You have judged query sets or A/B infrastructure to measure ranking changes objectively.
Exact title matches need to outrank incidental body matches across the catalogue.

Code Examples

✗ Vulnerable

// Untuned query: no explicit field weights, no business signals.
$results = $client->collections['glossary']->documents->search([
    'q'        => $query,
    'query_by' => 'term,short,long,category',
]);
// Field priority falls back to the engine's defaults, so a keyword buried
// in the long body can outrank an exact title match.
// No recency, no popularity, no deliberate field priority: ranking is left to chance.

✓ Fixed

// Query-time tuning: weighted fields + sort signals.
// Typesense example: term (the title) matters most, then short, then long (the body).
$results = $client->collections['glossary']->documents->search([
    'q'                      => $query,
    'query_by'               => 'term,short,long',
    'query_by_weights'       => '6,3,1',                            // field boosts at query time
    'sort_by'                => '_text_match:desc,popularity:desc', // tie-break by popularity
    'prioritize_exact_match' => true,
]);
// Iterate: adjust weights, re-run against a judged query set,
// compare NDCG@10, keep the change only if it improves.