
Fuzzy Search

search · Intermediate

Also Known As

typo tolerance, Levenshtein distance, approximate matching, edit distance

TL;DR

Matching strings that are similar but not identical, so that typos, transpositions, and misspellings still find the intended result. Most implementations are built on edit distance algorithms.

Explanation

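Fuzzy search is usually built on edit distance (Levenshtein distance): the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. A distance of 1 covers one such edit and a distance of 2 covers two. Swapping two adjacent characters costs 2 in plain Levenshtein but only 1 in the Damerau-Levenshtein variant, which Elasticsearch applies by default. Elasticsearch's fuzzy query and the built-in typo tolerance in Meilisearch and Typesense handle this at the index level. In PHP, levenshtein() returns the edit distance, while similar_text() returns a similarity score (the number of matching characters, optionally as a percentage), not a distance. Trigram indexes (PostgreSQL pg_trgm) take a different approach: they score the overlap between three-character substrings, which lets fuzzy matching use a database index.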
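
To make the definition concrete, here is a minimal sketch of the classic dynamic-programming edit distance. The helper name editDistance() is hypothetical and exists only for this illustration; in real code PHP's built-in levenshtein() does the same job. Like the built-in, it compares bytes, so multi-byte UTF-8 characters count as more than one edit.

```php
<?php
declare(strict_types=1);

/**
 * Plain Levenshtein distance between two strings (byte-wise).
 * Hypothetical helper for illustration; prefer levenshtein() in practice.
 */
function editDistance(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);

    // $prev[$j] = distance from the previous prefix of $a to the first $j bytes of $b.
    // Row 0: turning '' into a prefix of $b takes $j insertions.
    $prev = range(0, $lenB);

    for ($i = 1; $i <= $lenA; $i++) {
        // Column 0: turning the first $i bytes of $a into '' takes $i deletions.
        $curr = [$i];
        for ($j = 1; $j <= $lenB; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // deletion
                $curr[$j - 1] + 1,     // insertion
                $prev[$j - 1] + $cost  // substitution, or free match
            );
        }
        $prev = $curr;
    }

    return $prev[$lenB];
}

echo editDistance('seach', 'search'), "\n";    // 1: one missing letter
echo editDistance('recieve', 'receive'), "\n"; // 2: a swap counts as two edits here
```

The second call shows why engines that treat a transposition as a single edit are more forgiving of swapped letters than plain Levenshtein.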

Diagram

flowchart LR
    QUERY[User types phyton] --> FUZZY{Fuzzy matching}
    FUZZY -->|Levenshtein distance| EDIT[Edit distance = 2<br/>h deleted, then re-inserted after t]
    EDIT --> MATCH[Matches: python]
    subgraph Trigram Similarity
        TRI[Split into trigrams<br/>php = __p _ph php hp_]
        OVER[Overlap score<br/>phyton vs python = 2 of 12 shared, about 0.17]
        TRI --> OVER --> RESULT[Ranked matches]
    end
    subgraph Phonetic
        SOUND[Soundex Metaphone<br/>similar sounding words]
    end
    subgraph Tools
        MEIL[Meilisearch - built-in typo tolerance]
        PG[PostgreSQL pg_trgm extension]
        ES[Elasticsearch fuzzy query]
    end
style MATCH fill:#238636,color:#fff
style RESULT fill:#238636,color:#fff
style MEIL fill:#1f6feb,color:#fff
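
The trigram branch of the diagram can be sketched in a few lines of PHP. This is an approximation of how pg_trgm scores similarity, not its actual implementation: it assumes a single ASCII word, pads it with two leading spaces and one trailing space as the pg_trgm documentation describes, and divides shared trigrams by the total number of distinct trigrams. The function names trigrams() and trigramSimilarity() are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Distinct trigrams of one word, padded pg_trgm-style
 * (two spaces in front, one behind). Assumes a single ASCII word.
 */
function trigrams(string $word): array
{
    if ($word === '') {
        return [];
    }
    $padded = '  ' . strtolower($word) . ' ';
    $grams = [];
    for ($i = 0; $i <= strlen($padded) - 3; $i++) {
        $grams[] = substr($padded, $i, 3);
    }
    return array_values(array_unique($grams));
}

/**
 * Shared trigrams divided by the size of the union of both sets (0.0 to 1.0).
 */
function trigramSimilarity(string $a, string $b): float
{
    $ta = trigrams($a);
    $tb = trigrams($b);
    $shared = count(array_intersect($ta, $tb));
    $union = count($ta) + count($tb) - $shared;

    return $union === 0 ? 0.0 : $shared / $union;
}

print_r(trigrams('php'));                            // "  p", " ph", "php", "hp "
printf("%.2f\n", trigramSimilarity('seach', 'search')); // 0.44
```

A score of 0.44 clears pg_trgm's default similarity threshold of 0.3, so 'seach' would match 'search'. A letter moved in the middle of a short word disturbs most of its trigrams, so such pairs can score much lower than their edit distance suggests.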

Common Misconception

Misconception: fuzzy search matches everything loosely. In practice, good fuzzy search is capped at an edit distance of 1 or 2, which catches real typos without pulling in unrelated words.
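
A small experiment illustrates the calibration point. The sketch below filters a made-up vocabulary by maximum edit distance using PHP's built-in levenshtein(); the helper matchesWithin() and the word list are hypothetical and chosen only to show how the result set grows with the threshold.

```php
<?php
declare(strict_types=1);

/**
 * Vocabulary entries within $maxDistance edits of $query.
 * Hypothetical helper; fine for a tiny in-memory list, not for a full table.
 */
function matchesWithin(string $query, array $vocabulary, int $maxDistance): array
{
    return array_values(array_filter(
        $vocabulary,
        fn (string $word): bool => levenshtein($query, $word) <= $maxDistance
    ));
}

// Illustrative word list, not taken from any real index.
$vocabulary = ['search', 'sketch', 'season', 'switch', 'march'];

print_r(matchesWithin('seach', $vocabulary, 1)); // search
print_r(matchesWithin('seach', $vocabulary, 2)); // search, sketch
print_r(matchesWithin('seach', $vocabulary, 3)); // all five words
```

At distance 1 only the intended word survives; at distance 3 every word in this list matches a five-letter query, which is the "matches everything" behaviour the misconception describes.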

Why It Matters

Users mistype queries: 'seach' for 'search', 'recieve' for 'receive'. Without fuzzy matching they see zero results for a query you could have served; with it, those failed searches become successful ones.

Common Mistakes

  • Fuzzy distance too high: a distance of 3 or more matches too many unrelated terms and reduces relevance.
  • Fuzzy matching on every field: apply it only to free-text fields, not to IDs or structured data.
  • Not using AUTO fuzziness: Elasticsearch's AUTO (equivalent to AUTO:3,6) requires an exact match for terms of 1-2 characters, allows distance 1 for 3-5 characters, and distance 2 for 6 or more.
  • Running levenshtein() in PHP application code against every row: each query costs n distance computations for n documents and cannot use an index; use indexed fuzzy search instead.
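
The length-based rule behind AUTO fuzziness is simple enough to sketch. The function below mirrors the thresholds described above (low = 3, high = 6); it is an illustration of the rule, not Elasticsearch code, and autoFuzziness() and isFuzzyMatch() are hypothetical names.

```php
<?php
declare(strict_types=1);

/**
 * Maximum allowed edit distance for a term, following the AUTO:low,high rule:
 * shorter than $low => 0, shorter than $high => 1, otherwise 2.
 */
function autoFuzziness(string $term, int $low = 3, int $high = 6): int
{
    // Count characters rather than bytes when mbstring is available.
    $length = function_exists('mb_strlen') ? mb_strlen($term, 'UTF-8') : strlen($term);

    if ($length < $low) {
        return 0;
    }
    if ($length < $high) {
        return 1;
    }
    return 2;
}

/**
 * True when $candidate is within the length-dependent distance of $query.
 */
function isFuzzyMatch(string $query, string $candidate): bool
{
    return levenshtein($query, $candidate) <= autoFuzziness($query);
}

var_dump(autoFuzziness('db'));              // int(0): too short for any typo budget
var_dump(autoFuzziness('seach'));           // int(1)
var_dump(isFuzzyMatch('seach', 'search'));  // bool(true)
```

Scaling the budget with term length avoids the worst false positives: a two-letter term with one edit allowed would match a large share of all two-letter terms.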

Code Examples

✗ Vulnerable
// PHP Levenshtein on all rows — O(n), unusable at scale:
$query = 'seach';
$results = $db->query('SELECT * FROM products')->fetchAll();
$fuzzyResults = array_filter($results, function($product) use ($query) {
    return levenshtein($query, strtolower($product['name'])) <= 2;
});
// Scans all products in PHP — not viable for large datasets
✓ Fixed
// Elasticsearch fuzzy query: runs against the index, so it stays fast.
$searchTerm = 'seach'; // raw user input
$query = [
    'query' => [
        'match' => [
            'name' => [
                'query' => $searchTerm,
                'fuzziness' => 'AUTO',     // same as AUTO:3,6; edit budget scales with term length
                'prefix_length' => 2,       // first 2 chars must match exactly
            ]
        ]
    ]
];

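// PostgreSQL pg_trgm for simpler setups:
// CREATE EXTENSION IF NOT EXISTS pg_trgm;
// CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);
// The % operator keeps rows whose similarity exceeds pg_trgm.similarity_threshold (default 0.3):
// SELECT * FROM products WHERE name % 'seach' ORDER BY name <-> 'seach' LIMIT 10;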

Added 15 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Enable typo tolerance in your search engine (in Meilisearch it is on by default). For a small in-memory list, PHP's levenshtein() is enough. Do not rely on LIKE '%term%': it is a substring match and cannot tolerate typos.
📦 Applies To
any web api
🔍 Detection Hints
Exact-match-only search with no typo tolerance; users getting no results for misspelled queries; SOUNDEX() or a user-defined Levenshtein function evaluated per row in MySQL (full table scan)
Auto-detectable: ✗ No
meilisearch · elasticsearch · typesense
🤖 AI Agent
Confidence: Low · False Positives: Medium · ✗ Manual fix · Fix: Medium · Context: File · Tests: Update