← CodeClarityLab Home
Browse by Category
+ added · updated 7d
← Back to glossary

String Algorithms

algorithms Advanced

Also Known As

KMP Boyer-Moore Rabin-Karp Levenshtein string search

TL;DR

Efficient string searching (KMP, Boyer-Moore, Rabin-Karp), edit distance (Levenshtein), and compression algorithms — foundational for search, diff tools, and bioinformatics.

Explanation

Naive string search: O(n×m). KMP (Knuth-Morris-Pratt): O(n+m) — preprocessing the pattern avoids re-scanning matched characters. Boyer-Moore: O(n/m) in best case — scans right to left, skips large portions. Rabin-Karp: O(n+m) average — uses rolling hash, natural for multi-pattern search and plagiarism detection. Levenshtein distance: minimum edit distance between two strings — used in fuzzy search and spell checkers. In PHP: strpos() uses optimised native search, similar_text() and levenshtein() for fuzzy matching.

Common Misconception

PHP's strpos() is naive string search — PHP uses highly optimised native algorithms; avoid reimplementing strpos() in PHP — the native C implementation is orders of magnitude faster.

Why It Matters

Understanding string algorithm complexity explains why running levenshtein() against a million records is impractical and why search engines use inverted indexes instead of string scanning.

Common Mistakes

  • levenshtein() in a loop against all database rows — O(n×m×rows); use a search engine with fuzzy indexing.
  • Reimplementing strpos() in PHP — native function is C-optimised; PHP implementation will be 100× slower.
  • Not using mb_strpos() for Unicode strings — strpos() operates on bytes and returns wrong positions for multibyte chars.
  • similar_text() for ranking — it is O(max(n,m)³); not suitable for large-scale text comparison.

Code Examples

✗ Vulnerable
// Levenshtein against all rows — O(n × string_length × row_count):
$query = 'phyton'; // Misspelled
$allTerms = $db->query('SELECT slug, term FROM glossary')->fetchAll();
$matches = [];
foreach ($allTerms as $term) {
    if (levenshtein($query, strtolower($term['term'])) <= 2) {
        $matches[] = $term; // O(m) per row, all rows scanned
    }
}
✓ Fixed
// Use search engine fuzzy query instead:
$results = $elasticsearch->search([
    'query' => ['match' => ['term' => ['query' => $query, 'fuzziness' => 'AUTO']]]
]);
// O(log n) with indexed fuzzy matching

// Or PostgreSQL pg_trgm:
// SELECT * FROM glossary WHERE term % 'phyton' ORDER BY term <-> 'phyton' LIMIT 10;
// Index: CREATE INDEX ON glossary USING gin(term gin_trgm_ops);

Added 15 Mar 2026
Edited 22 Mar 2026
Views 26
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
0 pings W 0 pings T 0 pings F 0 pings S 1 ping S 1 ping M 0 pings T 0 pings W 1 ping T 0 pings F 0 pings S 0 pings S 1 ping M 0 pings T 0 pings W 1 ping T 0 pings F 0 pings S 0 pings S 1 ping M 0 pings T 0 pings W 1 ping T 0 pings F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 1 ping T
No pings yesterday
Perplexity 8 Amazonbot 7 Unknown AI 3 Ahrefs 2 Google 2
crawler 20 crawler_json 1 pre-tracking 1
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Use strpos() for simple substring search (O(n)), built-in string functions for common operations, and PCRE regex only when pattern matching is genuinely needed — avoid O(n²) naive string algorithms
📦 Applies To
any web cli
🔗 Prerequisites
🔍 Detection Hints
Manual character-by-character string processing in PHP that built-ins handle; O(n²) substring search instead of strpos; naive string reversal when strrev() exists
Auto-detectable: ✗ No blackfire phpstan
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: High ✗ Manual fix Fix: Medium Context: Function Tests: Update

✓ schema.org compliant