← Back to glossary

String Algorithms

Algorithms Advanced

debt(d7/e3/b3/t7)

d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and while Blackfire (profiler) and PHPStan are listed, neither will reliably flag a naive O(n²) string loop versus a built-in call without targeted custom rules. Performance regressions from misuse typically only surface under load testing or profiling — not in routine linting or static analysis.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix (replace pattern with safer alternative)' (e3). The quick_fix is essentially swapping a custom implementation for a native PHP built-in (strpos, strrev, mb_strpos). This is a localised pattern-replacement refactor within the affected function or file, not a one-liner but not cross-cutting either.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3). The misuse (e.g. levenshtein() in a loop or reimplementing strpos()) is typically isolated to a search or comparison routine. It imposes a performance cost that is real but bounded — it doesn't spread across the codebase structurally unless the bad pattern is copy-pasted widely.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The canonical misconception is that PHP's strpos() is naive — developers from other backgrounds may assume native string functions are slow and reimplement them in PHP, producing code 100× slower. Additionally, strpos() silently misbehaves on multibyte strings (byte vs character positions), which is a well-documented but frequently missed gotcha. These two traps together push the score to t7.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-11 · reviewed by human

Also Known As

KMP Boyer-Moore Rabin-Karp Levenshtein string search

TL;DR

Efficient string searching (KMP, Boyer-Moore, Rabin-Karp), edit distance (Levenshtein), and compression algorithms — foundational for search, diff tools, and bioinformatics.

Explanation

Naive string search: O(n×m). KMP (Knuth-Morris-Pratt): O(n+m) — preprocessing the pattern avoids re-scanning matched characters. Boyer-Moore: O(n/m) in best case — scans right to left, skips large portions. Rabin-Karp: O(n+m) average — uses rolling hash, natural for multi-pattern search and plagiarism detection. Levenshtein distance: minimum edit distance between two strings — used in fuzzy search and spell checkers. In PHP: strpos() uses optimised native search, similar_text() and levenshtein() for fuzzy matching.

Common Misconception

✗ PHP's strpos() is naive string search — PHP uses highly optimised native algorithms; avoid reimplementing strpos() in PHP — the native C implementation is orders of magnitude faster.

Why It Matters

Understanding string algorithm complexity explains why running levenshtein() against a million records is impractical and why search engines use inverted indexes instead of string scanning.

Common Mistakes

levenshtein() in a loop against all database rows — O(n×m×rows); use a search engine with fuzzy indexing.
Reimplementing strpos() in PHP — native function is C-optimised; PHP implementation will be 100× slower.
Not using mb_strpos() for Unicode strings — strpos() operates on bytes and returns wrong positions for multibyte chars.
similar_text() for ranking — it is O(max(n,m)³); not suitable for large-scale text comparison.

Code Examples

✗ Vulnerable

// Levenshtein against all rows — O(n × string_length × row_count):
$query = 'phyton'; // Misspelled
$allTerms = $db->query('SELECT slug, term FROM glossary')->fetchAll();
$matches = [];
foreach ($allTerms as $term) {
    if (levenshtein($query, strtolower($term['term'])) <= 2) {
        $matches[] = $term; // O(m) per row, all rows scanned
    }
}

✓ Fixed

// Use search engine fuzzy query instead:
$results = $elasticsearch->search([
    'query' => ['match' => ['term' => ['query' => $query, 'fuzziness' => 'AUTO']]]
]);
// O(log n) with indexed fuzzy matching

// Or PostgreSQL pg_trgm:
// SELECT * FROM glossary WHERE term % 'phyton' ORDER BY term <-> 'phyton' LIMIT 10;
// Index: CREATE INDEX ON glossary USING gin(term gin_trgm_ops);

Tags

algorithms strings search

Added 15 Mar 2026

Edited 22 Mar 2026

Curated in Warsaw under one editorial standard. 1,506 terms, single voice. About this reference →

Rate this term

No ratings yet

🤖 AI Guestbook educational data only

| |

Last 30 days

Agents 1

Claude 1

No pings yesterday

Perplexity 8 Amazonbot 7 Ahrefs 4 Google 4 Unknown AI 4 Scrapy 4 SEMrush 3 Claude 2 ChatGPT 2 Meta AI 1 PetalBot 1

Also referenced

Big-O Notation 103 Fuzzy Search 53 Elasticsearch 49 Regex Performance 43

How they use it

crawler 36 crawler_json 3 pre-tracking 1

Related categories

algorithms 2.1k search 691 regex 685

⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: Medium

⚡ Quick Fix

Use strpos() for simple substring search (O(n)), built-in string functions for common operations, and PCRE regex only when pattern matching is genuinely needed — avoid O(n²) naive string algorithms

📦 Applies To

any web cli

🔗 Prerequisites

Big-O Notation PCRE in PHP

🔍 Detection Hints

Manual character-by-character string processing in PHP that built-ins handle; O(n²) substring search instead of strpos; naive string reversal when strrev() exists

Auto-detectable: ✗ No blackfire phpstan

⚠ Related Problems

Big-O Notation PCRE in PHP

🤖 AI Agent

Confidence: Low False Positives: High ✗ Manual fix Fix: Medium Context: Function Tests: Update

References

https://en.wikipedia.org/wiki/String-searching_algorithm