← Home ← Codex ← DEBT
Browse by Category
+ added · updated 7d
← Back to glossary

String Algorithms

Algorithms Advanced
debt(d7/e3/b3/t7)
d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and while Blackfire (profiler) and PHPStan are listed, neither will reliably flag a naive O(n²) string loop versus a built-in call without targeted custom rules. Performance regressions from misuse typically only surface under load testing or profiling — not in routine linting or static analysis.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix (replace pattern with safer alternative)' (e3). The quick_fix is essentially swapping a custom implementation for a native PHP built-in (strpos, strrev, mb_strpos). This is a localised pattern-replacement refactor within the affected function or file, not a one-liner but not cross-cutting either.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3). The misuse (e.g. levenshtein() in a loop or reimplementing strpos()) is typically isolated to a search or comparison routine. It imposes a performance cost that is real but bounded — it doesn't spread across the codebase structurally unless the bad pattern is copy-pasted widely.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The canonical misconception is that PHP's strpos() is naive — developers from other backgrounds may assume native string functions are slow and reimplement them in PHP, producing code 100× slower. Additionally, strpos() silently misbehaves on multibyte strings (byte vs character positions), which is a well-documented but frequently missed gotcha. These two traps together push the score to t7.

About DEBT scoring →

Also Known As

KMP Boyer-Moore Rabin-Karp Levenshtein string search

TL;DR

Efficient string searching (KMP, Boyer-Moore, Rabin-Karp), edit distance (Levenshtein), and compression algorithms — foundational for search, diff tools, and bioinformatics.

Explanation

Naive string search: O(n×m). KMP (Knuth-Morris-Pratt): O(n+m) — preprocessing the pattern avoids re-scanning matched characters. Boyer-Moore: O(n/m) in best case — scans right to left, skips large portions. Rabin-Karp: O(n+m) average — uses rolling hash, natural for multi-pattern search and plagiarism detection. Levenshtein distance: minimum edit distance between two strings — used in fuzzy search and spell checkers. In PHP: strpos() uses optimised native search, similar_text() and levenshtein() for fuzzy matching.

Common Misconception

PHP's strpos() is naive string search — PHP uses highly optimised native algorithms; avoid reimplementing strpos() in PHP — the native C implementation is orders of magnitude faster.

Why It Matters

Understanding string algorithm complexity explains why running levenshtein() against a million records is impractical and why search engines use inverted indexes instead of string scanning.

Common Mistakes

  • levenshtein() in a loop against all database rows — O(n×m×rows); use a search engine with fuzzy indexing.
  • Reimplementing strpos() in PHP — native function is C-optimised; PHP implementation will be 100× slower.
  • Not using mb_strpos() for Unicode strings — strpos() operates on bytes and returns wrong positions for multibyte chars.
  • similar_text() for ranking — it is O(max(n,m)³); not suitable for large-scale text comparison.

Code Examples

✗ Vulnerable
// Levenshtein against all rows — O(n × string_length × row_count):
$query = 'phyton'; // Misspelled
$allTerms = $db->query('SELECT slug, term FROM glossary')->fetchAll();
$matches = [];
foreach ($allTerms as $term) {
    if (levenshtein($query, strtolower($term['term'])) <= 2) {
        $matches[] = $term; // O(m) per row, all rows scanned
    }
}
✓ Fixed
// Use search engine fuzzy query instead:
$results = $elasticsearch->search([
    'query' => ['match' => ['term' => ['query' => $query, 'fuzziness' => 'AUTO']]]
]);
// O(log n) with indexed fuzzy matching

// Or PostgreSQL pg_trgm:
// SELECT * FROM glossary WHERE term % 'phyton' ORDER BY term <-> 'phyton' LIMIT 10;
// Index: CREATE INDEX ON glossary USING gin(term gin_trgm_ops);

Added 15 Mar 2026
Edited 22 Mar 2026
Views 49
Rate this term
No ratings yet
🤖 AI Guestbook educational data only
| |
Last 30 days
0 pings T 0 pings W 1 ping T 0 pings F 0 pings S 0 pings S 0 pings M 1 ping T 0 pings W 0 pings T 1 ping F 2 pings S 1 ping S 2 pings M 0 pings T 2 pings W 0 pings T 1 ping F 0 pings S 0 pings S 0 pings M 0 pings T 0 pings W 0 pings T 0 pings F 1 ping S 0 pings S 0 pings M 0 pings T 0 pings W
No pings yet today
No pings yesterday
Perplexity 8 Amazonbot 7 Ahrefs 4 Google 4 Unknown AI 4 Scrapy 4 SEMrush 3 ChatGPT 2 Claude 1 Meta AI 1 PetalBot 1
crawler 35 crawler_json 3 pre-tracking 1
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Use strpos() for simple substring search (O(n)), built-in string functions for common operations, and PCRE regex only when pattern matching is genuinely needed — avoid O(n²) naive string algorithms
📦 Applies To
any web cli
🔗 Prerequisites
🔍 Detection Hints
Manual character-by-character string processing in PHP that built-ins handle; O(n²) substring search instead of strpos; naive string reversal when strrev() exists
Auto-detectable: ✗ No blackfire phpstan
⚠ Related Problems
🤖 AI Agent
Confidence: Low False Positives: High ✗ Manual fix Fix: Medium Context: Function Tests: Update


✓ schema.org compliant