String Algorithms
debt(d7/e3/b3/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and while Blackfire (profiler) and PHPStan are listed, neither will reliably flag a naive O(n²) string loop versus a built-in call without targeted custom rules. Performance regressions from misuse typically only surface under load testing or profiling — not in routine linting or static analysis.
Closest to 'simple parameterised fix (replace pattern with safer alternative)' (e3). The quick_fix is essentially swapping a custom implementation for a native PHP built-in (strpos, strrev, mb_strpos). This is a localised pattern-replacement refactor within the affected function or file, not a one-liner but not cross-cutting either.
Closest to 'localised tax' (b3). The misuse (e.g. levenshtein() in a loop or reimplementing strpos()) is typically isolated to a search or comparison routine. It imposes a performance cost that is real but bounded — it doesn't spread across the codebase structurally unless the bad pattern is copy-pasted widely.
Closest to 'serious trap (contradicts how a similar concept works elsewhere)' (t7). The canonical misconception is that PHP's strpos() is naive — developers from other backgrounds may assume native string functions are slow and reimplement them in PHP, producing code 100× slower. Additionally, strpos() silently misbehaves on multibyte strings (byte vs character positions), which is a well-documented but frequently missed gotcha. These two traps together push the score to t7.
Also Known As
TL;DR
Explanation
Naive string search: O(n×m). KMP (Knuth-Morris-Pratt): O(n+m) — preprocessing the pattern avoids re-scanning matched characters. Boyer-Moore: O(n/m) in best case — scans right to left, skips large portions. Rabin-Karp: O(n+m) average — uses rolling hash, natural for multi-pattern search and plagiarism detection. Levenshtein distance: minimum edit distance between two strings — used in fuzzy search and spell checkers. In PHP: strpos() uses optimised native search, similar_text() and levenshtein() for fuzzy matching.
Common Misconception
Why It Matters
Common Mistakes
- levenshtein() in a loop against all database rows — O(n×m×rows); use a search engine with fuzzy indexing.
- Reimplementing strpos() in PHP — native function is C-optimised; PHP implementation will be 100× slower.
- Not using mb_strpos() for Unicode strings — strpos() operates on bytes and returns wrong positions for multibyte chars.
- similar_text() for ranking — it is O(max(n,m)³); not suitable for large-scale text comparison.
Code Examples
// Levenshtein against all rows — O(n × string_length × row_count):
$query = 'phyton'; // Misspelled
$allTerms = $db->query('SELECT slug, term FROM glossary')->fetchAll();
$matches = [];
foreach ($allTerms as $term) {
if (levenshtein($query, strtolower($term['term'])) <= 2) {
$matches[] = $term; // O(m) per row, all rows scanned
}
}
// Use search engine fuzzy query instead:
$results = $elasticsearch->search([
'query' => ['match' => ['term' => ['query' => $query, 'fuzziness' => 'AUTO']]]
]);
// O(log n) with indexed fuzzy matching
// Or PostgreSQL pg_trgm:
// SELECT * FROM glossary WHERE term % 'phyton' ORDER BY term <-> 'phyton' LIMIT 10;
// Index: CREATE INDEX ON glossary USING gin(term gin_trgm_ops);