String Algorithms
Also Known As
KMP
Boyer-Moore
Rabin-Karp
Levenshtein
string search
TL;DR
Efficient string searching (KMP, Boyer-Moore, Rabin-Karp), edit distance (Levenshtein), and compression algorithms — foundational for search, diff tools, and bioinformatics.
Explanation
Naive string search: O(n×m). KMP (Knuth-Morris-Pratt): O(n+m) — preprocessing the pattern avoids re-scanning matched characters. Boyer-Moore: O(n/m) in best case — scans right to left, skips large portions. Rabin-Karp: O(n+m) average — uses rolling hash, natural for multi-pattern search and plagiarism detection. Levenshtein distance: minimum edit distance between two strings — used in fuzzy search and spell checkers. In PHP: strpos() uses optimised native search, similar_text() and levenshtein() for fuzzy matching.
Common Misconception
✗ PHP's strpos() is naive string search — PHP uses highly optimised native algorithms; avoid reimplementing strpos() in PHP — the native C implementation is orders of magnitude faster.
Why It Matters
Understanding string algorithm complexity explains why running levenshtein() against a million records is impractical and why search engines use inverted indexes instead of string scanning.
Common Mistakes
- levenshtein() in a loop against all database rows — O(n×m×rows); use a search engine with fuzzy indexing.
- Reimplementing strpos() in PHP — native function is C-optimised; PHP implementation will be 100× slower.
- Not using mb_strpos() for Unicode strings — strpos() operates on bytes and returns wrong positions for multibyte chars.
- similar_text() for ranking — it is O(max(n,m)³); not suitable for large-scale text comparison.
Code Examples
✗ Vulnerable
// Levenshtein against all rows — O(n × string_length × row_count):
$query = 'phyton'; // Misspelled
$allTerms = $db->query('SELECT slug, term FROM glossary')->fetchAll();
$matches = [];
foreach ($allTerms as $term) {
if (levenshtein($query, strtolower($term['term'])) <= 2) {
$matches[] = $term; // O(m) per row, all rows scanned
}
}
✓ Fixed
// Use search engine fuzzy query instead:
$results = $elasticsearch->search([
'query' => ['match' => ['term' => ['query' => $query, 'fuzziness' => 'AUTO']]]
]);
// O(log n) with indexed fuzzy matching
// Or PostgreSQL pg_trgm:
// SELECT * FROM glossary WHERE term % 'phyton' ORDER BY term <-> 'phyton' LIMIT 10;
// Index: CREATE INDEX ON glossary USING gin(term gin_trgm_ops);
Tags
🤝 Adopt this term
£79/year · your link shown here
Added
15 Mar 2026
Edited
22 Mar 2026
Views
26
🤖 AI Guestbook educational data only
|
|
Last 30 days
Agents 1
No pings yesterday
Perplexity 8
Amazonbot 7
Unknown AI 3
Ahrefs 2
Google 2
Also referenced
How they use it
crawler 20
crawler_json 1
pre-tracking 1
Related categories
⚡
DEV INTEL
Tools & Severity
🟡 Medium
⚙ Fix effort: Medium
⚡ Quick Fix
Use strpos() for simple substring search (O(n)), built-in string functions for common operations, and PCRE regex only when pattern matching is genuinely needed — avoid O(n²) naive string algorithms
📦 Applies To
any
web
cli
🔗 Prerequisites
🔍 Detection Hints
Manual character-by-character string processing in PHP that built-ins handle; O(n²) substring search instead of strpos; naive string reversal when strrev() exists
Auto-detectable:
✗ No
blackfire
phpstan
⚠ Related Problems
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: Medium
Context: Function
Tests: Update