Regex Performance
debt(d8/e3/b3/t8)
Closest to 'silent in production until users hit it' (d9), slightly better at d8 because specialist tools like rxxr2 and semgrep can detect ReDoS patterns, but they're rarely run in default pipelines and catastrophic backtracking typically only manifests when a malicious input arrives in production.
Closest to 'simple parameterised fix' (e3) — quick_fix says swap nested quantifiers for possessive quantifiers or atomic groups, anchor the pattern, or use non-capturing groups. Each is a localized pattern replacement, not a refactor.
Closest to 'localised tax' (b3) — applies_to is web/cli and the burden is per-regex; bad patterns don't shape the whole system but require ongoing vigilance wherever user input meets regex.
Closest to 'serious trap' (t7), bumped to t8 because the misconception is exactly the canonical ReDoS trap — (a+)+ looks trivially simple but is O(2ⁿ). A competent dev without prior exposure will write the catastrophic form and never suspect it until a crafted input pins CPU.
Also Known As
TL;DR
Explanation
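PCRE matches by backtracking: when a branch fails, the engine returns to the most recent choice point and tries the next alternative. Catastrophic backtracking occurs when a pattern can match the same text in many different ways and every one of them ultimately fails: (a+)+ against a long run of 'a' followed by a non-matching character forces the engine through an exponential number of ways to split the run. Optimisations: anchor patterns (^ and $), use possessive quantifiers (a++) or atomic groups ((?>...)), avoid redundant captures (use (?:...)), and reuse the same pattern string so it is compiled only once. PHP keeps compiled PCRE patterns in a fixed-size per-process cache, so a repeated pattern string is not recompiled; the cache size is not an ini setting. pcre.jit is a separate setting that enables PCRE's JIT compiler.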
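The difference between the nested and the possessive form can be observed directly. The following is a minimal sketch, not a benchmark: it assumes the JIT is switched off so that pcre.backtrack_limit applies to the interpreting matcher, and the lowered limit of 100000 is an illustrative value chosen to keep the run short.

```php
<?php
// Assumptions for this sketch: JIT off, and a deliberately low backtrack limit.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '100000');

// 30 'a's followed by 'b': the pattern almost matches, then fails at the end.
$subject = str_repeat('a', 30) . 'b';

// Nested quantifier: the engine explores ways of splitting the run of 'a's
// until the backtrack limit stops it; preg_match() then returns false.
$nested = preg_match('/^(a+)+$/', $subject);
$nestedError = preg_last_error();

// Possessive quantifier: a++ never gives characters back, so the failure at
// 'b' is reported after a single pass and preg_match() returns 0.
$possessive = preg_match('/^a++$/', $subject);

var_dump($nested);
var_dump($nestedError === PREG_BACKTRACK_LIMIT_ERROR);
var_dump($possessive);
```

Under these assumptions the first call is expected to fail with a backtrack-limit error, while the second simply reports "no match". The key distinction is between false (the engine gave up) and 0 (the engine finished and found nothing).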
Diagram
flowchart TD
subgraph Catastrophic_Backtracking
EVIL["Pattern (a+)+$<br/>input: aaaaaaaab"]
TRY[Regex tries all combinations<br/>exponential time O of 2^n]
EVIL --> TRY --> HANG[Server hangs or times out]
end
subgraph Fixes
ATOMIC[Atomic groups<br/>no backtracking allowed]
POSSESSIVE[Possessive quantifiers<br/>a++ greedy no give back]
REWRITE[Rewrite pattern<br/>avoid ambiguity]
end
subgraph PHP_Tips
LIMIT[pcre.backtrack_limit default 1M]
TIMEOUT[preg_last_error check after match]
UNICODE[u flag for Unicode<br/>slower but correct]
end
style HANG fill:#f85149,color:#fff
style ATOMIC fill:#238636,color:#fff
style LIMIT fill:#d29922,color:#fff
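The "check preg_last_error after match" tip can be wrapped in a small helper so that an exhausted limit is never mistaken for "no match". This is a sketch only: matchOrFail is a hypothetical name, not a PHP built-in, and preg_last_error_msg() requires PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run preg_match() and turn engine failures
 * (for example an exhausted backtrack limit) into exceptions.
 */
function matchOrFail(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);

    if ($result === false) {
        // false means the engine gave up or the pattern was invalid,
        // which is different from 0 ("no match").
        throw new RuntimeException('Regex failed: ' . preg_last_error_msg());
    }

    return $result === 1;
}

var_dump(matchOrFail('/^\d{4}$/', '2024')); // bool(true)
var_dump(matchOrFail('/^\d{4}$/', 'abcd')); // bool(false)
```

Throwing on false keeps a ReDoS attempt visible in logs instead of silently treating hostile input as a failed validation.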
Common Misconception
Why It Matters
Common Mistakes
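- Nested quantifiers and overlapping alternatives: (a+)+, (a|aa)+, (x+x+)+ can all cause catastrophic backtracking on input that almost matches.
- Not anchoring validation patterns: /\d{10}/ is tried at every offset and accepts any string that merely contains ten digits; /^\d{10}$/ is tried once from the start and accepts only a ten-digit string (add the D modifier if a trailing newline must also be rejected).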
- Unnecessary capturing groups — (...) allocates memory and slows matching; use (?:...) for grouping without capturing.
- User input in regex without validation — use preg_quote() and limit input length before matching.
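Two of the mistakes above, missing anchors and unescaped user input, are easy to demonstrate. The sketch below is illustrative: containsTerm and MAX_TERM_LENGTH are hypothetical names, and the 200-character cap is an arbitrary example value.

```php
<?php
declare(strict_types=1);

// Anchoring: the unanchored pattern accepts a string that only contains ten digits.
var_dump(preg_match('/\d{10}/', 'abc1234567890'));   // int(1)
var_dump(preg_match('/^\d{10}$/', 'abc1234567890')); // int(0)

// User input inside a pattern: cap its length, then escape it.
const MAX_TERM_LENGTH = 200; // illustrative limit

function containsTerm(string $haystack, string $term): bool
{
    if (strlen($term) > MAX_TERM_LENGTH) {
        throw new InvalidArgumentException('Search term too long');
    }

    // The second argument makes preg_quote() escape the delimiter too.
    $pattern = '/' . preg_quote($term, '/') . '/';

    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsTerm('price: 1+1=2', '1+1')); // bool(true): '+' is matched literally
var_dump(containsTerm('price: 111', '1+1'));   // bool(false)
```

Without preg_quote(), the term 1+1 would be read as "one or more 1s, then 1" and would match 111, which shows that unescaped input changes the meaning of the pattern and not only its cost.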
Code Examples
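// Catastrophic backtracking (ReDoS vulnerability):
$pattern = '/^(a+)+$/';
preg_match($pattern, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab'); // Exponential work: returns false once pcre.backtrack_limit is exhausted, or pins the CPU if the limit is raised
// A run of n 'a's can be split between the inner and outer + in 2^(n-1) ways, and every split is tried before the match fails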
// Unnecessary captures slow down matching:
preg_match('/([a-z]+)@([a-z]+)\.([a-z]+)/', $email, $m); // 3 unnecessary groups
// Possessive quantifier: no backtracking into the run of 'a's:
$pattern = '/^a++$/';
// ++ = possessive: once matched, never gives back, so a non-matching input fails after one pass
// Non-capturing group: use (?:...) only where grouping is needed (here, to repeat "label."); ungrouped parts need no parentheses at all:
preg_match('/^[a-z]+@(?:[a-z]+\.)+[a-z]+$/', $email);
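// Limit input length before regex, and check for engine errors afterwards:
if (strlen($userInput) > 200) die('Too long');
$result = preg_match($pattern, $userInput);
if ($result === false) die(preg_last_error_msg()); // false = engine gave up (e.g. backtrack limit); PHP 8.0+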