
Regex Performance

regex Advanced
debt(d8/e3/b3/t8)
d8 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9), slightly better at d8 because specialist tools like rxxr2 and semgrep can detect ReDoS patterns, but they're rarely run in default pipelines and catastrophic backtracking typically only manifests when a malicious input arrives in production.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3) — quick_fix says swap nested quantifiers for possessive quantifiers or atomic groups, anchor the pattern, or use non-capturing groups. Each is a localized pattern replacement, not a refactor.

b3 Burden Structural debt — long-term weight of choosing wrong

Closest to 'localised tax' (b3) — applies_to is web/cli and the burden is per-regex; bad patterns don't shape the whole system but require ongoing vigilance wherever user input meets regex.

t8 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7), bumped to t8 because the misconception is exactly the canonical ReDoS trap — (a+)+ looks trivially simple but is O(2ⁿ). A competent dev without prior exposure will write the catastrophic form and never suspect it until a crafted input pins CPU.


Also Known As

ReDoS, catastrophic backtracking, PCRE performance

TL;DR

Regex performance pitfalls — catastrophic backtracking (ReDoS), unnecessary captures, and poorly anchored patterns that scan more input than needed.

Explanation

PCRE matches by backtracking: when one branch fails, the engine returns to the most recent choice point and tries the next alternative. Catastrophic backtracking occurs when the number of failing paths grows exponentially with input length; (a+)+ on a long non-matching string is the canonical case. Optimisations: anchor patterns (^ and $), use possessive quantifiers (a++), use atomic groups ((?>...)), avoid redundant captures (use (?:...)), and compile patterns once. PHP keeps compiled patterns in a per-process cache, so repeated preg_* calls with the same pattern string skip recompilation. pcre.jit is a separate setting that toggles JIT compilation; it is not the cache size.
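The possessive and atomic forms mentioned above can be tried on a near-miss input of the kind that makes (a+)+ blow up. This is a minimal sketch; the variable names and the 5000-character subject are illustrative choices, not taken from the original entry.

```php
<?php
// Two backtracking-free spellings of "one or more a's, nothing else".
$patterns = [
    'possessive' => '/^a++$/',     // a++ never gives characters back
    'atomic'     => '/^(?>a+)$/',  // (?>...) discards its choice points on exit
];

// One matching subject and one long near-miss (a run of a's, then a 'b').
$subjects = ['aaaa', str_repeat('a', 5000) . 'b'];

$results = [];
foreach ($patterns as $name => $pattern) {
    foreach ($subjects as $subject) {
        // preg_match returns 1 on match, 0 on no match, false on error.
        $results[$name][] = preg_match($pattern, $subject);
    }
}

print_r($results);
```

Both patterns should accept the first subject and reject the second in a single pass, because once the run of a's has been consumed the engine has no alternative splits left to try.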

Diagram

flowchart TD
    subgraph Catastrophic_Backtracking
        EVIL["Pattern (a+)+$<br/>input: aaaaaaaab"]
        TRY["Regex tries all combinations<br/>exponential time O(2^n)"]
        EVIL --> TRY --> HANG[Server hangs or times out]
    end
    subgraph Fixes
        ATOMIC[Atomic groups<br/>no backtracking allowed]
        POSSESSIVE[Possessive quantifiers<br/>a++ greedy no give back]
        REWRITE[Rewrite pattern<br/>avoid ambiguity]
    end
    subgraph PHP_Tips
        LIMIT[pcre.backtrack_limit default 1M]
        TIMEOUT[preg_last_error check after match]
        UNICODE[u flag for Unicode<br/>slower but correct]
    end
style HANG fill:#f85149,color:#fff
style ATOMIC fill:#238636,color:#fff
style LIMIT fill:#d29922,color:#fff
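
The PHP tips in the diagram (pcre.backtrack_limit and a preg_last_error check) can be combined in a small wrapper. The sketch below is an assumption about how one might do it: `safeMatch` and its `$maxLength` parameter are hypothetical names, not part of PHP, and `preg_last_error_msg()` requires PHP 8.0 or later.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): match with explicit failure handling.
 * Returns true/false for match/no match and throws if the engine gave up.
 */
function safeMatch(string $pattern, string $subject, int $maxLength = 200): bool
{
    // Cap the input first: backtracking cost grows with subject length.
    if (strlen($subject) > $maxLength) {
        throw new LengthException('Subject exceeds the allowed length');
    }

    $result = preg_match($pattern, $subject);

    // false means an engine error (for example the backtrack limit was hit),
    // which is different from 0, "no match".
    if ($result === false) {
        throw new RuntimeException('Regex failed: ' . preg_last_error_msg());
    }

    return $result === 1;
}
```

The point of the design is that a plain `if (preg_match(...))` treats an aborted match the same as "no match"; separating the two lets the caller log or reject the request instead of silently accepting a wrong answer.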

Common Misconception

A regex that looks simple is always fast — (a+)+ looks simple but has O(2ⁿ) backtracking on non-matching input; regex complexity is about the backtracking graph, not the pattern length.

Why It Matters

A single poorly crafted regex on user input is a ReDoS vector — one request with a crafted input can consume 100% CPU for minutes, effectively DoS-ing the server.

Common Mistakes

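  • Nested or overlapping quantifiers: (a+)+, (a|aa)+, (x+x+)+ all cause catastrophic backtracking on non-matching input.
  • Not anchoring validation patterns: /\d{10}/ accepts any string that contains ten consecutive digits and may scan the whole input looking for them; /^\d{10}$/ requires the whole string to be ten digits and fails fast otherwise.
  • Unnecessary capturing groups: (...) stores each captured substring, which costs memory and time; use (?:...) for grouping without capturing.
  • User input in regex without validation: escape it with preg_quote() and limit input length before matching.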
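
Two of the mistakes above, missing anchors and unescaped user input, can be illustrated with a short sketch. The function names `isTenDigits` and `containsLiteral` are hypothetical and exist only for this example.

```php
<?php
// Anchored validation: the whole subject must be ten digits.
// Note: $ also allows a single trailing newline; use \z or the D modifier
// when that distinction matters.
function isTenDigits(string $input): bool
{
    return preg_match('/^\d{10}$/', $input) === 1;
}

// Treat user input as a literal: preg_quote() escapes regex metacharacters,
// and the second argument also escapes the chosen delimiter.
function containsLiteral(string $haystack, string $needle): bool
{
    $pattern = '/' . preg_quote($needle, '/') . '/';
    return preg_match($pattern, $haystack) === 1;
}

// The unanchored form accepts a longer string that merely contains ten digits.
var_dump(preg_match('/\d{10}/', 'abc1234567890'));  // unanchored
var_dump(isTenDigits('abc1234567890'));             // anchored
var_dump(containsLiteral('price is 3x50', '3.50')); // '.' is escaped, so 'x' does not match
```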

Code Examples

✗ Vulnerable
// Catastrophic backtracking — ReDoS vulnerability:
$pattern = '/^(a+)+$/';
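preg_match($pattern, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab'); // Exponential work before failing
// A run of n 'a's can be split between the inner and outer + in 2^(n-1) ways, and the engine tries them all.
// With default settings PHP usually gives up at pcre.backtrack_limit and returns false; with a raised limit the call can pin a CPU core.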

// Unnecessary captures slow down matching:
preg_match('/([a-z]+)@([a-z]+)\.([a-z]+)/', $email, $m); // 3 unnecessary groups
✓ Fixed
// Possessive quantifier — no backtracking:
$pattern = '/^(a++)$/';
// ++ = possessive: once matched, never gives back — no catastrophic backtracking

// Non-capturing groups — faster:
preg_match('/(?:[a-z]+)@(?:[a-z]+)\.(?:[a-z]+)/', $email);

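// Limit input length before regex, and check for engine errors afterwards:
if (strlen($userInput) > 200) die('Too long');
if (preg_match($pattern, $userInput) === false) die('Regex error code: ' . preg_last_error());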
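
One way to test a pattern against long inputs before it reaches production is to probe it with an adversarial subject under a deliberately small backtrack budget. The sketch below assumes that approach; `survivesProbe` and its default `$budget` are hypothetical, and the probe only shows that a pattern copes with the specific subject supplied.

```php
<?php
/**
 * Hypothetical probe (not part of PHP): returns true if $pattern handles
 * $subject without the PCRE engine reporting an error under a small
 * pcre.backtrack_limit.
 */
function survivesProbe(string $pattern, string $subject, int $budget = 10000): bool
{
    // Temporarily lower the limit so a runaway match aborts quickly.
    $previous = ini_set('pcre.backtrack_limit', (string) $budget);
    try {
        $result = preg_match($pattern, $subject);
    } finally {
        // Restore the caller's setting whatever happened.
        if ($previous !== false) {
            ini_set('pcre.backtrack_limit', $previous);
        }
    }

    // false signals an engine error such as an exhausted backtrack limit.
    return $result !== false;
}

// A near-miss subject: a run of a's followed by a character that forces failure.
$evil = str_repeat('a', 40) . 'b';

var_dump(survivesProbe('/^(a+)+$/', $evil)); // nested quantifier
var_dump(survivesProbe('/^a++$/', $evil));   // possessive rewrite
```

A probe like this fits naturally in a unit test next to each pattern that touches user input, with the subject chosen to resemble the worst case for that pattern.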

Added 15 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟠 High ⚙ Fix effort: Medium
⚡ Quick Fix
Avoid nested quantifiers like (a+)+ which cause catastrophic backtracking — use possessive quantifiers (a++) or atomic groups (?>a+) to prevent ReDoS; test with long inputs before production
📦 Applies To
any web cli
🔗 Prerequisites
🔍 Detection Hints
Nested quantifiers (a+)+ or (a*)*; alternation with common prefix (foo|foobar); no backtrack limit configuration
Auto-detectable: ✓ Yes rxxr2 regex101 semgrep
⚠ Related Problems
🤖 AI Agent
Confidence: High False Positives: Medium ✗ Manual fix Fix: Medium Context: Line Tests: Update
CWE-1333 CWE-400
