
Regex Branch Reset Groups

regex Advanced
debt(d7/e3/b2/t6)
d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7), since detection_hints.automated is 'no' and only regex101 is listed — the smell (manual disambiguation of $m[N] across alternation branches) is visible only on review or via runtime regex testing.

e3 Effort Remediation debt — work required to fix once spotted

Closest to 'simple parameterised fix' (e3), since the quick_fix is rewriting an alternation pattern to use (?|alt1|alt2) and simplifying the match-extraction code — a localised pattern swap, slightly more than one line because the caller's branch-disambiguation logic also collapses.

b2 Burden Structural debt — long-term weight of choosing wrong

Closest to 'minimal commitment' (b1), bumped slightly toward b3 only where used: it's a localised regex idiom, not load-bearing, but it does lock the codebase to PCRE-compatible engines per common_mistakes about portability.

t6 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7), grounded in the misconception that branch reset creates duplicate/ambiguous captures and in common_mistakes — group numbering after the block doesn't restart, branches with unequal capture counts cause off-by-one, and (?| is easily confused with the visually similar non-capturing (?:).


Also Known As

branch reset group · (?| group · PCRE branch reset · shared capture groups

TL;DR

A Perl and PCRE construct (?|...) that resets capture group numbers in each alternative, so all branches share the same group indices.

Explanation

Branch reset groups (?|...) reset capture group numbering at the start of each alternative within the group. Without branch reset, the alternation (a)|(b)|(c) creates groups 1, 2, and 3; with branch reset, (?|(a)|(b)|(c)) makes each alternative's first capture group 1, its second capture group 2, and so on. Different branches can therefore populate the same group slots, which simplifies extraction when the same logical data appears in different formats. The construct is supported by Perl (5.10 and later), by PCRE and PCRE2 (and therefore PHP's preg_* functions), and by the third-party Python 'regex' module on PyPI. It is not supported by JavaScript, Python's standard 're' module, or most other engines.
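
The numbering difference described above can be seen directly in PHP. This is a minimal sketch using the (a)|(b)|(c) pattern from the explanation; the variable names $plain and $reset are illustrative only.

```php
<?php
// Plain alternation: every parenthesised group gets its own number.
preg_match('/(a)|(b)|(c)/', 'c', $plain);
// $plain is ['c', '', '', 'c']: the text landed in group 3,
// and the unmatched groups 1 and 2 are empty strings.

// Branch reset: each alternative starts numbering again at 1.
preg_match('/(?|(a)|(b)|(c))/', 'c', $reset);
// $reset is ['c', 'c']: the text is in group 1 whichever branch matched.
```

With the plain form the caller has to probe three indices; with the branch reset form it always reads index 1.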

Common Misconception

Misconception: branch reset groups create duplicate or ambiguous captures. In fact, only one branch takes part in any given match, so each shared group index holds at most one value and never a conflict.

Why It Matters

Without branch reset, matching the same logical field across multiple format variants requires either complex post-match logic to check which group is populated or repeated named groups - branch reset eliminates that complexity and keeps result extraction consistent.

Common Mistakes

  • Using branch reset in engines that do not support it (JavaScript, Python re, POSIX) - the (?| syntax will either throw an error or be misinterpreted.
  • Assuming groups after the branch reset block restart numbering from 1 - they continue from one past the highest group number used inside the branch reset.
  • Writing branches with different numbers of capturing groups inside a branch reset - the extra groups from the longer branch still occupy indices, which can cause off-by-one errors in the outer pattern.
  • Confusing (?|...) with the non-capturing group (?:...) - (?:...) groups the alternatives without capturing anything or resetting numbering; (?|...) resets numbering but captures only through the explicit capturing parentheses inside each branch.
  • Testing branch reset patterns in online tools that run a JavaScript regex engine and concluding that the feature does not work.
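
The numbering mistakes above are easier to see with a concrete pattern. The sketch below uses a made-up pattern (not taken from any real codebase) whose first branch has two groups and whose second has one, followed by one more group after the block.

```php
<?php
// Hypothetical pattern: branch one captures two groups, branch two captures one.
// The group after the block is number 3, one past the highest number used inside.
$pattern = '/(?|(\d+)-(\d+)|(x))(!)/';

preg_match($pattern, '12-34!', $long);
// $long is ['12-34!', '12', '34', '!']

preg_match($pattern, 'x!', $short);
// $short is ['x!', 'x', '', '!']
// Group 2 is still reserved by the longer branch, so it is present but empty,
// and '!' stays at index 3 rather than moving down to index 2.
```

Code that assumes the trailing group sits right after the shorter branch's captures would read index 2 and get an empty string.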

Avoid When

  • The target engine is not PCRE - JavaScript (native), Python re, and POSIX grep do not support (?|...) and will fail or misbehave.
  • Branches have significantly different numbers of internal groups, making the shared numbering confusing and error-prone.
  • The pattern will be maintained by developers unfamiliar with PCRE extensions - (?| is obscure enough to warrant a comment or prefer named groups instead.
  • Portability across regex flavors is required - prefer standard alternation with named groups. Duplicate names need the DUPNAMES option (or inline (?J)) in PCRE, while engines such as Perl and .NET accept them without a flag.

When To Use

  • Matching the same logical concept expressed in multiple formats (dates, phone numbers, identifiers) where you always want results in the same group indices.
  • Simplifying post-match extraction logic that currently checks which of several groups is non-empty to determine which branch matched.
  • Combining with named groups when working in PCRE, so the same name can appear in multiple branches without a DUPNAMES flag - provided each group number carries the same name in every branch.
  • Writing PCRE patterns in PHP, Perl, or other PCRE-based tools where portability to other engines is not a requirement.
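
As one illustration of the first two cases, the sketch below parses a value that may be double-quoted, single-quoted, or bare. The attribute syntax, the sample input, and the variable names are assumptions made for this example.

```php
<?php
// Hypothetical attribute syntax: name="value", name='value', or name=value.
// Group 1 is the name; the branch reset puts the value in group 2 for all three styles.
$pattern = '/(\w+)=(?|"([^"]*)"|\'([^\']*)\'|(\S+))/';
$input   = 'id=42 title="Branch reset" lang=\'en\'';

preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$attrs = [];
foreach ($matches as $m) {
    // No need to check which quoting style matched: the value is always $m[2].
    $attrs[$m[1]] = $m[2];
}
// $attrs is ['id' => '42', 'title' => 'Branch reset', 'lang' => 'en']
```

Without the branch reset, the three value groups would be numbered 2, 3, and 4, and the loop would need to test each one.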

Code Examples

✗ Vulnerable
<?php
$input = '25/12/2024'; // sample input; the ISO form would be '2024-12-25'
// Without branch reset: groups 1-3 belong to the ISO branch, groups 4-6 to the EU branch.
// The caller must work out which set of groups was populated.
$pattern = '/(?:(\d{4})-(\d{2})-(\d{2})|(\d{2})\/(\d{2})\/(\d{4}))/';
if (preg_match($pattern, $input, $m)) {
    // Groups 1-3 are '' when the EU branch matched;
    // groups 4-6 are absent from $m when the ISO branch matched.
    $year  = $m[1] !== '' ? $m[1] : $m[6]; // fragile
    $month = $m[2] !== '' ? $m[2] : $m[5];
    $day   = $m[3] !== '' ? $m[3] : $m[4];
}
✓ Fixed
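<?php
$input = '25/12/2024'; // sample input
// With branch reset, every branch shares groups 1, 2 and 3, so each slot must
// hold the same field in every branch. Here both branches use the order
// (day, month, year) and differ only in the separator:
$pattern = '/(?|(\d{2})-(\d{2})-(\d{4})|(\d{2})\/(\d{2})\/(\d{4}))/';
if (preg_match($pattern, $input, $m)) {
    [, $day, $month, $year] = $m; // indices 1, 2, 3 always hold day, month, year
    echo "$year-$month-$day\n";
}

// Names may be repeated across branches without DUPNAMES, but PCRE requires
// groups that share a number to share a name. Naming slot 1 "year" in one
// branch and "day" in another fails to compile.
$named = '/(?|(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})|(?P<day>\d{2})\/(?P<month>\d{2})\/(?P<year>\d{4}))/';
if (preg_match($named, $input, $m)) {
    echo "{$m['year']}-{$m['month']}-{$m['day']}\n";
}

// ISO (year first) and EU (day first) order their fields differently, so a
// branch reset cannot line them up. One alternative is plain alternation with
// duplicate names enabled by (?J). How PHP fills duplicate names has varied
// between versions, so check this on the version you target.
$mixed = '/(?J)(?:(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})|(?P<day>\d{2})\/(?P<month>\d{2})\/(?P<year>\d{4}))/';
if (preg_match($mixed, $input, $m)) {
    echo "{$m['year']}-{$m['month']}-{$m['day']}\n";
}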

Added 7 May 2026
AI edit PF Media Bot Claude Opus 4.5 on long · 7 May 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: Medium
⚡ Quick Fix
Replace separate alternation groups with (?|alt1|alt2) so all branches share the same group number, then read group 1 (or the named group) regardless of which branch matched.
📦 Applies To
pcre web cli queue-worker library
🔗 Prerequisites
🔍 Detection Hints
preg_match or preg_match_all calls where the $matches array is subsequently accessed with numeric indices in a conditional expression such as `$m[N] !== ''`, `isset($m[N])`, or a ternary choosing between two different numeric offsets — indicating the caller is manually disambiguating which alternation branch matched.
Auto-detectable: ✗ No regex101
🤖 AI Agent
Confidence: Medium False Positives: Low ✗ Manual fix Fix: Medium Context: Function Tests: Update