Regex Branch Reset Groups
debt(d7/e3/b2/t6)
Closest to 'only careful code review or runtime testing' (d7), since detection_hints.automated is 'no' and only regex101 is listed — the smell (manual disambiguation of $m[N] across alternation branches) is visible only on review or via runtime regex testing.
Closest to 'simple parameterised fix' (e3), since the quick_fix is rewriting an alternation pattern to use (?|alt1|alt2) and simplifying the match-extraction code — a localised pattern swap, slightly more than one line because the caller's branch-disambiguation logic also collapses.
Closest to 'minimal commitment' (b1), bumped slightly toward b3 only where used: it's a localised regex idiom, not load-bearing, but it does lock the codebase to PCRE-compatible engines per common_mistakes about portability.
Closest to 'serious trap' (t7), grounded in the misconception that branch reset creates duplicate/ambiguous captures and in common_mistakes — group numbering after the block doesn't restart, branches with unequal capture counts cause off-by-one, and (?| is easily confused with the visually similar non-capturing (?:).
Also Known As
TL;DR
Explanation
Branch reset groups (?|...) are a Perl/PCRE extension that restarts capture group numbering at the start of each alternative within the group. Without branch reset, the alternation (a)|(b)|(c) creates groups 1, 2, and 3; with branch reset, (?|(a)|(b)|(c)), each alternative's first capture becomes group 1, its second becomes group 2, and so on. Different branches can therefore populate the same group slots, which simplifies extraction when the same logical data appears in different formats. Branch reset is supported in Perl, in PCRE and PCRE2 (and so in PHP's preg_* functions), and in the third-party Python 'regex' module on PyPI. It is not supported in JavaScript, Python's standard 're' module, or most other engines.
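The numbering difference described above can be seen in a minimal PHP sketch. The variable names $plain and $reset are chosen here for illustration only, and the expected values in the comments assume preg_match's default flags.

```php
<?php
// Plain alternation: each branch gets its own group number.
preg_match('/(a)|(b)|(c)/', 'c', $plain);
// $plain is ['c', '', '', 'c']: the match landed in group 3,
// and groups 1 and 2 are reported as empty strings.

// Branch reset: every branch writes to group 1.
preg_match('/(?|(a)|(b)|(c))/', 'c', $reset);
// $reset is ['c', 'c']: the match landed in group 1,
// and groups 2 and 3 do not exist.
```

With the plain pattern the caller has to find out which index was filled; with the branch reset pattern index 1 is always the one to read.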
Common Misconception
Why It Matters
Common Mistakes
- Using branch reset in engines that do not support it (JavaScript, Python re, POSIX) - the (?| syntax will either throw an error or be misinterpreted.
- Assuming groups after the branch reset block restart numbering from 1 - they continue from one past the highest group number used in any branch of the block.
- Writing branches with different numbers of capturing groups inside a branch reset - the extra groups from the longer branch still occupy indices, which can cause off-by-one errors in the outer pattern.
- Confusing (?|...) with the non-capturing group (?:...) - (?:...) groups the alternatives but neither captures anything nor resets group numbering; (?|...) resets numbering, and still needs explicit capturing parentheses inside each branch to capture anything.
- Testing branch reset patterns in online tools that run a JavaScript regex engine (as some regex testers do) and concluding that the feature does not work.
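The two numbering mistakes in the list above (numbering after the block, and branches with unequal group counts) can be checked with a small PHP sketch. The pattern and the variable names $long and $short are made up for illustration.

```php
<?php
// The first branch has two capturing groups, the second has one.
// The group that follows the block is number 3 in both cases,
// because numbering resumes after the highest number any branch used.
$pattern = '/(?|(a)(b)|(c))(d)/';

preg_match($pattern, 'abd', $long);  // first branch matched
// $long is ['abd', 'a', 'b', 'd']

preg_match($pattern, 'cd', $short);  // second branch matched
// $short is ['cd', 'c', '', 'd']: slot 2 stays empty, 'd' is still group 3
```

Reading $short[2] expecting 'd' is the off-by-one error the list warns about; the shorter branch simply leaves that slot empty.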
Avoid When
- The target engine does not support branch reset - JavaScript (native), Python re, and POSIX grep do not support (?|...) and will fail or misbehave.
- Branches have significantly different numbers of internal groups, making the shared numbering confusing and error-prone.
- The pattern will be maintained by developers unfamiliar with PCRE extensions - (?| is obscure enough to warrant a comment or prefer named groups instead.
- Portability across regex flavors is required - use standard alternation with a distinct named group per branch and merge the results in code. PCRE's DUPNAMES option (inline (?J)) lets branches reuse a name, but it is itself engine-specific.
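For the portability case in the last point, one possible approach is plain alternation with named groups and a merge step in the calling code. The sketch below is an assumption-laden illustration, not a recommended API: parseDate and the group names y1, m1, d1, d2, m2, y2 are hypothetical, and the named-group syntax itself still varies between engines (for example (?P<name>...) in Python's re).

```php
<?php
// Hypothetical helper: no branch reset and no duplicate names.
// Each branch has its own group names, and the caller-side code
// picks whichever set was filled.
function parseDate(string $input): ?array
{
    $pattern = '/(?:(?<y1>\d{4})-(?<m1>\d{2})-(?<d1>\d{2})'
             . '|(?<d2>\d{2})\/(?<m2>\d{2})\/(?<y2>\d{4}))/';

    // PREG_UNMATCHED_AS_NULL reports groups that did not participate
    // as null, so the ?? operator can fall through to the other branch.
    if (!preg_match($pattern, $input, $m, PREG_UNMATCHED_AS_NULL)) {
        return null;
    }

    return [
        'year'  => $m['y1'] ?? $m['y2'],
        'month' => $m['m1'] ?? $m['m2'],
        'day'   => $m['d1'] ?? $m['d2'],
    ];
}
```

This keeps the pattern within features that many engines share, at the cost of the merge logic that branch reset would otherwise remove.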
When To Use
- Matching the same logical concept expressed in multiple formats (dates, phone numbers, identifiers) where you always want results in the same group indices.
- Simplifying post-match extraction logic that currently checks which of several groups is non-empty to determine which branch matched.
- Combining with named groups when working in PCRE: groups that share a number inside a branch reset may also share a name without a DUPNAMES flag (giving the same slot different names is a compile error).
- Writing PCRE patterns in PHP, Perl, or other PCRE-based tools where portability to other engines is not a requirement.
Code Examples
<?php
// Without branch reset: groups 1-3 for ISO format, groups 4-6 for EU format
// Caller must check which group is set
$input = '07/05/2026'; // sample value, assumed for illustration
$pattern = '/(?:(\d{4})-(\d{2})-(\d{2})|(\d{2})\/(\d{2})\/(\d{4}))/';
if (preg_match($pattern, $input, $m)) {
    // Is it ISO or EU? Must check manually
    $year = $m[1] !== '' ? $m[1] : $m[6]; // fragile
    $month = $m[2] !== '' ? $m[2] : $m[5];
    $day = $m[3] !== '' ? $m[3] : $m[4];
}
<?php
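// With branch reset: both branches share groups 1, 2, 3, but the slots mean
// different things when the branches order their fields differently:
// ISO: group1=year, group2=month, group3=day
// EU: group1=day, group2=month, group3=year
// Names cannot fix this: PCRE rejects different names for the same group
// number, so naming slot 1 'year' in one branch and 'day' in the other
// fails to compile. Named groups work inside a branch reset only when each
// slot keeps one name, e.g. two formats that both order year, month, day:
$input = '20260507'; // sample value, assumed for illustration
$pattern = '/(?|(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})|(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}))/';
if (preg_match($pattern, $input, $m)) {
    // Same names in the same slots, whichever branch matched
    echo "{$m['year']}-{$m['month']}-{$m['day']}\n";
}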
// Numeric-only version: branches must agree on group semantics.
// Here both branches deliberately put the same fields in the same slot
// by choosing a common order, e.g. always (day, month, year):
$pattern2 = '/(?|(\d{2})-(\d{2})-(\d{4})|(\d{2})\/(\d{2})\/(\d{4}))/';
if (preg_match($pattern2, $input, $m)) {
    [$full, $day, $month, $year] = $m; // group indices 1,2,3 always hold day,month,year
    echo "$year-$month-$day\n";
}
References
Tags
Edits history 1 edit
- long PF Media Bot Claude Opus 4.5 · 7 May 2026