Capture Groups & Backreferences
debt(d7/e2/b3/t5)
Closest to 'only careful code review or runtime testing' (d7), no detection_hints.tools provided; mistakes like missing return-value check or wrong group index are not caught by default linters and typically surface in testing or review. Static analysis tools (Psalm, PHPStan) generally don't validate regex group semantics.
Closest to 'one-line patch' (e1) but slightly above because fix may involve adding a preg_match return check plus adjusting group references; quick_fix describes one-line swaps like (?:...) for (...) or adding named groups.
Closest to 'localised tax' (b3), regex patterns are typically confined to specific parsing/validation utilities; applies_to web/cli but the choice doesn't shape system architecture, only the local extraction code.
Closest to 'notable trap' (t5), the misconception that groups are free plus the $matches[0] vs [1] off-by-one is a documented gotcha most PHP devs eventually learn; matches expectations of a well-known regex pitfall.
Also Known As
TL;DR
Explanation
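A capture group (...) stores the text matched by its subpattern in a numbered slot. Groups are numbered by the position of their opening parenthesis, counting from left to right. Non-capturing groups (?:...) group without storing, so they do not take a number. In PHP, preg_match fills the $matches array: $matches[0] is the full match, $matches[1] is group 1, and so on.

In a preg_replace replacement string, $1, ${1} or \1 inserts the text of group 1, and $0 inserts the whole match. preg_replace_callback takes no replacement string. Instead, it passes the same kind of $matches array to the callback. Backreferences in the pattern itself (\1) match the same text that group 1 matched. This is useful for finding repeated words, or a closing delimiter that must equal the opening one (for example, the same quote character). On their own, backreferences do not handle nested delimiters. Named capture groups (?P<name>...) or (?<name>...) improve readability and allow extraction by name via $matches['name']. PHP also keeps each named group under its numeric index.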
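Below is a minimal sketch of the behaviours described above. It assumes PHP 7.4 or later, for the arrow function. The input strings are made up for illustration.

```php
<?php
// preg_match: $m[0] is the whole match, $m[1] is group 1, $m[2] is group 2.
$matched = preg_match('/(\w+)@(\w+)\.com/', 'mail alice@example.com now', $m);
// $matched === 1, $m[0] === 'alice@example.com', $m[1] === 'alice'

// preg_replace: $1 / $2 in the replacement string refer to captured groups.
$swapped = preg_replace('/(\w+), (\w+)/', '$2 $1', 'Lovelace, Ada');
// 'Ada Lovelace'

// preg_replace_callback: no replacement string; the callback receives the groups as an array.
$doubled = preg_replace_callback(
    '/(\w+)=(\d+)/',
    fn(array $g): string => strtoupper($g[1]) . '=' . ((int) $g[2] * 2),
    'a=1 b=2'
);
// 'A=2 B=4'

// Backreference inside the pattern: \1 must repeat exactly what group 1 matched.
preg_match_all('/\b(\w+)\s+\1\b/i', 'this is is a test test', $dups);
// $dups[1] === ['is', 'test']

// Named groups: readable by name, and still present under their numeric index.
preg_match('/(?<key>\w+):(?<value>\d+)/', 'port:8080', $kv);
// $kv['key'] === 'port', $kv[1] === 'port', $kv['value'] === '8080'
```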
Common Misconception
Why It Matters
Common Mistakes
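- Reading $matches[1] without checking that the match succeeded. preg_match returns 1 on a match, 0 on no match and false on error, and after a failed match $matches is empty. Always check the return value before accessing $matches.
- Forgetting that $matches[0] is the full match, not group 1. Off-by-one errors in group indexing are extremely common.
- Leaving capturing groups in a preg_split delimiter while using PREG_SPLIT_DELIM_CAPTURE. With that flag, every capturing group in the delimiter pattern is added to the results, including groups meant only for alternation. Without the flag, delimiters are discarded.
- Not using non-capturing groups (?:...) when grouping only for alternation or quantification. Each unneeded capturing group takes a slot and shifts the number of every group after it.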
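The sketch below walks through the mistakes above. The orderId helper and all input strings are hypothetical, and preg_last_error_msg() assumes PHP 8.0 or later.

```php
<?php
// Hypothetical helper: checks preg_match's return value before touching $m.
function orderId(string $text): ?string
{
    $result = preg_match('/order #(\d+)/', $text, $m);
    if ($result === false) {
        throw new RuntimeException(preg_last_error_msg()); // regex engine error
    }
    if ($result === 0) {
        return null; // no match: $m is empty, so $m[1] would be undefined
    }
    return $m[1]; // group 1 ("123"), not $m[0] (the full "order #123")
}

// preg_split: groups are returned only with PREG_SPLIT_DELIM_CAPTURE.
$plain = preg_split('/\s*(,|;)\s*/', 'a, b; c');
$withDelims = preg_split('/\s*(,|;)\s*/', 'a, b; c', -1, PREG_SPLIT_DELIM_CAPTURE);
$nonCapturing = preg_split('/\s*(?:,|;)\s*/', 'a, b; c', -1, PREG_SPLIT_DELIM_CAPTURE);

// An extra capturing group shifts later indices; (?:...) keeps them stable.
preg_match('#(https?)://(www\.)?([^/]+)#', 'https://example.org/x', $a);   // host in $a[3]
preg_match('#(https?)://(?:www\.)?([^/]+)#', 'https://www.example.org/x', $b); // host in $b[2]
```

With the flag set, the capturing version interleaves the delimiters into the result. The non-capturing version returns only the fields.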
Code Examples
// Numbered groups: fragile if the pattern changes
$date = '2024-03-15';
if (preg_match('/(\d{4})-(\d{2})-(\d{2})/', $date, $m) === 1) {
    $year = $m[1];  // $m[0] is the full match; breaks if you add a group before year
    $month = $m[2];
    $day = $m[3];
}
// Named groups: robust to pattern changes
if (preg_match('/(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/', $date, $m) === 1) {
    $year = $m['year'];
    $month = $m['month'];
    $day = $m['day'];
}
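The next sketch shows why the named version is more robust. It places a hypothetical label group in front of the date. The numeric indices shift, but the names still point at the same text. It also shows one way to drop the duplicate numeric keys that PHP keeps for named groups. The input string is made up for illustration.

```php
<?php
// Hypothetical pattern: a new "label" group now precedes the date.
$input = 'due 2024-03-15';
preg_match('/(?P<label>\w+) (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})/', $input, $m);
// $m[1] is now 'due' rather than the year; $m['year'] is still '2024'

// Named groups also appear under numeric keys; keep only the string keys.
$named = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
// ['label' => 'due', 'year' => '2024', 'month' => '03', 'day' => '15']
```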