Regex Conditional Patterns
debt(d8/e3/b3/t7)
Closest to 'silent in production until users hit it' (d9), but eased to d8 because detection_hints lists regex101 as a tool for inspection; however automated is 'no' and there's no linter, so misbehaving conditionals (matching empty unexpectedly, wrong group reference) stay silent until adversarial or edge-case input hits production.
Closest to 'simple parameterised fix' (e3), the quick_fix replaces duplicated parallel alternatives with (?(group)yes|no), a contained pattern-level rewrite within one regex string, then testing both branches.
Closest to 'localised tax' (b3), the choice lives inside individual regex patterns; applies_to spans all PHP contexts but each conditional pattern is isolated to one matching site, not load-bearing across the system.
Closest to 'serious trap' (t7), the misconception is that (?(1)...) backreferences group 1's captured text when it only tests participation; this contradicts how backreferences behave elsewhere and is compounded by silent group-renumbering and the no-pattern branch matching empty.
Also Known As
TL;DR
Explanation
Conditional patterns let a regex choose between two alternatives based on whether a capturing group, named group, or lookaround assertion has already matched. The syntax in PCRE (and therefore PHP's preg_* functions) is (?(condition)yes-pattern|no-pattern). The condition is usually a group reference: (?(1)...) tests whether capture group 1 participated in the match, and (?(<name>)...) or (?(name)...) tests a named group. The no-pattern branch is optional, so (?(1)yes-pattern) matches yes-pattern only if group 1 succeeded and matches empty otherwise. Conditions can also be assertions: (?(?=lookahead)yes|no) selects a branch based on a zero-width test rather than a prior capture.
The classic use case is balancing optional delimiters. Consider matching an optionally bracketed value: (\()?\d+(?(1)\)) matches '42' or '(42)' but rejects '(42' and '42)' because the closing paren is required if and only if the opening paren matched. Without a conditional you would write two separate alternatives and risk them drifting apart. Conditionals keep the dependency explicit and in one place.
These constructs are powerful but easy to misread. The group number inside (?(1)...) refers to the capture index, not a backreference to the captured text - that is a common source of confusion. Named conditions improve readability: (?(quote)...) is clearer than (?(2)...) when groups are renumbered during edits. Conditionals are supported by PCRE, .NET, and Python's re module, but not by JavaScript's native RegExp engine, so patterns relying on them are not portable to browser code without a library.
Because conditionals branch on prior match state, they can interact with backtracking in surprising ways. Keep the yes and no branches as specific as possible and anchor where you can to avoid catastrophic backtracking. Always test conditional patterns against inputs that exercise both branches plus the edge case where the condition group is optional and absent.
Common Misconception
Why It Matters
Common Mistakes
- Confusing the group reference in (?(1)...) with a backreference to the captured text rather than a participation test.
- Relying on conditional patterns in JavaScript, whose native RegExp engine does not support them.
- Renumbering groups during edits so (?(2)...) silently points at the wrong condition group.
- Forgetting the no-pattern branch is optional, then matching empty unexpectedly when the condition fails.
- Writing loose yes/no branches that trigger catastrophic backtracking on adversarial input.
Avoid When
- Targeting JavaScript's native RegExp engine, which does not support conditional subpatterns.
- A simple alternation or optional group expresses the same rule with clearer intent.
- The pattern is maintained by people unfamiliar with PCRE conditionals, raising the risk of misedits.
When To Use
- An element must appear if and only if an earlier optional group matched, such as balanced delimiters.
- You need to avoid duplicating a shared subpattern across two parallel alternatives.
- The condition is naturally expressed as 'did this group participate' rather than a value comparison.
Code Examples
// Two parallel alternatives that can drift apart and accept unbalanced input:
$pattern = '/^(?:\(\d+\)|\d+)$/';
var_dump(preg_match($pattern, '(42)')); // 1 ok
var_dump(preg_match($pattern, '42')); // 1 ok
// But maintaining two copies of the \d+ body invites mismatched edits,
// and a naive single alternation accepts unbalanced forms:
$loose = '/^\(?\d+\)?$/';
var_dump(preg_match($loose, '(42')); // 1 - WRONG, missing close paren accepted
var_dump(preg_match($loose, '42)')); // 1 - WRONG, stray close paren accepted
// Conditional ties the closing paren to whether the opening one matched:
$pattern = '/^(\()?\d+(?(1)\))$/';
var_dump(preg_match($pattern, '(42)')); // 1 - both parens present
var_dump(preg_match($pattern, '42')); // 1 - neither paren
var_dump(preg_match($pattern, '(42')); // 0 - open without close rejected
var_dump(preg_match($pattern, '42)')); // 0 - close without open rejected
// Named condition reads better and survives group renumbering:
$named = '/^(?<open>\()?\d+(?(open)\))$/';
var_dump(preg_match($named, '(42)')); // 1