{
    "slug": "regex_in_loop",
    "term": "Regex in Loop",
    "category": "performance",
    "difficulty": "intermediate",
    "short": "Compiling and executing the same regular expression on every iteration of a loop — hoist the pattern outside.",
    "long": "PHP compiles a regex pattern on each call to preg_match()/preg_replace() unless the JIT cache has it. Running the same pattern thousands of times per request wastes compilation overhead and can cause backtracking on complex patterns. The fix is to define the pattern as a constant or variable before the loop, then reference it inside. For simple membership tests on many strings, preg_grep() on the whole array is often faster than a per-element loop.",
    "aliases": [
        "regex performance",
        "preg_match loop",
        "compiled regex"
    ],
    "tags": [
        "performance",
        "php",
        "regex"
    ],
    "misconception": "\"Regex patterns must be extracted outside loops to avoid recompilation.\" — PHP automatically caches compiled regexes in its PCRE cache (pcre.jit_compilation), so the same pattern string reuses the compiled form. The real performance risks are: (1) dynamically building pattern strings that differ each iteration, defeating the cache; (2) catastrophic backtracking on complex patterns with adversarial input (ReDoS).",
    "why_it_matters": "Calling preg_match() or preg_replace() inside a loop recompiles the regex pattern on every iteration — move the pattern outside or use compiled approaches for hot paths.",
    "common_mistakes": [
        "Building regex patterns dynamically inside a loop — string concatenation plus compilation on every iteration.",
        "Using regex for simple string operations that strpos() or str_contains() handle faster.",
        "Not knowing that PHP caches compiled regexes in a PCRE cache — but the cache has a limited size and can be evicted.",
        "Applying complex regexes to unbounded user input without length limits — potential ReDoS."
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "excessive_file_io"
    ],
    "prerequisites": [
        "regex_pcre_php",
        "performance_degradation",
        "opcache"
    ],
    "refs": [
        "https://www.php.net/manual/en/function.preg-match.php",
        "https://www.php.net/manual/en/function.preg-grep.php"
    ],
    "bad_code": "// Pattern compiled on every iteration\nforeach ($emails as $email) {\n    if (preg_match('/^[\\w.+-]+@[\\w-]+\\.[\\w.]+$/', $email)) {\n        $valid[] = $email;\n    }\n}",
    "good_code": "// PHP compiles and caches regex internally after first use in the same request,\n// but pulling the constant pattern out makes intent clear and\n// avoids accidental recompilation if the string is built dynamically\nconst EMAIL_PATTERN = '/^[\\w.+-]+@[\\w-]+\\.[\\w.]+$/';\n\n$valid = array_filter($emails, fn($e) => preg_match(EMAIL_PATTERN, $e));\n\n// For truly hot loops, pre-validate with filter_var (faster than regex)\n$valid = array_filter($emails, fn($e) => filter_var($e, FILTER_VALIDATE_EMAIL));",
    "quick_fix": "Extract compiled regex patterns to class constants — PHP caches compiled PCRE patterns but re-compiling the same pattern millions of times in a loop wastes CPU; use preg_match_all() for batch matching",
    "severity": "high",
    "effort": "low",
    "created": "2026-03-15",
    "updated": "2026-04-28",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/regex_in_loop",
        "html_url": "https://codeclaritylab.com/glossary/regex_in_loop",
        "json_url": "https://codeclaritylab.com/glossary/regex_in_loop.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Regex in Loop](https://codeclaritylab.com/glossary/regex_in_loop) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/regex_in_loop"
            }
        }
    }
}