{
    "slug": "lexer_parser",
    "term": "Lexing & Parsing",
    "category": "compiler",
    "difficulty": "advanced",
    "short": "Two stages of language processing — the lexer converts source text to tokens, the parser converts tokens to an Abstract Syntax Tree representing the program's structure.",
    "long": "Lexing (tokenisation) scans characters and groups them into tokens such as T_FUNCTION, T_STRING, and T_WHITESPACE. Whitespace and comment tokens are typically discarded before parsing. Parsing takes the token stream and builds an AST according to the language grammar, which is typically expressed as a context-free grammar. PHP's token_get_all() exposes the lexer output, and nikic's PHP-Parser builds a full AST. Applications: PHPStan traverses the AST for type checking, Rector modifies the AST for code transformations, and PHP-CS-Fixer analyses the token stream for style checking.",
    "aliases": [
        "tokeniser",
        "lexer",
        "parser",
        "AST",
        "token_get_all"
    ],
    "tags": [
        "compiler",
        "php",
        "tooling"
    ],
    "misconception": "PHP source code is executed directly. In fact, PHP first lexes the source into tokens, parses the tokens into an AST, compiles the AST to opcodes, and only then executes the opcodes; the raw source text never runs.",
    "why_it_matters": "Understanding lexing and parsing explains how PHPStan finds type errors before running code, why Rector can safely refactor thousands of files, and how you can build custom static analysis tools.",
    "common_mistakes": [
        "Parsing PHP with regex: regular expressions cannot reliably handle recursive structures such as nested expressions, and they cannot tell code apart from strings and comments.",
        "Not understanding that the AST represents syntactic structure, not formatting: changing whitespace or indentation does not change the AST.",
        "Treating token positions as character positions: they are byte offsets, which matters when PHP source contains multibyte characters.",
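        "Replacing or removing AST nodes without keeping parent references in sync when they are tracked (for example via PHP-Parser's ParentConnectingVisitor): stale references leave the tree inconsistent."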
    ],
    "when_to_use": [],
    "avoid_when": [],
    "related": [
        "abstract_syntax_tree",
        "php_compilation_pipeline",
        "static_analysis"
    ],
    "prerequisites": [
        "abstract_syntax_tree",
        "php_compilation_pipeline",
        "bytecode_vm"
    ],
    "refs": [
        "https://github.com/nikic/PHP-Parser"
    ],
    "bad_code": "// Regex parsing of PHP: brittle and wrong\npreg_match_all('/function\\s+(\\w+)\\s*\\(/', $source, $matches);\n$functions = $matches[1];\n// Misses: closures, arrow functions, by-reference functions (function &name)\n// False positives: 'function' inside strings, comments, and heredocs\n// Cannot distinguish methods from free functions or see nesting",
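    "good_code": "// PHP-Parser: AST-based analysis\n// (createForNewestSupportedVersion() requires nikic/php-parser 5.x)\nuse PhpParser\\Error;\nuse PhpParser\\Node;\nuse PhpParser\\NodeTraverser;\nuse PhpParser\\NodeVisitorAbstract;\nuse PhpParser\\ParserFactory;\n\n$parser = (new ParserFactory)->createForNewestSupportedVersion();\n\ntry {\n    $ast = $parser->parse($sourceCode);\n} catch (Error $e) {\n    // By default parse() throws PhpParser\\Error on a syntax error\n    exit('Parse error: ' . $e->getMessage() . PHP_EOL);\n}\n\n$traverser = new NodeTraverser();\n$traverser->addVisitor(new class extends NodeVisitorAbstract {\n    public function enterNode(Node $node): void {\n        // Function_ matches named functions only; methods are Node\\Stmt\\ClassMethod\n        if ($node instanceof Node\\Stmt\\Function_) {\n            echo 'Found function: ' . $node->name->toString() . PHP_EOL;\n        }\n    }\n});\n$traverser->traverse($ast);",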
    "quick_fix": "Use nikic/php-parser, which implements a real PHP lexer and parser, for any PHP code analysis or transformation; never parse PHP with regex.",
    "severity": "info",
    "effort": "high",
    "created": "2026-03-16",
    "updated": "2026-04-05",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/lexer_parser",
        "html_url": "https://codeclaritylab.com/glossary/lexer_parser",
        "json_url": "https://codeclaritylab.com/glossary/lexer_parser.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[Lexing & Parsing](https://codeclaritylab.com/glossary/lexer_parser) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/lexer_parser"
            }
        }
    }
}