Lexing & Parsing

compiler PHP 7.0+ Advanced

Also Known As

tokeniser, lexer, parser, AST, token_get_all

TL;DR

Two stages of language processing — the lexer converts source text to tokens, the parser converts tokens to an Abstract Syntax Tree representing the program's structure.

Explanation

Lexing (tokenisation) scans the characters of the source and groups them into tokens such as T_FUNCTION, T_STRING and T_WHITESPACE. Many lexers discard whitespace and comments; PHP's token_get_all() returns them as tokens (T_WHITESPACE, T_COMMENT, T_DOC_COMMENT), and the parser ignores them. Parsing takes the token stream and builds an AST according to the language grammar, which is typically expressed as a context-free grammar. PHP's token_get_all() exposes the lexer output, and nikic's PHP-Parser builds a full AST in userland. Applications: PHPStan traverses the AST for type checking, Rector modifies the AST for code transformations, and php-cs-fixer analyses the token stream for style checking.
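
To make the lexer output concrete, the following minimal sketch feeds a one-line snippet to token_get_all() and prints each token. The snippet and the variable names ($source, $names) are illustrative assumptions, not part of any library. Note that whitespace appears in the output as T_WHITESPACE tokens.

```php
<?php
// Illustrative input; any valid PHP source string would do.
$source = '<?php function add($a, $b) { return $a + $b; }';

$names = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array tokens have the shape [token id, matched text, line number].
        list($id, $text, $line) = $token;
        $names[] = token_name($id);
        printf("%-14s %s\n", token_name($id), json_encode($text));
    } else {
        // Single-character tokens such as ( ) { } ; + are returned as plain strings.
        $names[] = $token;
        printf("%-14s %s\n", 'char', json_encode($token));
    }
}
```

The two token shapes (array versus plain string) are the main thing to handle when reading this output. On PHP 8.0 and later, PhpToken::tokenize() returns uniform objects instead.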

Common Misconception

Misconception: PHP source code is executed directly. In fact, PHP first lexes the source to tokens, parses the tokens to an AST, compiles the AST to opcodes, and only then executes the opcodes. The raw source text never runs.
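
One way to observe this pipeline from userland is to hand eval() a string whose first statement is valid and whose second is a syntax error. The sketch below assumes PHP 7.0 or later, where eval() throws ParseError; the variable names are illustrative. Because the whole string is parsed before any of it is executed, the valid first statement never runs.

```php
<?php
$ran = false;
$caught = false;

// The assignment to $ran comes BEFORE the syntax error in the source text.
$code = '$ran = true; $x = ;';

try {
    eval($code);
} catch (ParseError $e) {
    // Parsing failed, so no part of $code was compiled or executed.
    $caught = true;
    echo 'Parse error: ', $e->getMessage(), PHP_EOL;
}

var_dump($ran); // bool(false): the first statement was never executed
```

If PHP interpreted source text line by line, $ran would be true by the time the error was reached.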

Why It Matters

Understanding lexing and parsing explains how PHPStan finds type errors before running code, why Rector can safely refactor thousands of files, and how you can build custom static analysis tools.

Common Mistakes

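  • Parsing PHP with regex: regular expressions cannot reliably match recursive structures such as nested expressions, and they cannot tell code apart from strings or comments.
  • Assuming the AST captures formatting: it represents the program's syntactic structure, so changing only whitespace does not change the AST.
  • Treating token positions as character positions: file offsets are byte offsets, which matters for PHP source that contains multibyte characters.
  • Modifying AST nodes without updating parent references where nodes carry them (in PHP-Parser these are opt-in, added by ParentConnectingVisitor): stale references leave the tree inconsistent.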
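
The first mistake is easy to reproduce. The sketch below runs a typical "find function names" regex over a small, made-up sample that contains one real function, a commented-out function, a string that looks like a declaration, and a closure. The sample source and variable names are assumptions for illustration only.

```php
<?php
// Hypothetical sample: one real declaration plus text that merely looks like one.
$source = <<<'CODE'
<?php
// function commented_out() {}
$label = 'function in_string(';
function real() {}
$closure = function () {};
CODE;

preg_match_all('/function\s+(\w+)\s*\(/', $source, $matches);
$found = $matches[1];

print_r($found);
// The regex reports three names, but only real() is an actual declaration.
// The closure is skipped because it has no name for the pattern to capture.
```

The regex has no notion of comments or string literals, which is exactly the context a lexer provides.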

Code Examples

✗ Vulnerable
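// Regex parsing of PHP: brittle and wrong.
preg_match_all('/function\s+(\w+)\s*\(/', $source, $matches);
$functions = $matches[1]; // captured names, including false positives
// Misses: closures, arrow functions, by-reference functions (function &name)
// False positives: 'function name(' inside comments, strings, heredocs
// Cannot tell methods from free functions or follow nested structures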
✓ Fixed
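// PHP-Parser: correct AST-based analysis (assumes PHP-Parser 5 and Composer autoloading).
use PhpParser\Error;
use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

$parser = (new ParserFactory)->createForNewestSupportedVersion();

try {
    // PHP code in $sourceCode must follow an opening <?php tag,
    // otherwise it is parsed as inline HTML.
    $ast = $parser->parse($sourceCode);
} catch (Error $e) {
    // By default, parse() throws PhpParser\Error on a syntax error.
    exit('Parse error: ' . $e->getMessage() . PHP_EOL);
}

$traverser = new NodeTraverser();
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function enterNode(Node $node): void {
        // Function_ matches free functions only; methods are Node\Stmt\ClassMethod.
        if ($node instanceof Node\Stmt\Function_) {
            echo 'Found function: ' . $node->name->toString() . PHP_EOL;
        }
    }
});
$traverser->traverse($ast);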

Added 16 Mar 2026
Edited 5 Apr 2026
DEV INTEL Tools & Severity
🔵 Info ⚙ Fix effort: High
⚡ Quick Fix
Use nikic/php-parser, which implements a real PHP lexer and parser, for any PHP code analysis or transformation. Never parse PHP with regex.
📦 Applies To
PHP 7.0+ any cli
🔍 Detection Hints
Regex used to parse PHP source code; custom token parsing instead of PHP-Parser AST
Auto-detectable: ✗ No
Tools: php-parser, phpstan, rector