Lexing & Parsing
Also Known As
tokeniser
lexer
parser
AST
token_get_all
TL;DR
Two stages of language processing: the lexer converts source text into tokens, and the parser converts those tokens into an Abstract Syntax Tree (AST) that represents the program's structure.
Explanation
Lexing (tokenisation) scans characters and groups them into meaningful tokens such as T_FUNCTION, T_STRING and T_WHITESPACE. PHP's lexer emits whitespace and comments as tokens of their own; the parser then skips them, so they do not appear as nodes in the AST.
Parsing takes the token stream and builds an AST according to the language grammar (typically expressed as a context-free grammar).
PHP's token_get_all() exposes the lexer output, and nikic/PHP-Parser builds the full AST in userland. Applications: PHPStan traverses the AST for type checking, Rector modifies the AST for code transformations, and PHP-CS-Fixer analyses the token stream for style checking.
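To make the lexer output concrete, here is a minimal sketch that feeds a one-line snippet to token_get_all() and lists the token names. The snippet and the variable names ($source, $names) are illustrative assumptions, not part of any tool mentioned above.

```php
<?php
// Lex a tiny snippet and collect the name of every token the lexer produces.
$source = '<?php function add($a, $b) { return $a + $b; }';

$names = [];
foreach (token_get_all($source) as $token) {
    // Multi-character tokens arrive as [id, text, line];
    // single characters such as "(" or ";" arrive as plain strings.
    $names[] = is_array($token) ? token_name($token[0]) : $token;
}

echo implode(' ', $names), PHP_EOL;
```

The list contains T_WHITESPACE entries alongside T_FUNCTION and T_STRING, which shows that whitespace survives lexing and is only dropped later, by the parser.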
Common Misconception
✗ PHP source code is executed directly. In fact, PHP first lexes the source to tokens, parses the tokens to an AST, compiles the AST to opcodes, and only then executes those opcodes; the raw source text never runs directly.
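One way to observe this pipeline is to hand PHP a string whose first statement is valid and whose second is a syntax error. The sketch below uses eval() purely as a demonstration device; the snippet and the variable names ($code, $outcome, $output) are assumptions made for illustration.

```php
<?php
// The first statement is valid PHP; the second ("$x = ;") is a syntax error.
$code = 'echo "first statement ran\n"; $x = ;';

ob_start();
try {
    eval($code);
    $outcome = 'executed';
} catch (ParseError $e) {
    // The whole string is lexed and parsed before any of it runs.
    $outcome = 'rejected before execution';
}
$output = ob_get_clean();

echo $outcome, PHP_EOL;
var_dump($output === ''); // true: the echo never ran
```

If PHP executed source text line by line, the echo would have produced output before the error was reached. Because parsing happens first, nothing runs at all.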
Why It Matters
Understanding lexing and parsing explains how PHPStan finds type errors before running code, why Rector can safely refactor thousands of files, and how you can build custom static analysis tools.
Common Mistakes
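- Parsing PHP with regex: regular expressions cannot reliably handle recursive structures such as nested expressions or balanced braces.
- Assuming the AST records formatting: it represents syntactic structure, so reformatting whitespace does not change the AST.
- Treating token positions as character positions: they are byte offsets, which matters for PHP source that contains multibyte characters.
- Modifying AST nodes without updating any parent references the tool maintains (PHP-Parser only adds them on request, for example via its ParentConnectingVisitor), which leaves the tree inconsistent.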
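The byte-offset point in the list above can be illustrated without any library. This sketch assumes UTF-8 source and a made-up snippet containing one two-byte character. It computes the byte offset of a token by summing strlen() over the preceding tokens, then counts the characters before it with a Unicode-aware regex.

```php
<?php
// "\u{e9}" is é, which occupies two bytes in UTF-8.
$source = "<?php \$s = '\u{e9}'; \$t = 1;";

$byteOffset = 0;
$found = null;
foreach (token_get_all($source) as $token) {
    $text = is_array($token) ? $token[1] : $token;
    if ($text === '$t') {
        $found = $byteOffset;
        break;
    }
    $byteOffset += strlen($text); // strlen() counts bytes, not characters
}

// Count the characters before the token by matching UTF-8 code points.
$charOffset = preg_match_all('/./su', substr($source, 0, $found));

printf("byte offset: %d, character offset: %d\n", $found, $charOffset);
```

The two numbers differ by one because of the single two-byte character earlier in the source. An editor integration that reports columns in characters would need to convert between the two.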
Code Examples
✗ Vulnerable
// Regex parsing of PHP: brittle and wrong.
preg_match_all('/function\s+(\w+)\s*\(/', $source, $matches);
$functions = $matches[1];
// Misses: closures and arrow functions (there is no name to capture)
// False positives: 'function name(' inside comments, strings and heredocs
// Cannot tell methods from plain functions, and knows nothing about nesting
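The failure modes of the regex approach are easy to reproduce. The sketch below runs the same pattern over a small sample; the sample source and the names ghost, phantom and real are hypothetical, chosen only to show what the pattern picks up.

```php
<?php
// Sample input held in a nowdoc so nothing inside it is interpolated.
$source = <<<'PHP'
<?php
// function ghost() is only mentioned in a comment
$text = "function phantom() lives in a string";
function real() {}
$closure = function () {};
PHP;

preg_match_all('/function\s+(\w+)\s*\(/', $source, $matches);

// Two of the three hits are not function declarations at all,
// and the closure on the last line is not reported.
print_r($matches[1]);
```

Only real is an actual function declaration. The pattern also reports ghost and phantom, and it says nothing about the closure, which is why a tool that needs reliable answers works from the AST instead.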
✓ Fixed
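// PHP-Parser: correct AST-based analysis.
// Assumes nikic/php-parser 5.x is installed via Composer and that
// $sourceCode already holds the PHP source to analyse.
require 'vendor/autoload.php';

use PhpParser\ParserFactory;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\Node;

$parser = (new ParserFactory)->createForNewestSupportedVersion();
$ast = $parser->parse($sourceCode); // throws PhpParser\Error on invalid syntax

$traverser = new NodeTraverser();
$traverser->addVisitor(new class extends NodeVisitorAbstract {
    public function enterNode(Node $node): void {
        // Function_ covers named functions only; methods are
        // Node\Stmt\ClassMethod and closures are Node\Expr\Closure.
        if ($node instanceof Node\Stmt\Function_) {
            echo 'Found function: ' . $node->name . PHP_EOL;
        }
    }
});
$traverser->traverse($ast);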
Added
16 Mar 2026
Edited
5 Apr 2026
⚡
DEV INTEL
Tools & Severity
🔵 Info
⚙ Fix effort: High
⚡ Quick Fix
Use nikic/php-parser (which uses a real PHP lexer and parser) for any PHP code analysis or transformation; never parse PHP with regex.
📦 Applies To
PHP 7.0+
any
cli
🔍 Detection Hints
Regex used to parse PHP source code; custom token parsing instead of PHP-Parser AST
Auto-detectable:
✗ No
php-parser
phpstan
rector
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: High
Context: File