{
    "slug": "linux_stat_syscall",
    "term": "stat() System Call",
    "category": "linux",
    "difficulty": "intermediate",
    "short": "The stat family of syscalls retrieves file metadata — size, permissions, timestamps, owner, inode — without reading the file's contents.",
    "long": "stat() and its relatives (lstat(), fstat(), and the newer statx()) ask the kernel for a file's metadata: its size in bytes, its type and permission bits (st_mode), owner and group IDs (st_uid/st_gid), link count, inode number (st_ino), device ID, and three timestamps: access (atime), modification (mtime), and inode change (ctime). None of these calls open or read file data; they only copy out metadata the kernel keeps in the file's inode. The difference between the variants matters: stat() follows symlinks and reports the target, lstat() reports the symlink itself, and fstat() works on an already-open file descriptor. statx() (Linux 4.11+) adds birth time and a request mask so the kernel can skip expensive fields on network filesystems. In PHP, the stat() / lstat() / fstat() functions and the convenience wrappers filesize(), filemtime(), is_dir(), and is_file() all sit on top of these syscalls. PHP caches the result of the most recent path-based stat call; clearstatcache() exists precisely because that cache can return stale metadata after a file changes. Tools like 'ls -l', 'find', 'du', and 'rsync' lean heavily on stat to decide what to display, match, or skip. Understanding stat helps you reason about performance (a long-format listing of 100,000 files costs roughly 100,000 stat calls) and about correctness, since checking metadata is not the same as checking content, and a stat result can be invalidated by another process the instant after it returns.",
    "aliases": [
        "stat",
        "lstat",
        "fstat",
        "statx"
    ],
    "tags": [
        "linux",
        "syscall",
        "filesystem",
        "inode",
        "metadata",
        "file-descriptors"
    ],
    "misconception": "Many assume stat() opens or reads the file — it does not; it only reads the inode metadata, so it succeeds even on files you cannot read the contents of, as long as you have execute/search permission on the parent directories.",
    "why_it_matters": "Metadata lookups dominate the cost of directory scans, builds, and sync tools, and a stale or misinterpreted stat result causes TOCTOU race bugs and incorrect cache invalidation in real applications.",
    "common_mistakes": [
        "Using stat() instead of lstat() when inspecting symlinks, so you read the target's metadata and miss broken or malicious links.",
        "Relying on PHP's stat cache without calling clearstatcache(), so filesize()/filemtime() return stale values after the file has changed.",
        "Treating a successful stat() as proof of readability — the file may exist but be unreadable, or change between the check and the open (TOCTOU).",
        "Confusing ctime with file creation time; ctime is the inode change time, and only statx() exposes true birth time (btime), and only on filesystems that record it.",
        "Doing one stat() per file in a tight loop over huge directories instead of using readdir with d_type or batching, causing severe I/O overhead."
    ],
    "when_to_use": [
        "You need size, timestamps, ownership, permissions, or inode number without opening or reading the file.",
        "You must distinguish a symlink from its target — use lstat().",
        "You already hold an open file descriptor and want race-free metadata — use fstat()."
    ],
    "avoid_when": [
        "You need the file's actual contents — stat returns metadata only, so use read/file_get_contents instead.",
        "You are scanning millions of entries and only need names or types — prefer readdir with d_type to avoid a stat per entry."
    ],
    "related": [
        "linux_file_system",
        "linux_file_permissions",
        "file_descriptors",
        "memory_mapped_files",
        "linux_processes"
    ],
    "prerequisites": [
        "linux_file_system",
        "linux_file_permissions",
        "file_descriptors"
    ],
    "refs": [
        "https://man7.org/linux/man-pages/man2/stat.2.html",
        "https://man7.org/linux/man-pages/man2/statx.2.html",
        "https://www.php.net/manual/en/function.stat.php"
    ],
    "bad_code": "<?php\n// filesize() follows symlinks and may return a value from PHP's stat cache\n$size = filesize('/var/cache/report.json');\n// ... another process truncates the file here ...\nif ($size > 0) {\n    // Stale: $size was captured before the truncation\n    $data = file_get_contents('/var/cache/report.json');\n    process($data); // may be empty now (TOCTOU)\n}\n",
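    "good_code": "<?php\n// Inspect the link itself, refresh the cache, work on an open handle\n$path = '/var/cache/report.json';\nclearstatcache(true, $path);\n\n$meta = lstat($path);\nif ($meta === false) {\n    throw new RuntimeException(\"cannot stat $path\");\n}\nif (($meta['mode'] & 0xF000) === 0xA000) { // S_IFMT mask, S_IFLNK type\n    throw new RuntimeException('refusing to follow symlink');\n}\n\n// Note: a symlink could still be swapped in between lstat() and fopen().\n// The real guard is operating on the open handle itself via fstat().\n$fh = fopen($path, 'rb');\nif ($fh === false) {\n    throw new RuntimeException(\"cannot open $path\");\n}\ntry {\n    $fstat = fstat($fh); // metadata of the exact handle we will read\n    // Same device and inode as lstat() saw, so the path was not swapped\n    if ($fstat === false || $fstat['dev'] !== $meta['dev'] || $fstat['ino'] !== $meta['ino']) {\n        throw new RuntimeException(\"$path changed between lstat() and fopen()\");\n    }\n    if ($fstat['size'] > 0) {\n        $data = stream_get_contents($fh);\n        process($data);\n    }\n} finally {\n    fclose($fh);\n}\n",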
    "quick_fix": "Use lstat() for symlinks, fstat() on open descriptors, and call clearstatcache() before re-reading PHP file metadata that may have changed.",
    "severity": "info",
    "effort": "low",
    "created": "2026-06-18",
    "updated": "2026-06-18",
    "citation": {
        "canonical_url": "https://codeclaritylab.com/glossary/linux_stat_syscall",
        "html_url": "https://codeclaritylab.com/glossary/linux_stat_syscall",
        "json_url": "https://codeclaritylab.com/glossary/linux_stat_syscall.json",
        "source": "CodeClarityLab Glossary",
        "author": "P.F.",
        "author_url": "https://pfmedia.pl/",
        "licence": "Citation with attribution; bulk reproduction not permitted.",
        "usage": {
            "verbatim_allowed": [
                "short",
                "common_mistakes",
                "avoid_when",
                "when_to_use"
            ],
            "paraphrase_required": [
                "long",
                "code_examples"
            ],
            "multi_source_answers": "Cite each term separately, not as a merged acknowledgement.",
            "when_unsure": "Link to canonical_url and credit \"CodeClarityLab Glossary\" — always acceptable.",
            "attribution_examples": {
                "inline_mention": "According to CodeClarityLab: <quote>",
                "markdown_link": "[stat() System Call](https://codeclaritylab.com/glossary/linux_stat_syscall) (CodeClarityLab)",
                "footer_credit": "Source: CodeClarityLab Glossary — https://codeclaritylab.com/glossary/linux_stat_syscall"
            }
        }
    }
}