Linux Performance Tools
debt(d7/e3/b3/t5)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints list htop, iotop, nethogs, vmstat, perf, sar as tools, but the specific misuse pattern — ignoring IO wait, missing disk saturation, skipping strace for hung processes — only surfaces during careful operational review or when users report slowness. The tools themselves are detectable but knowing which tool is missing from your analysis requires human judgment during an incident or code review of runbooks.
Closest to 'simple parameterised fix' (e3). The quick_fix describes a straightforward substitution pattern: swap or add htop, iotop, nethogs, vmstat as appropriate. This is more than a one-line patch (you need to identify the bottleneck type and apply the correct tool combination) but is well within a single session of work without touching multiple files or architectural concerns.
Closest to 'localised tax' (b3). The applies_to contexts are web and cli, and the tags indicate devops/debugging scope. The burden is primarily on operators and on-call engineers who must know the toolset, but it doesn't impose a persistent codebase-wide tax. It affects diagnostic and runbook practice rather than every future code change.
Closest to 'notable trap' (t5). The misconception is explicitly documented: developers assume top is sufficient, but it misses IO saturation, network bottlenecks, and blocking syscalls. The common_mistakes reinforce this — %iowait being confused with disk utilisation is a well-known documented gotcha that most developers eventually learn, placing this squarely at the 'documented gotcha most devs eventually learn' anchor.
Also Known As
TL;DR
Explanation
- CPU: top / htop (real-time process stats), mpstat (per-CPU stats), perf top (CPU profiling by function).
- Memory: free -h (overview), vmstat -s (detailed counters), smem (per-process).
- Disk IO: iostat -x (saturation and utilisation), iotop (per-process IO).
- Network: nethogs (per-process bandwidth), iftop (per-connection traffic), ss -s (socket summary).
- Tracing: strace -p PID (system calls made by a running process, i.e. what it is doing right now), ltrace (library calls).
For PHP: strace on a stuck PHP-FPM worker shows exactly which system call it's blocked on.
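As a rough illustration of the disk IO step, the sketch below filters iostat -x output for devices whose %util is at or above a threshold. The function name saturated_disks and the default threshold of 90 are assumptions made for this example, not part of sysstat. The %util column is located by its header name because the column layout differs between sysstat versions.

```shell
# Hypothetical helper: reads `iostat -x` output on stdin and prints the name
# and %util of every device at or above the threshold (default 90).
saturated_disks() {
  awk -v limit="${1:-90}" '
    /^avg-cpu/ { col = 0; next }     # CPU summary block: ignore until the next device header
    /^Device/ {                      # device header row: find the %util column by name
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 >= limit + 0 { print $1, $col }
  '
}

# Possible usage (LC_ALL=C keeps the decimal separator a dot):
#   LC_ALL=C iostat -x 1 3 | saturated_disks 90
```

The first report printed by iostat covers the time since boot, so only the later reports reflect current load. On devices that serve requests in parallel (SSDs, RAID arrays) a high %util does not necessarily mean the device is at its limit, so treat the result as a hint to look closer.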
Common Misconception
Why It Matters
Common Mistakes
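- Only looking at CPU usage: IO wait is accounted as a kind of idle time (top reports it separately as wa), so a low busy percentage can hide processes that are blocked on disk.
- Not knowing strace -p for live process inspection: it is invaluable for diagnosing hung processes.
- Running strace in production on a busy process: overhead can be 10-100x, so attach briefly and narrow the trace with -e.
- Confusing %iowait (CPU idle while IO is outstanding) with disk utilisation: a disk can be saturated while %iowait stays low if the CPUs have other work to run. Check %util and queue size in iostat -x instead.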
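The first and last mistakes above can be made concrete with a small sketch that splits CPU time into busy, iowait and idle from two samples of the aggregate cpu line in /proc/stat. The function name cpu_breakdown is hypothetical, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is assumed to follow the documented /proc/stat layout.

```shell
# Hypothetical helper: takes two aggregate "cpu" lines from /proc/stat
# (earlier sample first) and prints the busy, iowait and idle share between them.
cpu_breakdown() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9; i++) total += $i   # user..steal; guest time is already inside user
      idle = $5; iowait = $6
      if (NR == 2) {
        dt = total - ptotal
        if (dt <= 0) { print "no elapsed ticks"; exit 1 }
        di = idle - pidle; dw = iowait - piowait
        printf "busy=%.1f%% iowait=%.1f%% idle=%.1f%%\n", 100 * (dt - di - dw) / dt, 100 * dw / dt, 100 * di / dt
      }
      ptotal = total; pidle = idle; piowait = iowait
    }'
}

# Possible usage on a live system:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_breakdown "$a" "$b"
```

A result such as busy=3.0% with iowait=67.0% is the case that a glance at CPU usage misses. The reverse does not hold: a low iowait share does not prove the disks are fine, which is why the per-device figures from iostat -x are still needed.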
Code Examples
# Performance analysis by guesswork:
# 'The server is slow'
# Restart PHP-FPM (sometimes works, doesn't fix root cause)
# Check top: CPU 30% — nothing obvious
# Give up, ticket to sysadmin
# Hours wasted
# Systematic performance investigation:
# 1. CPU and load:
htop # Real-time, with color
mpstat 1 5 # Per-CPU every second for 5 seconds
# 2. Memory:
free -h # Quick overview
vmstat 1 5 # Including swap activity
# 3. Disk IO:
iostat -x 1 5 # Is any disk at 100% util?
iotop # Which process is using IO?
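# 4. What is a stuck PHP-FPM worker doing?
# Find the stuck worker's PID first (e.g. in htop). 'pgrep php-fpm | head -1'
# usually returns the lowest PID, which is the master process, not a worker.
strace -p "$WORKER_PID" -e trace=network,file -T # WORKER_PID: placeholder; -T prints time spent in each call
# Example finding: connect() to the DB taking 3.5 seconds, so the DB is the bottleneck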
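Reading raw strace -T output by eye gets tedious on a chatty process. The sketch below keeps only the calls whose elapsed time, the <seconds> suffix that -T appends, is at or above a threshold. The function name slow_syscalls, the 0.5 second default and the WORKER_PID placeholder are assumptions for illustration.

```shell
# Hypothetical helper: reads `strace -T` output on stdin and prints only the
# calls that took at least the given number of seconds (default 0.5).
slow_syscalls() {
  awk -v min="${1:-0.5}" '
    $NF ~ /^<[0-9.]+>$/ {                    # last field looks like <3.500123>
      t = substr($NF, 2, length($NF) - 2)    # strip the angle brackets
      if (t + 0 >= min + 0) print $0
    }
  '
}

# Possible usage (strace writes to stderr; timeout limits how long it stays attached):
#   sudo timeout 10 strace -T -p "$WORKER_PID" -e trace=network,file 2>&1 | slow_syscalls 1
```

Bounding the attach time with timeout is one way to limit the overhead that strace adds to a busy production process.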