Metrics Types
debt(d7/e5/b5/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and tools listed (prometheus, grafana, opentelemetry) are observability platforms that surface the symptom (misleading dashboards, missing tail latency) only after deployment and under real traffic. No linter or static analysis catches gauge-used-for-counter or average-instead-of-percentile at write time.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes a conceptual swap (counter vs gauge vs histogram), but in practice correcting metric type misuse means changing instrumentation code across multiple service entry points, updating dashboard queries in Grafana, and potentially migrating stored time-series data. This goes beyond a single-line patch but stops short of a full architectural rework.
Closest to 'persistent productivity tax' (b5). Metrics choices apply across web, cli, and queue-worker contexts per applies_to. Wrong metric types produce misleading dashboards that persist and quietly mislead every engineer doing on-call or capacity planning, creating an ongoing productivity tax across multiple work streams without necessarily reshaping the entire architecture.
Closest to 'serious trap' (t7). The misconception field calls out that averages hide tail latency — a competent developer familiar with averages from statistics will naturally reach for average latency, not realizing it masks the p99 experience. Additionally, gauge vs counter behavior on process restart and summary non-aggregability across workers are non-obvious contradictions relative to how similar concepts (simple numeric accumulators) work in other contexts.
Also Known As
TL;DR
Explanation
Counter: a monotonically increasing value, such as request count or error count. It never decreases while the process runs and only drops back to zero on restart. Use rate() to get a per-second rate; rate() also compensates for those resets.
Gauge: a current value that can go up or down, such as memory usage, active connections, or queue depth.
Histogram: counts observations into configurable buckets, such as request duration or response size. Because bucket counts can be summed, percentiles (p95, p99) can be estimated across multiple instances.
Summary: calculates percentiles client-side. The values are accurate for a single process but cannot be aggregated across instances.
For distributed PHP apps, histograms are almost always preferred over summaries.
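To make the counter behaviour concrete, here is a minimal plain-PHP sketch of how an increase can be computed from raw counter samples while tolerating a restart. It only illustrates the idea behind rate() and increase(); the real Prometheus functions also extrapolate to the window edges, which this sketch ignores. The function name `counterIncrease` and the sample values are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Total increase of a counter over a window, tolerating resets.
 * A drop between consecutive samples is treated as a restart from zero.
 *
 * @param float[] $samples Counter values in scrape order.
 */
function counterIncrease(array $samples): float
{
    $increase = 0.0;
    $count = count($samples);
    for ($i = 1; $i < $count; $i++) {
        $delta = $samples[$i] - $samples[$i - 1];
        // Negative delta: the process restarted, so the new value counts from 0.
        $increase += $delta >= 0 ? $delta : $samples[$i];
    }
    return $increase;
}

// Hypothetical scrapes of requests_total; the worker restarts after 130.
$samples = [100.0, 120.0, 130.0, 5.0, 25.0];

$increase = counterIncrease($samples);              // 20 + 10 + 5 + 20
$naive = $samples[count($samples) - 1] - $samples[0]; // last minus first

printf("reset-aware increase: %.1f, naive difference: %.1f\n", $increase, $naive);
```

The naive last-minus-first difference goes negative across the restart, while the reset-aware version still reports the requests that were actually served. This reset rule is only valid for values that never decrease on their own, which is why request counts belong in a counter rather than a gauge.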
Diagram
flowchart TD
subgraph Counter
C[Monotonically increasing<br/>requests_total errors_total]
C_USE["Use rate() for per-second rate<br/>never decreases"]
end
subgraph Gauge
G[Current value up or down<br/>memory_usage active_connections]
G_USE[Snapshot of state right now]
end
subgraph Histogram
H[Samples in buckets<br/>request_duration_seconds]
H_USE[Calculate p50 p95 p99<br/>aggregatable across instances]
end
subgraph Summary
S[Pre-calculated percentiles<br/>accurate but not aggregatable]
end
style C fill:#1f6feb,color:#fff
style G fill:#238636,color:#fff
style H fill:#6e40c9,color:#fff
style S fill:#d29922,color:#fff
Common Misconception
Why It Matters
Common Mistakes
- Using a gauge for a counter (request count): both reset to 0 on restart, but rate() and increase() only compensate for resets on counters, so a gauge silently loses or distorts the total.
- Average latency instead of a p95/p99 histogram: the average hides the tail experience of slow requests.
- High-cardinality histogram labels: a label per user ID creates millions of time series, and every bucket multiplies that number.
- Summaries in distributed systems: summaries cannot be aggregated across PHP-FPM workers; use histograms.
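The second mistake above is easy to see with numbers. The following plain-PHP sketch compares the average with a nearest-rank percentile over a hypothetical set of latencies (97 fast requests and 3 slow ones). The `percentile` helper and the data are assumptions made for illustration, not part of any metrics library.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile of raw observations.
 *
 * @param float[] $values Raw observations (must not be empty).
 * @param float   $p      Percentile between 0 and 100.
 */
function percentile(array $values, float $p): float
{
    sort($values);
    $rank = (int) ceil($p * count($values) / 100);
    return $values[max($rank, 1) - 1];
}

// Hypothetical latencies in seconds: 97 fast requests and 3 slow ones.
$latencies = array_merge(array_fill(0, 97, 0.05), array_fill(0, 3, 3.0));

$average = array_sum($latencies) / count($latencies);
$p50 = percentile($latencies, 50);
$p99 = percentile($latencies, 99);

printf("avg=%.4fs p50=%.2fs p99=%.2fs\n", $average, $p50, $p99);
```

With this data the average lands near 0.14 s, which looks healthy, while the p99 is 3 s. A dashboard built on the average would never show that a few requests in every hundred take seconds to complete.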
Code Examples
// Wrong types: gauge for counter, average for latency:
$gauge = $registry->registerGauge('app', 'requests_total', 'Total requests');
$gauge->inc(); // Gauge: rate()/increase() cannot account for resets, so totals break after a restart
// Average latency — hides tail:
$totalTime += $elapsed;
$avgLatency = $totalTime / $requestCount; // Misleading
// Correct types with prometheus_client_php:
$counter = $registry->registerCounter('app', 'requests_total', 'Total requests', ['route']);
$counter->inc(['/api/users']); // Label values are passed in the order of the declared label names
$histogram = $registry->registerHistogram(
'app', 'request_duration_seconds', 'Request duration',
['route'],
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5] // Buckets in seconds
);
$histogram->observe($elapsed, ['/api/users']); // Label values in declared order
// Query p99 across all instances:
// histogram_quantile(0.99, sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
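Why histograms aggregate and client-side percentiles do not can be sketched in plain PHP. The example below sums cumulative bucket counts from two hypothetical workers and then estimates a quantile by linear interpolation inside the matching bucket, which is the general approach histogram_quantile takes. The function names, bucket bounds, and counts are all assumptions for illustration, and edge cases are handled more simply than in Prometheus itself.

```php
<?php
declare(strict_types=1);

/**
 * Sum cumulative bucket counts from several workers.
 * Keys are upper bounds ("le") as strings, in ascending order.
 *
 * @param array<int, array<string, int>> $workers
 * @return array<string, int>
 */
function mergeBuckets(array $workers): array
{
    $merged = [];
    foreach ($workers as $buckets) {
        foreach ($buckets as $le => $count) {
            $merged[$le] = ($merged[$le] ?? 0) + $count;
        }
    }
    return $merged;
}

/**
 * Estimate a quantile (0..1) from cumulative buckets by linear
 * interpolation inside the bucket that contains the target rank.
 *
 * @param array<string, int> $buckets Must end with a '+Inf' bucket.
 */
function estimateQuantile(float $q, array $buckets): float
{
    $total = $buckets['+Inf'];
    if ($total <= 0) {
        return NAN; // no observations
    }
    $rank = $q * $total;
    $lowerBound = 0.0;
    $lowerCount = 0;
    foreach ($buckets as $le => $count) {
        if ($count >= $rank && $count > $lowerCount) {
            if ($le === '+Inf') {
                return $lowerBound; // beyond the last finite bucket
            }
            $width = (float) $le - $lowerBound;
            return $lowerBound + $width * ($rank - $lowerCount) / ($count - $lowerCount);
        }
        if ($le !== '+Inf') {
            $lowerBound = (float) $le;
        }
        $lowerCount = $count;
    }
    return $lowerBound;
}

// Hypothetical cumulative bucket counts from two PHP-FPM workers.
$workerA = ['0.1' => 90, '0.5' => 98, '1.0' => 100, '2.5' => 100, '+Inf' => 100];
$workerB = ['0.1' => 40, '0.5' => 60, '1.0' => 90, '2.5' => 100, '+Inf' => 100];

$merged = mergeBuckets([$workerA, $workerB]);
$fleetP99 = estimateQuantile(0.99, $merged);

// Averaging per-worker percentiles is what summaries would force on you.
$averagedP99 = (estimateQuantile(0.99, $workerA) + estimateQuantile(0.99, $workerB)) / 2;

printf("fleet p99=%.2fs, averaged per-worker p99=%.2fs\n", $fleetP99, $averagedP99);
```

The averaged per-worker value understates the fleet-wide p99 here because the slow worker contributes most of the tail. Summed bucket counts keep that information, which is the reason the query sums by `le` before calling histogram_quantile.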