
Metrics Types

Observability Intermediate
debt(d7/e5/b5/t7)
d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and tools listed (prometheus, grafana, opentelemetry) are observability platforms that surface the symptom (misleading dashboards, missing tail latency) only after deployment and under real traffic. No linter or static analysis catches gauge-used-for-counter or average-instead-of-percentile at write time.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes a conceptual swap (counter vs gauge vs histogram), but in practice correcting metric type misuse means changing instrumentation code across multiple service entry points, updating dashboard queries in Grafana, and potentially migrating stored time-series data. This goes beyond a single-line patch but stops short of a full architectural rework.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Metrics choices apply across web, cli, and queue-worker contexts per applies_to. Wrong metric types produce misleading dashboards that persist and quietly mislead every engineer doing on-call or capacity planning, creating an ongoing productivity tax across multiple work streams without necessarily reshaping the entire architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field calls out that averages hide tail latency: a competent developer familiar with averages from statistics will naturally reach for average latency without realizing it masks the p99 experience. Counter reset handling on process restart and the non-aggregability of summaries across workers are also non-obvious, because both differ from how simple numeric accumulators behave in other contexts.


Also Known As

counter, gauge, histogram, Prometheus metrics

TL;DR

Counters (cumulative totals that only increase), Gauges (current value, up or down), Histograms (bucketed distribution), and Summaries (client-side quantiles): each suits a different measurement need in monitoring systems.

Explanation

Counter: a cumulative value that only increases while the process runs, such as request count or error count. It can drop back to zero when the process restarts; rate() and increase() detect such resets and compensate for them. Use rate() to get a per-second rate.

Gauge: a current value that can go up or down, such as memory usage, active connections, or queue depth.

Histogram: counts observations into configurable buckets, such as request duration or response size. Because bucket counts can be summed, percentiles (p95, p99) can be estimated across multiple instances, with accuracy limited by the bucket boundaries.

Summary: calculates quantiles client-side. The values are accurate for a single instance but cannot be aggregated across instances.

For distributed PHP apps, histograms are almost always preferred over summaries.
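To make the aggregation point concrete, here is a minimal sketch in plain PHP (no Prometheus client involved). The function names `mergeBuckets` and `estimateQuantile`, the bucket bounds, and the worker data are all hypothetical. The estimation uses linear interpolation inside the matching bucket, which is roughly the approach histogram_quantile() takes; treat it as an illustration, not a reimplementation of Prometheus.

```php
<?php
declare(strict_types=1);

// Bucket upper bounds in seconds; the last bucket catches everything (+Inf).
const BOUNDS = [0.1, 0.5, 1.0, 2.5, INF];

/** Sum cumulative bucket counts from several workers, index by index. */
function mergeBuckets(array ...$workers): array
{
    $merged = [];
    foreach ($workers as $counts) {
        foreach ($counts as $i => $count) {
            $merged[$i] = ($merged[$i] ?? 0) + $count;
        }
    }
    return $merged;
}

/** Estimate quantile $q (0 < q <= 1) from cumulative bucket counts. */
function estimateQuantile(float $q, array $bounds, array $cumulative): float
{
    if ($cumulative === [] || $cumulative[count($cumulative) - 1] <= 0) {
        return NAN; // no observations
    }
    $rank = $q * $cumulative[count($cumulative) - 1];
    $lowerBound = 0.0;
    $lowerCount = 0;
    foreach ($bounds as $i => $upperBound) {
        if ($cumulative[$i] >= $rank) {
            $inBucket = $cumulative[$i] - $lowerCount;
            if (is_infinite($upperBound) || $inBucket <= 0) {
                return $lowerBound; // fall back to the highest finite bound
            }
            // Linear interpolation between the bucket's lower and upper bound.
            return $lowerBound + ($upperBound - $lowerBound) * ($rank - $lowerCount) / $inBucket;
        }
        $lowerBound = $upperBound;
        $lowerCount = $cumulative[$i];
    }
    return $lowerBound;
}

// Hypothetical data: a healthy worker (900 fast requests) and a degraded one (100 slow requests).
$workerA = [900, 900, 900, 900, 900];
$workerB = [0, 0, 0, 100, 100];

$fleetP99 = estimateQuantile(0.99, BOUNDS, mergeBuckets($workerA, $workerB));
$averagedP99 = (estimateQuantile(0.99, BOUNDS, $workerA) + estimateQuantile(0.99, BOUNDS, $workerB)) / 2;

printf("p99 from merged buckets: %.3f s\n", $fleetP99);      // about 2.35 s
printf("average of per-worker p99: %.3f s\n", $averagedP99); // about 1.29 s, which understates the tail
```

Merging bucket counts before estimating gives a fleet-wide p99, while averaging two pre-computed quantiles (which is all a summary allows) gives a number that corresponds to no real percentile.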

Diagram

flowchart TD
    subgraph Counter
        C[Monotonically increasing<br/>requests_total errors_total]
        C_USE["Use rate() for per-second rate<br/>resets to 0 only on restart"]
    end
    subgraph Gauge
        G[Current value up or down<br/>memory_usage active_connections]
        G_USE[Snapshot of state right now]
    end
    subgraph Histogram
        H[Samples in buckets<br/>request_duration_seconds]
        H_USE[Calculate p50 p95 p99<br/>aggregatable across instances]
    end
    subgraph Summary
        S[Pre-calculated percentiles<br/>accurate but not aggregatable]
    end
style C fill:#1f6feb,color:#fff
style G fill:#238636,color:#fff
style H fill:#6e40c9,color:#fff
style S fill:#d29922,color:#fff

Common Misconception

Averages are sufficient for latency metrics. In fact, average latency hides tail latency: a p99 of 2000ms means 1 in 100 requests takes 2 seconds or longer, even if the average is 50ms.
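A small worked example in plain PHP shows how this happens. The latency data is made up for illustration, and `mean` and `percentile` are hypothetical helpers (nearest-rank percentile over raw samples, not how Prometheus estimates quantiles from buckets).

```php
<?php
declare(strict_types=1);

/** Arithmetic mean of a non-empty list of numbers. */
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

/** Nearest-rank percentile: the smallest value with at least $percent % of samples at or below it. */
function percentile(array $values, int $percent): float
{
    if ($values === []) {
        throw new InvalidArgumentException('percentile() needs at least one sample');
    }
    sort($values);
    $rank = (int) ceil($percent * count($values) / 100);
    return (float) $values[max($rank, 1) - 1];
}

// Hypothetical traffic: 980 requests at 10 ms, 20 requests at 2000 ms.
$latenciesMs = array_merge(array_fill(0, 980, 10), array_fill(0, 20, 2000));

printf("mean: %.1f ms\n", mean($latenciesMs));           // 49.8 ms
printf("p50:  %.0f ms\n", percentile($latenciesMs, 50)); // 10 ms
printf("p99:  %.0f ms\n", percentile($latenciesMs, 99)); // 2000 ms
```

The mean looks healthy at roughly 50 ms, yet 2% of requests take two seconds, and only the p99 reveals it.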

Why It Matters

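Using the wrong metric type produces misleading dashboards. A gauge is typed as a point-in-time reading that may legitimately decrease, so when a request count is stored in a gauge, a drop after a restart is indistinguishable from a real decrease, and plotting the raw value hides the request rate. A counter declares that the only valid drop is a reset, and querying it with rate() shows the actual request rate.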
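The effect of a restart can be sketched in a few lines of plain PHP. The sample values and the function names `counterIncrease`, `counterRate`, and `naiveDelta` are hypothetical, and this is a simplification: the real rate() and increase() also extrapolate to the edges of the query window.

```php
<?php
declare(strict_types=1);

/**
 * Total growth of a counter across scraped samples.
 * A drop between two samples is treated as a restart (counter reset),
 * so the value after the drop is counted as growth from zero.
 */
function counterIncrease(array $samples): float
{
    $increase = 0.0;
    for ($i = 1, $n = count($samples); $i < $n; $i++) {
        $prev = $samples[$i - 1];
        $curr = $samples[$i];
        $increase += ($curr >= $prev) ? $curr - $prev : $curr;
    }
    return $increase;
}

/** Average per-second rate over the window the samples cover. */
function counterRate(array $samples, float $windowSeconds): float
{
    return counterIncrease($samples) / $windowSeconds;
}

/** What a reader gets by treating the series as a plain value: last minus first. */
function naiveDelta(array $samples): float
{
    return (float) ($samples[count($samples) - 1] - $samples[0]);
}

// Hypothetical scrapes 15 s apart; the process restarts between the 3rd and 4th sample.
$samples = [100, 130, 160, 20, 50];

printf("reset-aware increase: %.0f requests\n", counterIncrease($samples)); // 110
printf("reset-aware rate: %.2f req/s\n", counterRate($samples, 60.0));      // 1.83
printf("naive delta: %.0f requests\n", naiveDelta($samples));               // -50
```

The reset-aware calculation recovers 110 requests over the window, while the naive reading reports a negative number of requests.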

Common Mistakes

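  • Using a gauge for a counter (request count): both types start again from 0 after a restart, but only a counter declares that every drop is a reset, which rate() and increase() are designed to compensate for. A gauge is read as a point-in-time value, so the running total and the trend are easily lost.
  • Average latency instead of a p95/p99 histogram: hides the tail experience of slow requests.
  • High-cardinality histogram labels: every label value multiplies the number of series per bucket, so a label per user ID creates millions of time series.
  • Summaries in distributed systems: summaries cannot be aggregated across PHP-FPM workers; use histograms.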
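The cardinality bullet can be quantified with a rough estimate. In the Prometheus exposition format a histogram exposes one `_bucket` series per bound (including `+Inf`) plus `_sum` and `_count`, for every combination of label values. The helper below is a hypothetical back-of-the-envelope calculation in plain PHP; the label cardinalities are assumed numbers, and it is an upper bound because not every combination necessarily occurs.

```php
<?php
declare(strict_types=1);

/**
 * Upper bound on the time series one histogram produces.
 *
 * @param int   $finiteBuckets      number of configured bucket bounds (excluding +Inf)
 * @param array $labelCardinalities label name => number of distinct values
 */
function histogramSeriesCount(int $finiteBuckets, array $labelCardinalities): int
{
    // Per label combination: finite buckets + the +Inf bucket + _sum + _count.
    $perCombination = $finiteBuckets + 1 + 2;
    return $perCombination * (int) array_product($labelCardinalities);
}

// Assumed numbers: 9 bucket bounds, 20 routes, 100,000 users.
echo histogramSeriesCount(9, ['route' => 20]), "\n";                       // 240
echo histogramSeriesCount(9, ['route' => 20, 'user_id' => 100000]), "\n"; // 24000000
```

Adding a single `user_id` label turns a few hundred series into tens of millions, which is why identifiers of this kind belong in logs or traces rather than in metric labels.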

Code Examples

✗ Vulnerable
// Wrong types: gauge for a request count, average for latency
$gauge = $registry->registerGauge('app', 'requests_total', 'Total requests');
$gauge->inc(); // Gauge: typed as a point-in-time value, so a drop after a restart is not marked as a reset

// Average latency hides the tail:
$totalTime += $elapsed;
$avgLatency = $totalTime / $requestCount; // Misleading: a few very slow requests barely move the mean
✓ Fixed
// Correct types with prometheus_client_php:
$counter = $registry->registerCounter('app', 'requests_total', 'Total requests', ['route']);
$counter->inc(['/api/users']); // Label values are passed in the order of the declared label names

$histogram = $registry->registerHistogram(
    'app', 'request_duration_seconds', 'Request duration',
    ['route'],
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5] // Bucket upper bounds in seconds
);
$histogram->observe($elapsed, ['/api/users']); // $elapsed: request duration in seconds
// Query p99 across all instances:
// histogram_quantile(0.99, sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
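The snippets above assume an `$elapsed` value in seconds without showing where it comes from. One way to obtain it, sketched in plain PHP with hypothetical helper names (`timeInSeconds`, `bucketFor`), is to wrap the handler with the monotonic `hrtime()` clock. The second helper only illustrates which bucket bound an observation falls under; a real client library does that bookkeeping itself.

```php
<?php
declare(strict_types=1);

/**
 * Run $handler and return [result, elapsed seconds].
 * hrtime(true) is a monotonic nanosecond clock, so the measurement
 * is not affected by wall-clock adjustments.
 */
function timeInSeconds(callable $handler): array
{
    $start = hrtime(true);
    $result = $handler();
    $elapsed = (hrtime(true) - $start) / 1e9;
    return [$result, $elapsed];
}

/** Smallest bucket upper bound that contains $seconds (value <= bound), or INF if none does. */
function bucketFor(float $seconds, array $upperBounds): float
{
    foreach ($upperBounds as $bound) {
        if ($seconds <= $bound) {
            return (float) $bound;
        }
    }
    return INF;
}

$bounds = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5];

// Hypothetical handler standing in for real request processing.
[$response, $elapsed] = timeInSeconds(fn () => 'ok');

printf("handled in %.6f s, first matching bucket bound: %s\n", $elapsed, bucketFor($elapsed, $bounds));
```

The resulting `$elapsed` is the kind of value that would be handed to the histogram's observe() call in the fixed example.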

Added 15 Mar 2026
Edited 19 Apr 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: Medium
⚡ Quick Fix
Use Counters for events that only increase (requests, errors), Gauges for current values (queue depth, connections), and Histograms for distributions (response times).
📦 Applies To
any web cli queue-worker
🔍 Detection Hints
Average response time without percentiles hiding tail latency; gauge used for monotonically increasing value; histogram with too few buckets
Auto-detectable: ✗ No
Tools: prometheus, grafana, opentelemetry
🤖 AI Agent
Confidence: Low False Positives: High ✗ Manual fix Fix: Medium Context: File

