Metrics Types

Observability Intermediate

debt(d7/e5/b5/t7)

d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints note automated=no, and tools listed (prometheus, grafana, opentelemetry) are observability platforms that surface the symptom (misleading dashboards, missing tail latency) only after deployment and under real traffic. No linter or static analysis catches gauge-used-for-counter or average-instead-of-percentile at write time.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes a conceptual swap (counter vs gauge vs histogram), but in practice correcting metric type misuse means changing instrumentation code across multiple service entry points, updating dashboard queries in Grafana, and potentially migrating stored time-series data. This goes beyond a single-line patch but stops short of a full architectural rework.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Metrics choices apply across web, cli, and queue-worker contexts per applies_to. Wrong metric types produce misleading dashboards that persist and quietly mislead every engineer doing on-call or capacity planning, creating an ongoing productivity tax across multiple work streams without necessarily reshaping the entire architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception field calls out that averages hide tail latency: a competent developer familiar with averages from statistics will naturally reach for average latency, not realizing it masks the p99 experience. In addition, the different handling of gauges and counters across process restarts and the non-aggregability of summaries across workers are not obvious to anyone who thinks of metrics as simple numeric accumulators.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-10 · reviewed by human

Also Known As

counter gauge histogram Prometheus metrics

TL;DR

Counters (ever-increasing), Gauges (current value), Histograms (bucketed distribution), and Summaries (client-side percentiles): each is suited to different measurement needs in monitoring systems.

Explanation

Counter: a monotonically increasing value, such as a request count or error count. It only goes up, except for a reset to zero when the process restarts. Use rate() to get the per-second rate.
Gauge: a current value that can go up or down, such as memory usage, active connections, or queue depth.
Histogram: counts observations into configurable buckets, such as request duration or response size. Percentiles (p95, p99) are estimated at query time from the bucket counts, so they can be aggregated across multiple instances.
Summary: calculates percentiles client-side. The values are accurate for a single process but cannot be aggregated across instances.
For distributed PHP apps, histograms are almost always preferred over summaries.
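
To illustrate why histograms aggregate across instances, here is a minimal sketch in plain PHP, without any metrics library. The helpers `bucketCounts` and `estimateQuantile` are hypothetical names, the observations are made up, and the interpolation is only a simplified version of what a query-time quantile estimate does. The point is that bucket counts from two workers can be added, and the sum is identical to bucketing all observations together.

```php
<?php

// Hypothetical helper: cumulative bucket counts, one slot per upper bound
// plus a final "+Inf" slot that counts every observation.
function bucketCounts(array $observations, array $upperBounds): array
{
    $counts = array_fill(0, count($upperBounds) + 1, 0);
    foreach ($observations as $value) {
        foreach ($upperBounds as $i => $bound) {
            if ($value <= $bound) {
                $counts[$i]++;
            }
        }
        $counts[count($upperBounds)]++;
    }
    return $counts;
}

// Hypothetical helper: estimate a quantile (0 < q <= 1) by linear
// interpolation inside the bucket that contains the target rank.
function estimateQuantile(float $q, array $counts, array $upperBounds): float
{
    $total = $counts[count($upperBounds)];
    if ($total === 0) {
        return NAN;
    }
    $rank = $q * $total;
    foreach ($upperBounds as $i => $bound) {
        if ($counts[$i] >= $rank) {
            $lower = $i === 0 ? 0.0 : $upperBounds[$i - 1];
            $below = $i === 0 ? 0 : $counts[$i - 1];
            return $lower + ($bound - $lower) * ($rank - $below) / ($counts[$i] - $below);
        }
    }
    // Rank falls in the +Inf bucket: report the highest finite bound.
    return (float) $upperBounds[count($upperBounds) - 1];
}

$bounds  = [0.05, 0.1, 0.5, 1.0];          // seconds
$workerA = [0.02, 0.03, 0.04, 0.2];        // made-up durations
$workerB = [0.03, 0.08, 0.3, 0.9];

$countsA = bucketCounts($workerA, $bounds);
$countsB = bucketCounts($workerB, $bounds);

// Aggregation is plain addition of bucket counts.
$merged = array_map(fn ($a, $b) => $a + $b, $countsA, $countsB);

$p50 = estimateQuantile(0.5, $merged, $bounds);
$p95 = estimateQuantile(0.95, $merged, $bounds);
```

The estimate is only as precise as the bucket boundaries allow, which is why the choice of buckets matters.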

Diagram

flowchart TD
    subgraph Counter
        C[Monotonically increasing<br/>requests_total errors_total]
        C_USE["Use rate() for per-second rate<br/>never decreases"]
    end
    subgraph Gauge
        G[Current value up or down<br/>memory_usage active_connections]
        G_USE[Snapshot of state right now]
    end
    subgraph Histogram
        H[Samples in buckets<br/>request_duration_seconds]
        H_USE[Calculate p50 p95 p99<br/>aggregatable across instances]
    end
    subgraph Summary
        S[Pre-calculated percentiles<br/>accurate but not aggregatable]
    end
style C fill:#1f6feb,color:#fff
style G fill:#238636,color:#fff
style H fill:#6e40c9,color:#fff
style S fill:#d29922,color:#fff

Common Misconception

✗ Averages are sufficient for latency metrics. Average latency hides tail latency: a p99 of 2000ms means 1 in 100 requests takes 2 seconds or longer, even if the average is 50ms.
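
A small worked example, using made-up numbers and a hypothetical nearest-rank `percentile` helper, shows how far apart the two figures can be. It works on raw samples purely for illustration; in production the percentile would come from a histogram.

```php
<?php

// Hypothetical helper: nearest-rank percentile over raw samples.
function percentile(array $samples, float $p): float
{
    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100);
    return (float) $samples[max($rank, 1) - 1];
}

// Made-up data: 98 fast requests (10 ms) and 2 slow ones (2000 ms).
$latenciesMs = array_merge(array_fill(0, 98, 10), array_fill(0, 2, 2000));

$mean = array_sum($latenciesMs) / count($latenciesMs); // 49.8 ms
$p50  = percentile($latenciesMs, 50);                  // 10 ms
$p99  = percentile($latenciesMs, 99);                  // 2000 ms

printf("mean=%.1f ms, p50=%.0f ms, p99=%.0f ms\n", $mean, $p50, $p99);
```

The mean of about 50 ms looks healthy, yet the slowest requests take 40 times longer.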

Why It Matters

Using the wrong metric type produces misleading dashboards. A gauge used for request count is read as a point-in-time value, so every restart shows up as a real drop and the trend becomes unreadable; a counter with rate() shows the actual request rate correctly.
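
The difference comes down to how resets are interpreted. The sketch below is a simplified illustration of reset-aware increase, not Prometheus's actual implementation (which also extrapolates to the window edges). The function name `increaseWithResets` and the scrape values are assumptions made for the example.

```php
<?php

// Simplified sketch: any drop between consecutive samples is treated as a
// restart, so the value after the drop counts as fresh growth from zero.
function increaseWithResets(array $samples): float
{
    $increase = 0.0;
    for ($i = 1; $i < count($samples); $i++) {
        $delta = $samples[$i] - $samples[$i - 1];
        $increase += $delta >= 0 ? $delta : $samples[$i];
    }
    return $increase;
}

// Made-up scrapes of requests_total, 15 s apart; the process restarts
// between the second and third scrape.
$scrapes = [100, 150, 20, 60];
$windowSeconds = 45;

// Gauge-style reading: last value minus first value.
$naiveDelta = $scrapes[count($scrapes) - 1] - $scrapes[0]; // -40

// Counter-style reading: 50 before the restart, then 20 and 40 after it.
$increase = increaseWithResets($scrapes);                  // 110.0
$ratePerSecond = $increase / $windowSeconds;
```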

Common Mistakes

Using a gauge for a counter (request count): both reset to 0 on restart, but rate() and increase() only correct for resets on counters, so a gauge shows each restart as a real drop and the running total is lost.
Average latency instead of p95/p99 histogram: hides the tail experience of slow requests.
High-cardinality histogram labels: a label per user ID multiplies every bucket series by the number of users, creating millions of time series.
Summaries in distributed systems: summaries cannot be aggregated across PHP-FPM workers; use histograms.
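
The last mistake can be made concrete with a short sketch. The worker data is invented and `nearestRankPercentile` is a hypothetical helper; the example only shows that combining per-worker percentiles does not recover the percentile of the combined traffic, because the request counts behind each figure are lost.

```php
<?php

// Hypothetical helper: nearest-rank percentile over raw samples.
function nearestRankPercentile(array $samples, float $p): float
{
    sort($samples);
    $rank = (int) ceil($p * count($samples) / 100);
    return (float) $samples[max($rank, 1) - 1];
}

// Made-up traffic: worker A served 990 fast requests (10 ms),
// worker B served 10 slow ones (2000 ms).
$workerA = array_fill(0, 990, 10);
$workerB = array_fill(0, 10, 2000);

// What each worker's summary would report on its own.
$p99A = nearestRankPercentile($workerA, 99); // 10 ms
$p99B = nearestRankPercentile($workerB, 99); // 2000 ms

// Averaging the two reported values ignores how many requests each served.
$averagedP99 = ($p99A + $p99B) / 2;          // 1005 ms

// The p99 over all 1000 requests together.
$trueP99 = nearestRankPercentile(array_merge($workerA, $workerB), 99); // 10 ms
```

Histogram bucket counts avoid this problem because they can be summed before the percentile is estimated.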

Code Examples

✗ Vulnerable

// Wrong types: gauge for counter, average for latency.
$gauge = $registry->registerGauge('app', 'requests_total', 'Total requests');
$gauge->inc(); // Gauge: rate() cannot correct for resets, so every restart distorts the trend

// Average latency hides the tail:
$totalTime += $elapsed;
$avgLatency = $totalTime / $requestCount; // Misleading

✓ Fixed

// Correct types with prometheus_client_php:
$counter = $registry->registerCounter('app', 'requests_total', 'Total requests', ['route']);
$counter->inc(['/api/users']); // Label values are positional, matching ['route']

$histogram = $registry->registerHistogram(
    'app', 'request_duration_seconds', 'Request duration',
    ['route'],
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5] // Buckets in seconds
);
$histogram->observe($elapsed, ['/api/users']); // $elapsed is the request duration in seconds
// Query p99 across all instances and routes:
// histogram_quantile(0.99, sum by (le) (rate(app_request_duration_seconds_bucket[5m])))
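
The snippet above assumes `$elapsed` already holds the request duration in seconds. One way to obtain it, sketched here with a hypothetical `timed` helper and a stand-in for the real handler, is PHP's monotonic `hrtime()` clock:

```php
<?php

// Hypothetical helper: run a callable and return [result, elapsed seconds].
function timed(callable $fn): array
{
    $start = hrtime(true);                      // monotonic clock, nanoseconds
    $result = $fn();
    $elapsed = (hrtime(true) - $start) / 1e9;   // convert to seconds
    return [$result, $elapsed];
}

[$response, $elapsed] = timed(function (): string {
    usleep(2000); // stand-in for real request handling
    return 'ok';
});

// $elapsed is now a float in seconds, ready to pass to $histogram->observe().
```

A monotonic clock is used so that wall-clock adjustments during the request cannot produce negative or inflated durations.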

Added 15 Mar 2026

Edited 19 Apr 2026

Curated in Warsaw under one editorial standard. 1,506 terms, single voice. About this reference →

⚡ DEV INTEL Tools & Severity

🟡 Medium ⚙ Fix effort: Medium

⚡ Quick Fix

Use Counters for events that only increase (requests, errors), Gauges for current values (queue depth, connections), and Histograms for distributions (response times)

📦 Applies To

any web cli queue-worker

🔗 Prerequisites

Prometheus & Grafana · Observability (Logs, Metrics, Traces) · Alerting & On-Call

🔍 Detection Hints

Average response time without percentiles hiding tail latency; gauge used for monotonically increasing value; histogram with too few buckets

Auto-detectable: ✗ No · Tools: prometheus, grafana, opentelemetry

⚠ Related Problems