Cardinality in Metrics
debt(d7/e5/b7/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints say automated=no, and the only tool listed is Prometheus itself (via /api/v1/label/__name__/values). No standard static analysis tool or linter flags high-cardinality label usage at write time; the code pattern 'labels.*user_id|labels.*session' can narrow the search, but every hit still needs manual review. The problem typically surfaces only at runtime, when series counts explode or Prometheus runs out of memory, not at commit time.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix says to replace unbounded label values with bounded sets and audit cardinality. In practice this means finding every instrumentation call that uses user_id, session_id, or raw URLs across the codebase, redesigning what labels are emitted, and potentially migrating or dropping existing time-series data. This is more than a single-line swap but typically scoped to one service's instrumentation layer, placing it at e5.
Closest to 'strong gravitational pull' (b7). The applies_to covers web, cli, and queue-worker contexts — the full application surface. Every future label addition must be evaluated against cardinality budgets. A single past mistake can hold the entire metrics infrastructure hostage (OOM crash, degraded query performance), shaping all future instrumentation decisions and requiring ongoing governance. This is a persistent, cross-cutting structural concern.
Closest to 'serious trap' (t7). The misconception field states exactly: 'More labels = more useful metrics'. The intuitive developer belief is that adding labels increases observability value linearly. In reality each label multiplies the series count by its number of distinct values, and using user_id or session_id (which feel like obvious debugging labels) can OOM Prometheus. This contradicts how labels appear to work (as additive descriptors) and is a well-documented gotcha that surprises even experienced developers new to Prometheus.
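Applied line by line, as grep does by default, the review pattern 'labels.*user_id|labels.*session' misses label arrays split across several lines, like the ones in this entry's code examples. The following is a minimal sketch of a pre-review filter, assuming PHP sources where labels are passed as `labels([...])` calls; `flagCardinalitySuspects` is a hypothetical helper, and a hit only marks a call for manual review, it does not prove a problem.

```php
<?php
// Hypothetical pre-review filter: adapts 'labels.*user_id|labels.*session'
// to label arrays that span several lines. A heuristic, not a linter.

function flagCardinalitySuspects(string $source): array
{
    $suspects = [];
    // Capture the body of each labels([ ... ]) call; the /s modifier lets
    // "." match newlines. Assumes no "])" appears inside the array body.
    if (preg_match_all('/labels\s*\(\s*\[(.*?)\]\s*\)/s', $source, $matches)) {
        foreach ($matches[1] as $body) {
            // Same keywords as the original review pattern.
            if (preg_match('/user_id|session/', $body)) {
                $suspects[] = trim($body);
            }
        }
    }
    return $suspects;
}

$high = <<<'SRC'
$histogram->labels([
'user_id' => $userId,
'url' => $request->fullUrl(),
])->observe($duration);
SRC;

$low = <<<'SRC'
$histogram->labels([
'route' => $request->route()->getName(),
'method' => $request->method(),
])->observe($duration);
SRC;

var_dump(count(flagCardinalitySuspects($high))); // int(1)
var_dump(count(flagCardinalitySuspects($low)));  // int(0)
```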
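TL;DR
Each unique combination of label values is a separate time series, so keep labels bounded: routes and methods belong in metrics; user IDs, session IDs, and full URLs belong in logs and traces.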
Explanation
Each unique combination of label values is one time series. 100 routes × 5 methods × 10 status codes = 5,000 series (fine). 100 routes × 1M user_ids = 100M series, enough to exhaust Prometheus memory and crash it.
High-cardinality labels: user_id, request_id, URL with query params, IP address, session_id.
Solutions:
- Don't use high-cardinality fields as labels; they belong in logs and traces.
- Aggregate high-cardinality dimensions in application code before recording.
- For latency, consider Prometheus native histograms (introduced in Prometheus v2.40 behind the native-histograms feature flag): a classic histogram needs a separate series per bucket, while a native histogram keeps all buckets in one series.
- Send high-cardinality data to backends designed for it, such as Grafana Tempo or Honeycomb.
Use Prometheus for low-cardinality aggregate metrics, and logs/traces for high-cardinality detail.
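The series arithmetic above can be written out directly. A minimal sketch; `worstCaseSeries` and `classicHistogramSeries` are hypothetical names. The results are worst-case upper bounds, since a series is created only once a label combination is actually observed. The histogram variant reflects how a classic Prometheus histogram is exposed: one `_bucket` series per configured bucket, one more for `le="+Inf"`, plus `_sum` and `_count`, for every label combination.

```php
<?php
// Worst-case series count for one metric: the product of the number of
// distinct values each label can take.
function worstCaseSeries(array $distinctValuesPerLabel): int
{
    return (int) array_product($distinctValuesPerLabel);
}

// Classic histogram: (configured buckets + "+Inf" bucket + _sum + _count)
// series for every label combination.
function classicHistogramSeries(array $distinctValuesPerLabel, int $configuredBuckets): int
{
    return worstCaseSeries($distinctValuesPerLabel) * ($configuredBuckets + 3);
}

echo worstCaseSeries(['route' => 100, 'method' => 5, 'status' => 10]), "\n"; // 5000
echo worstCaseSeries(['route' => 100, 'user_id' => 1_000_000]), "\n";        // 100000000
echo classicHistogramSeries(['route' => 100, 'method' => 5, 'status' => 10], 10), "\n"; // 65000
```

With 10 configured buckets, the "fine" 5,000-combination metric already becomes 65,000 classic-histogram series, which is the cost native histograms are meant to reduce.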
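Common Misconception
'More labels = more useful metrics.' Labels look like additive descriptors, but each one multiplies the series count by its number of distinct values.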
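Why It Matters
One unbounded label can multiply series counts until Prometheus runs out of memory or queries degrade across the whole metrics infrastructure, and the failure usually appears only at runtime.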
Common Mistakes
- User ID or session ID as a Prometheus label.
- Full URL (including query params) as label — unbounded.
- Not auditing cardinality before adding new labels.
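Auditing before adding a label can be as simple as counting distinct values in a sample of real label sets. A minimal sketch, assuming such a sample is available (for example from a staging run or a log export); `labelsOverBudget` and the budget of 100 distinct values per label are illustrative choices, not a Prometheus rule.

```php
<?php
// Hypothetical audit: count distinct values per label across observed label
// sets and report the labels whose count exceeds the budget.
function labelsOverBudget(array $observedLabelSets, int $maxDistinctPerLabel): array
{
    $distinct = []; // label name => [value => true], i.e. a set of values
    foreach ($observedLabelSets as $labels) {
        foreach ($labels as $name => $value) {
            $distinct[$name][(string) $value] = true;
        }
    }
    $over = [];
    foreach ($distinct as $name => $values) {
        if (count($values) > $maxDistinctPerLabel) {
            $over[$name] = count($values);
        }
    }
    ksort($over); // stable, alphabetical report
    return $over;
}

// Simulated sample: two routes, one method, a new user_id on every request.
$sample = [];
for ($i = 0; $i < 500; $i++) {
    $sample[] = [
        'route'   => ['api.orders.show', 'api.orders.index'][$i % 2],
        'method'  => 'GET',
        'user_id' => "u$i",
    ];
}

echo json_encode(labelsOverBudget($sample, 100)), "\n"; // {"user_id":500}
```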
Code Examples
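// Assumes a Laravel request and a histogram client exposing labels()->observe()
// HIGH cardinality: one series per user and per distinct URL
$histogram->labels([
    'user_id' => $userId,             // Unbounded: grows with the user base
    'url'     => $request->fullUrl(), // Unbounded: includes query params
])->observe($duration);
// LOW cardinality: bounded dimensions only
$histogram->labels([
    // Route name, e.g. 'api.orders.show'; unnamed or unmatched routes share
    // one fixed value instead of falling back to the raw path (PHP 8 ?->).
    'route'  => $request->route()?->getName() ?? 'unnamed',
    'method' => $request->method(),
])->observe($duration);
// user_id belongs in logs and traces, not metrics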
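When a request has no route name, the raw URL should still not become a label. A minimal sketch of aggregating it in application code before recording, assuming paths whose only variable segments are numeric IDs and UUIDs; `normalizePathLabel` and the `:id`/`:uuid` placeholders are hypothetical.

```php
<?php
// Hypothetical helper: reduce a raw URL to a bounded label value by dropping
// scheme, host, and query string, then replacing ID-like path segments.
function normalizePathLabel(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH); // null if absent, false if malformed
    if (!is_string($path) || $path === '') {
        return '/';
    }
    $segments = explode('/', $path);
    foreach ($segments as $i => $segment) {
        if (preg_match('/^[0-9]+$/', $segment)) {
            $segments[$i] = ':id';
        } elseif (preg_match('/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i', $segment)) {
            $segments[$i] = ':uuid';
        }
    }
    return implode('/', $segments);
}

echo normalizePathLabel('https://shop.example/api/orders/123?page=2&utm_source=x'), "\n";
// /api/orders/:id
echo normalizePathLabel('/users/3f2504e0-4f89-11d3-9a0c-0305e82c3301/sessions'), "\n";
// /users/:uuid/sessions
```

This bounds only the segment shapes it recognizes: free-form slugs and junk paths from scanners hitting 404s still produce new values, so a route name, or one fixed value for unmatched requests, remains the safer label.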