Cardinality in Observability

Observability PHP 7.0+ Intermediate

debt(d9/e7/b7/t9)

d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints.tools field is not specified, and by nature cardinality explosions are invisible during development — dev environments have only a handful of users/sessions so label cardinality appears low. The explosion only manifests in production under real traffic, typically when the monitoring system starts degrading or crashing, which is exactly the d9 pattern.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix describes auditing every metric label and moving high-cardinality values to logs or traces. This is not a single-line swap — it requires identifying all affected metrics across the codebase, removing or replacing label values, ensuring the same information is captured in structured logs or trace attributes, and potentially restructuring dashboards and alerts that relied on those labels. This is a cross-cutting change touching instrumentation code, logging setup, and monitoring configuration.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). The choice of metric labels, once made and deployed to production, shapes every future observability decision. Dashboards, alerts, and SLOs are built on top of those metrics. Rolling back or changing label cardinality requires coordinated changes to metrics emission, Prometheus configuration, dashboards, and alert rules. The applies_to covers both web and cli contexts, giving this wide reach across the PHP application.

t9 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'catastrophic trap: the obvious way is always wrong' (t9). The misconception field states it explicitly: 'Adding more labels to metrics makes them more useful.' This is the instinctive action every developer takes: they want user-level or request-level detail, so they add user_id or request_id as a label. This obvious approach is precisely what causes cardinality explosions and can bring down the entire monitoring infrastructure. The math in the misconception (10^5 = 100,000 time series, or orders of magnitude more once one label is high-cardinality) shows that the natural developer instinct directly causes catastrophic failure, matching t9.

About DEBT scoring → scored by claude-sonnet-4-6 · 2026-05-06 · reviewed by human

Also Known As

high cardinality, label cardinality, metric cardinality, cardinality explosion, time series cardinality

TL;DR

The number of unique combinations of label values in a metric — high cardinality (millions of unique label combinations) causes memory exhaustion in time-series databases and is the most common observability scaling problem.

Explanation

In metrics systems like Prometheus, every unique combination of label values creates a separate time series, and active series are held in memory. A metric with labels {method, status_code}, 10 methods, and 5 status codes produces at most 10 × 5 = 50 time series, which is manageable. Adding a user_id label with 1 million users raises that ceiling to 1 million × 10 × 5 = 50 million time series, enough to exhaust Prometheus's memory and crash it. This is the cardinality explosion problem. High-cardinality values (user IDs, request IDs, email addresses, IP addresses, session tokens) must never be used as metric labels; they belong in traces and logs. The correct pattern is to use low-cardinality labels (normalized endpoint path, status class like 2xx/4xx/5xx, service name) in metrics and to put high-cardinality identifiers in structured logs or trace attributes, where they are stored per event rather than as index dimensions.
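
The arithmetic above can be sketched as a small helper. This is an illustrative sketch only, not part of any Prometheus client: the function name `estimateSeriesCount` is hypothetical, and the label counts are the example numbers from this section.

```php
<?php
declare(strict_types=1);

/**
 * Worst-case number of time series for one metric: the product of the
 * distinct-value counts of its labels.
 * Hypothetical helper for illustration, not a Prometheus client API.
 *
 * @param array<string,int> $distinctValuesPerLabel label name => distinct values
 */
function estimateSeriesCount(array $distinctValuesPerLabel): int
{
    $series = 1; // a metric with no labels is a single series
    foreach ($distinctValuesPerLabel as $count) {
        $series *= $count;
    }
    return $series;
}

$lowCardinality = estimateSeriesCount(['method' => 10, 'status_code' => 5]);
$withUserId     = estimateSeriesCount(['method' => 10, 'status_code' => 5, 'user_id' => 1000000]);

echo $lowCardinality, PHP_EOL; // 50
echo $withUserId, PHP_EOL;     // 50000000
```

The result is an upper bound: the real series count is the number of label combinations actually observed, which can be lower when some combinations never occur.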

Watch Out

⚠ Cardinality explosion often sneaks in through labels you think are safe: a 'tenant_id' label with 500 tenants seems fine until a multi-tenant SaaS scales to 50,000 tenants, and suddenly your previously-healthy metric is now consuming gigabytes. Always design metric labels assuming 10× growth in label value combinations, not just the values you see today.
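
One way to apply this rule during design review is to project a label's growth against a series budget. The sketch below is an assumption-laden illustration: `fitsSeriesBudget` is a hypothetical helper, and the budget of 100,000 series per metric is an arbitrary example value, not a Prometheus limit.

```php
<?php
declare(strict_types=1);

/**
 * Returns true if a metric stays within $budget series after the
 * cardinality of $growingLabel is multiplied by $growthFactor.
 * Hypothetical helper; tune the budget for your own backend.
 *
 * @param array<string,int> $distinctValuesPerLabel label name => distinct values today
 */
function fitsSeriesBudget(
    array $distinctValuesPerLabel,
    string $growingLabel,
    int $growthFactor,
    int $budget
): bool {
    $series = 1;
    foreach ($distinctValuesPerLabel as $label => $count) {
        $series *= ($label === $growingLabel) ? $count * $growthFactor : $count;
    }
    return $series <= $budget;
}

$labels = ['tenant_id' => 500, 'status_class' => 3];

var_dump(fitsSeriesBudget($labels, 'tenant_id', 1, 100000));   // today: 1,500 series -> true
var_dump(fitsSeriesBudget($labels, 'tenant_id', 10, 100000));  // 10x: 15,000 series -> true
var_dump(fitsSeriesBudget($labels, 'tenant_id', 100, 100000)); // 50,000 tenants: 150,000 series -> false
```

A label that passes at 10× but fails at 100× is a signal to decide in advance how it will be bucketed or removed if the product grows faster than planned.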

Common Misconception

✗ Adding more labels to metrics makes them more useful. In reality, each added label multiplies the number of time series, so the total grows exponentially with the number of labels. A metric with 5 labels each having 10 values creates 10^5 = 100,000 time series. A metric with the same 5 labels where one has 10,000 unique values creates 10,000 × 10^4 = 100 million time series. The rule is: if a label value identifies a specific entity (user, request, session), it belongs in a trace or log, not a metric label.

Why It Matters

Cardinality explosions are one of the most common ways PHP teams accidentally crash their monitoring infrastructure. Adding a label like user_id or request_id to a Prometheus counter seems intuitive, because you want to know which users are hitting errors, but it can bring down the metrics system for everyone. Understanding cardinality prevents this and guides the correct placement of observability data: low-cardinality aggregates in metrics, high-cardinality context in traces and logs. This split underpins the three pillars of observability: metrics, logs, and traces each suit data of different cardinality.

Common Mistakes

Adding user IDs, session IDs, or request IDs as Prometheus label values — each unique value creates a new time series.
Using unbounded string values from user input as metric labels — even seemingly low-cardinality values like product names can grow unbounded.
Not auditing label cardinality before adding metrics to production — test with realistic data, not dev data with five users.
Conflating high-cardinality identification (belongs in traces) with low-cardinality aggregation (belongs in metrics) — both are valuable; they just go in different places.
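
For the second mistake, one possible guard is to clamp any user-supplied value to a fixed allowlist before it becomes a label, so unknown values collapse into a single bucket. This is a minimal sketch: `boundedLabel` and the category list are hypothetical names chosen for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Clamp a label value to a fixed allowlist so user input cannot
 * create new time series. Hypothetical helper for illustration.
 *
 * @param string[] $allowed values that may appear as label values
 */
function boundedLabel(string $value, array $allowed, string $fallback = 'other'): string
{
    return in_array($value, $allowed, true) ? $value : $fallback;
}

// Assumed allowlist, fixed at deploy time rather than derived from input.
$allowedCategories = ['books', 'music', 'games'];

echo boundedLabel('books', $allowedCategories), PHP_EOL;              // books
echo boundedLabel('limited-edition-42', $allowedCategories), PHP_EOL; // other
```

With this guard the label can never take more than `count($allowed) + 1` distinct values, whatever users submit. The original value can still be recorded in a log line or trace attribute.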

Avoid When

When discussing the number of metrics collected (cardinality refers to label combinations, not metric count).
When explaining trace sampling or log retention policies (cardinality is metrics-specific; traces and logs handle high-cardinality values natively).
When troubleshooting latency or query performance in dashboards in general (high cardinality can slow queries that touch many series, but its primary symptom is memory exhaustion, so rule out query design first).
When designing alerting thresholds or SLO targets (cardinality is an infrastructure scaling problem, not a business or service-level concern).

When To Use

Choosing what to expose as metric labels when instrumenting a service — reject user_id, request_id, IP address, and other unbounded identifiers; keep only dimensions with fixed, small counts like environment, service, method, status.
Debugging why your Prometheus instance is consuming unexpectedly high memory or scraping is slow — cardinality explosion from a recently added label is the most common cause and should be your first suspect.
Designing alerting rules and dashboards — if you're tempted to alert or visualize on a high-cardinality label, move that identifier to logs or traces instead and use low-cardinality aggregations in metrics.
Setting up retention and storage capacity planning for a metrics backend — estimate time series count by multiplying cardinality of each label dimension; anything exceeding millions of series signals you need to remove or bucket high-cardinality labels before ingestion.
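
When hunting for the label behind a cardinality explosion, it can help to count distinct values per label in a sample of emitted label sets. The sketch below assumes you can capture such a sample yourself, for example in a load test. `distinctValuesPerLabel` is a hypothetical helper and does not query Prometheus.

```php
<?php
declare(strict_types=1);

/**
 * Count distinct values seen for each label in a sample of label sets.
 * Hypothetical audit helper; $samples is assumed to come from your own
 * instrumentation, not from a Prometheus API.
 *
 * @param array<int,array<string,string>> $samples
 * @return array<string,int> label name => distinct value count, highest first
 */
function distinctValuesPerLabel(array $samples): array
{
    $seen = [];
    foreach ($samples as $labels) {
        foreach ($labels as $name => $value) {
            $seen[$name][$value] = true; // array keys deduplicate values
        }
    }
    $counts = array_map('count', $seen); // keys are preserved for a single array
    arsort($counts);                     // most varied label first
    return $counts;
}

$samples = [
    ['method' => 'GET',  'status_class' => '2xx', 'user_id' => '101'],
    ['method' => 'GET',  'status_class' => '2xx', 'user_id' => '102'],
    ['method' => 'POST', 'status_class' => '5xx', 'user_id' => '103'],
];

print_r(distinctValuesPerLabel($samples)); // user_id (3 values) sorts first
```

A label whose distinct count keeps rising with the sample size, as `user_id` does here, is the one to move to logs or traces.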

Code Examples

✗ Vulnerable

// High cardinality: every distinct label value creates a new time series
$counter->labels([
    'user_id'    => $userId,     // unbounded: one series per user
    'request_id' => $requestId,  // unbounded: one series per request
    'endpoint'   => $path,       // raw path: unbounded if it embeds IDs (/users/12345)
])->inc();

✓ Fixed

// Low cardinality labels only — scalable metrics
$counter->labels([
    'endpoint'     => $normalizedPath,  // /users/{id} not /users/12345
    'status_class' => $statusClass,     // '2xx', '4xx', '5xx'
    'method'       => $httpMethod,      // GET, POST, etc.
])->inc();

// High-cardinality context goes in the trace span
$span->setAttribute('user.id', $userId)
     ->setAttribute('request.id', $requestId);
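
The fixed example assumes `$normalizedPath` and `$statusClass` have already been computed. One way to derive them is sketched below. The function names and regex patterns are assumptions for illustration and will not cover every URL scheme; where a framework exposes the matched route template, that is usually the better source for the endpoint label.

```php
<?php
declare(strict_types=1);

/**
 * Replace ID-like path segments with placeholders so the endpoint
 * label stays bounded. Hypothetical helper; patterns are assumptions.
 */
function normalizePath(string $path): string
{
    // Drop any query string before inspecting segments.
    $path = explode('?', $path, 2)[0];
    // Numeric segments: /users/12345 -> /users/{id}
    $path = (string) preg_replace('#/\d+(?=/|$)#', '/{id}', $path);
    // UUID segments: /orders/550e8400-... -> /orders/{uuid}
    $uuid = '[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}';
    return (string) preg_replace('#/' . $uuid . '(?=/|$)#', '/{uuid}', $path);
}

/**
 * Collapse an HTTP status code into its class, e.g. 404 -> "4xx".
 */
function statusClass(int $statusCode): string
{
    return intdiv($statusCode, 100) . 'xx';
}

echo normalizePath('/users/12345'), PHP_EOL;                   // /users/{id}
echo normalizePath('/users/12345/orders/987?page=2'), PHP_EOL; // /users/{id}/orders/{id}
echo statusClass(404), PHP_EOL;                                // 4xx
```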

References

https://prometheus.io/docs/practices/naming/#labels