Three Pillars of Observability
debt (d9/e7/b7/t5)
Closest to 'silent in production until users hit it' (d9). The detection_hints state automated=no, and the absence of structured logging, metrics endpoints, or distributed tracing produces no compiler or linter warnings. The code pattern is purely operational — 'debugging requires SSH to production servers' — meaning gaps are only felt during incidents, never during development or CI.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix outlines a staged three-step process (structured logging → metrics endpoint → OpenTelemetry traces) spanning multiple systems, configuration layers, and potentially infrastructure. Adding trace IDs to logs, exposing a /metrics endpoint, and wiring OpenTelemetry auto-instrumentation touches application code, deployment configuration, and logging pipelines across the entire codebase, not a single file or component.
Closest to 'strong gravitational pull' (b7). The applies_to covers web, cli, and queue-worker contexts — i.e., every runtime context. Once adopted (or absent), observability shapes how every future incident is diagnosed and how every new service must be instrumented. The common_mistakes (no correlation between pillars, no structured fields, no percentiles) compound over time as the codebase grows, making every new feature carry the tax of the chosen observability posture.
Closest to 'notable trap — a documented gotcha most devs eventually learn' (t5). The misconception field states the canonical wrong belief: 'Logs are sufficient for observability.' This is a well-known professional pitfall — developers who know logging well assume it covers the full observability space, not realising metrics and traces address fundamentally different questions. It is documented and commonly encountered but does not fully contradict how a similar concept works elsewhere.
Also Known As
TL;DR
Logs record what happened, metrics measure how much and how often, traces show where a request spent its time. Production debugging needs all three, joined by shared trace IDs.
Explanation
- Logs: timestamped records of discrete events — what happened.
- Metrics: numeric measurements aggregated over time — how much/how often.
- Traces: the path of a single request through distributed services — where is it slow.

Each pillar answers a different question: metrics alert you that something is wrong, traces show you where, and logs explain why. OpenTelemetry standardises all three. Correlation IDs linking logs to traces are the glue that makes the pillars useful together. A missing pillar leaves a blind spot: metrics without traces cannot pinpoint which service is slow.
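The "glue" can be sketched in a few lines of plain PHP, with no logging library. The function name structuredLog and the trace ID value are illustrative assumptions, not a specific API; in practice the ID would come from the active span context.

```php
<?php
// Minimal sketch of a structured log line that carries a trace ID.
function structuredLog(string $message, string $traceId, array $fields = []): string
{
    $entry = array_merge([
        'timestamp' => gmdate('c'),
        'message'   => $message,
        'trace_id'  => $traceId, // the glue joining this log line to its trace
    ], $fields);
    return json_encode($entry);
}

echo structuredLog('Request complete', '4bf92f3577b34da6a3ce929d0e0e4736', [
    'route'       => '/api/orders',
    'duration_ms' => 8423,
]), PHP_EOL;
```

Because every field is a named key rather than free text, the line can be indexed and queried ("all logs where trace_id = X") instead of grepped.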
Common Misconception
'Logs are sufficient for observability.' They are not: logs capture discrete events but cannot answer aggregate questions (what is the p99?) or locate latency across service boundaries — metrics and traces address fundamentally different questions.
Why It Matters
Missing observability is silent during development and CI — no compiler or linter flags it. The gap is only felt during an incident, when debugging requires SSH to production servers and manual log archaeology instead of an alert → dashboard → trace → logs workflow.
Common Mistakes
- Logs without structured fields — free-text logs cannot be queried for specific values efficiently.
- Metrics without percentiles — average response time hides the tail experience that real users suffer.
- Traces without logs — a slow trace tells you where but not why; logs in context explain the reason.
- No correlation between pillars — if trace IDs are not included in logs, the pillars cannot be joined during an investigation.
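The percentile mistake above is easy to demonstrate. A minimal sketch in plain PHP: 98 requests at 100 ms and two at 8 s. The percentile() helper is a simple nearest-rank implementation written for this sketch, not a library function.

```php
<?php
// Why averages hide the tail experience.
function percentile(array $values, float $p): float
{
    sort($values);
    $rank = (int) ceil(($p / 100) * count($values)) - 1; // nearest-rank, 0-based
    return (float) $values[$rank];
}

$latenciesMs = array_fill(0, 98, 100); // 98 fast requests at 100 ms
$latenciesMs[] = 8000;                 // two requests stuck behind a slow query
$latenciesMs[] = 8000;

$avg = array_sum($latenciesMs) / count($latenciesMs); // 258 ms — looks healthy
$p99 = percentile($latenciesMs, 99);                  // 8000 ms — what the tail saw
```

An average of 258 ms would pass most dashboards, while one request in fifty took eight seconds — which is exactly the experience a p99 metric surfaces.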
Code Examples
// Logs only — cannot answer 'what is the p99?' or 'which service is slow?':
error_log('[2026-03-15 12:34:56] Request to /api/orders took 8423ms');
// Thousands of these lines
// To find p99: grep, awk, sort — minutes of manual work
// Which downstream call was slow? Unknown — no traces
// All three pillars with correlation:
// Metric:
$histogram->observe($duration, ['route' => '/api/orders']);
// Structured log with trace ID:
$logger->info('Request complete', [
    'duration_ms' => $duration * 1000,
    'route'       => '/api/orders',
    'trace_id'    => $span->getContext()->getTraceId(), // Links to trace
]);
// Trace automatically captures downstream calls
// In an incident: alert (metric) → dashboard (metrics) → trace → logs
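The final hop of that workflow (trace → logs) works only because every entry carries a trace_id. A sketch in plain PHP with hypothetical log entries: finding the logs for a slow trace reduces to a filter.

```php
<?php
// Joining pillars during an incident: filter structured logs by trace ID.
$logs = [
    ['trace_id' => 'abc123', 'message' => 'Request received'],
    ['trace_id' => 'def456', 'message' => 'Cache miss'],
    ['trace_id' => 'abc123', 'message' => 'DB query took 8100ms'],
];

$slowTraceId = 'abc123'; // taken from the trace view during the incident
$forTrace = array_values(array_filter(
    $logs,
    fn (array $entry): bool => $entry['trace_id'] === $slowTraceId
));
// $forTrace now holds exactly the entries that explain why this trace was slow.
```

Without the shared trace_id field, the same question would mean grepping free text by timestamp and hoping the clocks line up.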