
Observability (Logs, Metrics, Traces)

DevOps · PHP 5.0+ · Intermediate
debt(d9/e7/b7/t7)
d9 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'silent in production until users hit it' (d9). The detection_hints state automated=no and the code_pattern is absence-based (no metrics endpoint, no tracing, unstructured logs). Tools like OpenTelemetry, Prometheus, Datadog, and Grafana only help once instrumentation exists — they cannot detect the absence of observability itself. A system lacking observability fails silently; you only discover it when debugging a novel production failure and realising you have no data.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix explicitly says 'you need all three pillars' — structured logs (Monolog JSON), metrics (Prometheus /metrics endpoint), and traces (OpenTelemetry auto-instrumentation). This is not a one-line patch; it requires adding instrumentation points across services, configuring exporters, correlating IDs across request boundaries, and updating all contexts (web, cli, queue-worker) as listed in applies_to. Common mistakes like unstructured logging and missing correlation IDs compound the remediation scope.

b7 Burden Structural debt — long-term weight of choosing wrong

Closest to 'strong gravitational pull' (b7). Observability applies across all contexts (web, cli, queue-worker) and shapes every operational and debugging workflow. Once adopted, every new service, endpoint, and background job must emit structured logs, metrics, and traces consistently. Missing instrumentation anywhere weakens the whole system. The tags (devops, monitoring, operations, reliability) confirm this is a cross-cutting operational concern that every future maintainer must respect and maintain.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field states explicitly that developers conflate observability with monitoring — treating predefined dashboards and alerts as sufficient. This is a serious conceptual trap because monitoring is the dominant prior mental model and observability requires a fundamentally different stance (understanding arbitrary unknown states vs. tracking known failure modes). Common mistakes (unstructured logging, metrics without context, no correlation) all stem from this same root misconception.


Also Known As

system observability · logs metrics traces · three pillars observability

TL;DR

The ability to understand a system's internal state from its external outputs — built on three pillars: logs, metrics, and distributed traces.

Explanation

Observability (as opposed to monitoring) is the degree to which a system's internal state can be inferred from its outputs. The three pillars are Logs (timestamped event records; structured JSON logs are queryable), Metrics (numeric time-series data such as request rate, error rate, latency, and resource usage), and Traces (end-to-end request journeys across services, correlated by trace ID). Common tools: Prometheus + Grafana (metrics), ELK / Loki (logs), and Jaeger / Zipkin (trace backends), with OpenTelemetry as the vendor-neutral instrumentation layer. PHP applications typically emit structured logs via Monolog, expose metrics via a /metrics endpoint, and propagate trace context via the OpenTelemetry SDK.
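
To make the metrics pillar concrete, here is a minimal sketch of what a /metrics endpoint might return, assuming the Prometheus text exposition format. The function name renderPrometheusCounters and the shape of its input array are hypothetical, chosen only for illustration. A real application would more likely rely on a Prometheus client library than render the output by hand. The sketch assumes PHP 7.4 or later and single-line help text.

```php
<?php
/**
 * Render counters in the Prometheus text exposition format.
 *
 * Hypothetical input shape:
 *   name => ['help' => string, 'samples' => [[labels, value], ...]]
 */
function renderPrometheusCounters(array $counters): string
{
    $lines = [];
    foreach ($counters as $name => $counter) {
        $lines[] = "# HELP {$name} {$counter['help']}";
        $lines[] = "# TYPE {$name} counter";
        foreach ($counter['samples'] as [$labels, $value]) {
            $pairs = [];
            foreach ($labels as $key => $labelValue) {
                // Label values must escape backslash, double quote and newline.
                $escaped = str_replace(
                    ["\\", "\"", "\n"],
                    ["\\\\", "\\\"", "\\n"],
                    (string) $labelValue
                );
                $pairs[] = "{$key}=\"{$escaped}\"";
            }
            $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';
            $lines[] = "{$name}{$labelPart} {$value}";
        }
    }
    return implode("\n", $lines) . "\n";
}
```

A route handler for /metrics could echo the returned string with a text/plain content type. Labels such as method and status are what give a raw counter the context needed to correlate it with logs and traces.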

Common Misconception

Observability is just a modern word for monitoring. Monitoring tracks known failure modes with predefined dashboards and alerts. Observability is the ability to understand arbitrary system states from outputs — logs, metrics, and traces — enabling diagnosis of novel failures that were never anticipated.
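
The difference can be illustrated with a small sketch. A monitoring check asks a question fixed in advance (for example, "is the error rate above a threshold?"), whereas observability depends on being able to slice recorded events by any field after the fact. The countBy helper and the sample events below are hypothetical and stand in for whatever query engine sits on top of your structured logs.

```php
<?php
/**
 * Count events grouped by an arbitrary field chosen at query time.
 * Events that lack the field are grouped under '(none)'.
 */
function countBy(array $events, string $field): array
{
    $counts = [];
    foreach ($events as $event) {
        $key = (string) ($event[$field] ?? '(none)');
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    arsort($counts); // most frequent value first
    return $counts;
}

// Hypothetical structured events, as they might be decoded from JSON log lines.
$events = [
    ['event' => 'checkout_failed', 'region' => 'eu', 'app_version' => '2.3.1'],
    ['event' => 'checkout_failed', 'region' => 'us', 'app_version' => '2.3.1'],
    ['event' => 'checkout_failed', 'region' => 'eu', 'app_version' => '2.3.1'],
    ['event' => 'checkout_failed', 'region' => 'eu', 'app_version' => '2.2.0'],
];

// A question no dashboard was built for: which release do the failures cluster in?
$byVersion = countBy($events, 'app_version');
$byRegion  = countBy($events, 'region');
```

The same events answer both questions without any new instrumentation, which is only possible because each event carries its fields in structured form.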

Why It Matters

Observability — metrics, logs, and traces — lets you understand system behaviour from the outside. A system that cannot be observed cannot be debugged or improved reliably.

Common Mistakes

  • Logging everything at DEBUG level in production — log volume makes finding real issues impossible.
  • Metrics without context — a spike in CPU is meaningless without correlated request rate and error rate.
  • Structured logging not implemented — log parsing tools cannot extract fields from unstructured log lines.
  • No correlation between metrics, logs, and traces — cannot connect a metric spike to its cause in logs.
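
Several of the mistakes above can be avoided at the logging layer. The following is a minimal sketch, assuming PHP 7.4 or later: the JsonLineLogger class is hypothetical and is not Monolog's API, although its numeric level values mirror the ones Monolog uses. It writes one JSON object per line, drops records below a configured level, and stamps every record with the same trace_id so that logs can be joined to traces and metrics.

```php
<?php
/**
 * Minimal structured logger sketch (hypothetical class, not Monolog's API).
 * Every line is a single JSON object that carries the request's trace_id.
 */
final class JsonLineLogger
{
    private const LEVELS = ['debug' => 100, 'info' => 200, 'warning' => 300, 'error' => 400];

    /** @var resource */
    private $stream;
    private string $traceId;
    private int $minLevel;

    public function __construct($stream, string $traceId, string $minLevel = 'info')
    {
        $this->stream   = $stream;
        $this->traceId  = $traceId;
        $this->minLevel = self::LEVELS[$minLevel];
    }

    public function log(string $level, string $message, array $context = []): void
    {
        if (self::LEVELS[$level] < $this->minLevel) {
            return; // below the threshold: dropped to keep production volume manageable
        }
        $record = [
            'timestamp' => date('c'),
            'level'     => $level,
            'message'   => $message,
            'trace_id'  => $this->traceId,
            'context'   => $context,
        ];
        fwrite($this->stream, json_encode($record, JSON_UNESCAPED_SLASHES) . "\n");
    }
}

// One ID per request; reuse an incoming one if an upstream service supplied it.
$traceId = bin2hex(random_bytes(16));
$logger  = new JsonLineLogger(fopen('php://stderr', 'w'), $traceId);
$logger->log('debug', 'cache lookup', ['key' => 'user:42']); // suppressed at 'info'
$logger->log('error', 'login_failed', ['user_id' => 42, 'ip' => '192.168.1.1']);
```

In a real project the same effect is usually achieved with Monolog's JSON formatting plus a processor that injects the trace ID. The sketch only shows which fields need to be present.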

Code Examples

✗ Vulnerable
// Unstructured log: the fields are buried in free text, so they cannot be parsed or searched reliably.
error_log('User 42 failed to login from 192.168.1.1 at ' . date('Y-m-d H:i:s'));

// For comparison, the same event as structured JSON is searchable and filterable:
error_log(json_encode([
    'event' => 'login_failed', 'user_id' => 42,
    'ip' => '192.168.1.1', 'timestamp' => date('c')
]));
✓ Fixed
// Structured logging with context
$this->logger->info('Order placed', [
    'order_id'    => $order->id,
    'user_id'     => $order->userId,
    'total_cents' => $order->total,
    'duration_ms' => $elapsed,
]);

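// Metric increment (StatsD-style client; Prometheus can ingest these through a StatsD exporter)
$this->metrics->increment('orders.placed', ['status' => 'success']);
$this->metrics->histogram('orders.checkout_duration_ms', $elapsed);

// Trace span (OpenTelemetry; assumes $tracer is an OpenTelemetry tracer)
$span = $tracer->spanBuilder('checkout')->startSpan();
$scope = $span->activate(); // make this the current span so child spans attach to it
try {
    /* work */
} finally {
    $scope->detach(); // restore the previous context before ending the span
    $span->end();
}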
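
A span is only useful across services if its trace ID travels with the request. The sketch below shows the idea using the W3C Trace Context traceparent header (version 00). The function names parseTraceparent and nextTraceparent are hypothetical. In practice the OpenTelemetry SDK's propagators do this for you, so treat this as an illustration of what is being propagated rather than code to ship.

```php
<?php
/**
 * Parse a W3C `traceparent` header value (version 00 only).
 * Returns null for malformed input so the caller can start a new trace.
 */
function parseTraceparent(string $header): ?array
{
    $pattern = '/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (!preg_match($pattern, trim($header), $m)) {
        return null;
    }
    // All-zero trace or parent IDs are invalid.
    if ($m[1] === str_repeat('0', 32) || $m[2] === str_repeat('0', 16)) {
        return null;
    }
    return [
        'trace_id'  => $m[1],
        'parent_id' => $m[2],
        'sampled'   => (hexdec($m[3]) & 1) === 1,
    ];
}

/**
 * Build the header to send downstream: same trace ID, fresh span ID.
 * With no incoming context, a new sampled trace is started.
 */
function nextTraceparent(?array $incoming): string
{
    $traceId = $incoming['trace_id'] ?? bin2hex(random_bytes(16));
    $spanId  = bin2hex(random_bytes(8));
    $flags   = ($incoming['sampled'] ?? true) ? '01' : '00';
    return "00-{$traceId}-{$spanId}-{$flags}";
}
```

A service would parse the incoming header, add the trace_id to its log context, and attach the value from nextTraceparent to every outgoing HTTP call or queued job, which is what lets web, cli, and queue-worker contexts share one trace.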

Added 15 Mar 2026
Edited 22 Mar 2026
DEV INTEL Tools & Severity
🟡 Medium ⚙ Fix effort: High
⚡ Quick Fix
Instrument the three pillars: structured logs (Monolog JSON), metrics (Prometheus /metrics endpoint), and traces (OpenTelemetry auto-instrumentation) — you need all three
📦 Applies To
PHP 5.0+ · web · cli · queue-worker
🔗 Prerequisites
🔍 Detection Hints
No metrics endpoint; no distributed tracing; logs are unstructured plain text with no correlation IDs
Auto-detectable: ✗ No · opentelemetry · prometheus · datadog · grafana
⚠ Related Problems
🤖 AI Agent
Confidence: Low · False Positives: High · ✗ Manual fix · Fix: High · Context: File

