Observability (Logs, Metrics, Traces)
debt(d9/e7/b7/t7)
Closest to 'silent in production until users hit it' (d9). The detection_hints state automated=no and the code_pattern is absence-based (no metrics endpoint, no tracing, unstructured logs). Tools like OpenTelemetry, Prometheus, Datadog, and Grafana only help once instrumentation exists — they cannot detect the absence of observability itself. A system lacking observability fails silently; you only discover it when debugging a novel production failure and realising you have no data.
Closest to 'cross-cutting refactor across the codebase' (e7). The quick_fix explicitly says 'you need all three pillars' — structured logs (Monolog JSON), metrics (Prometheus /metrics endpoint), and traces (OpenTelemetry auto-instrumentation). This is not a one-line patch; it requires adding instrumentation points across services, configuring exporters, correlating IDs across request boundaries, and updating all contexts (web, cli, queue-worker) as listed in applies_to. Common mistakes like unstructured logging and missing correlation IDs compound the remediation scope.
Closest to 'strong gravitational pull' (b7). Observability applies across all contexts (web, cli, queue-worker) and shapes every operational and debugging workflow. Once adopted, every new service, endpoint, and background job must emit structured logs, metrics, and traces consistently. Missing instrumentation anywhere weakens the whole system. The tags (devops, monitoring, operations, reliability) confirm this is a cross-cutting operational concern that every future maintainer must respect and maintain.
Closest to 'serious trap — contradicts how a similar concept works elsewhere' (t7). The misconception field states explicitly that developers conflate observability with monitoring — treating predefined dashboards and alerts as sufficient. This is a serious conceptual trap because monitoring is the dominant prior mental model and observability requires a fundamentally different stance (understanding arbitrary unknown states vs. tracking known failure modes). Common mistakes (unstructured logging, metrics without context, no correlation) all stem from this same root misconception.
Also Known As
TL;DR
Explanation
Observability, as opposed to monitoring, is the degree to which a system's internal state can be inferred from its outputs. It rests on three pillars: logs (timestamped event records, which become queryable when emitted as structured JSON), metrics (numeric time-series data such as request rate, error rate, latency, and resource usage), and traces (end-to-end request journeys across services, correlated by trace ID). Typical tooling: Prometheus + Grafana for metrics, ELK or Loki for logs, and Jaeger or Zipkin for traces, with OpenTelemetry providing vendor-neutral instrumentation and export. PHP applications can emit structured logs via Monolog, expose metrics on a /metrics endpoint, and propagate trace context via the OpenTelemetry SDK.
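To make the /metrics idea concrete, here is a minimal sketch of rendering counters in the Prometheus text exposition format using plain PHP. The function name `renderPrometheusCounters`, the array shape, and the metric `http_requests_total` are assumptions made for illustration; a real application would more likely use a Prometheus client library and persistent storage for the counter values.

```php
<?php
declare(strict_types=1);

/**
 * Render counters in the Prometheus text exposition format.
 * Hypothetical helper: the input shape is an assumption for this sketch.
 *
 * @param array<string, array{help: string, samples: array}> $counters
 */
function renderPrometheusCounters(array $counters): string
{
    $lines = [];
    foreach ($counters as $name => $counter) {
        $lines[] = "# HELP {$name} {$counter['help']}";
        $lines[] = "# TYPE {$name} counter";
        foreach ($counter['samples'] as $sample) {
            $pairs = [];
            foreach ($sample['labels'] as $label => $value) {
                // Label values must escape backslash, double quote, and newline.
                $escaped = strtr($value, ["\\" => "\\\\", "\"" => "\\\"", "\n" => "\\n"]);
                $pairs[] = "{$label}=\"{$escaped}\"";
            }
            $labelPart = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';
            $lines[] = "{$name}{$labelPart} {$sample['value']}";
        }
    }
    return implode("\n", $lines) . "\n";
}

// Example values are made up for the sketch.
$output = renderPrometheusCounters([
    'http_requests_total' => [
        'help' => 'Total HTTP requests handled.',
        'samples' => [
            ['labels' => ['method' => 'GET', 'status' => '200'], 'value' => 1027],
            ['labels' => ['method' => 'POST', 'status' => '500'], 'value' => 3],
        ],
    ],
]);
echo $output;
```

When served over HTTP, the response would normally carry the header `Content-Type: text/plain; version=0.0.4` so that the Prometheus scraper recognises the format.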
Common Misconception
Observability is often conflated with monitoring, as if predefined dashboards and alerts were sufficient. Monitoring tracks known failure modes; observability is about understanding arbitrary, previously unknown states.
Why It Matters
Common Mistakes
- Logging everything at DEBUG level in production — log volume makes finding real issues impossible.
- Metrics without context — a spike in CPU is meaningless without correlated request rate and error rate.
- Structured logging not implemented — log parsing tools cannot extract fields from unstructured log lines.
- No correlation between metrics, logs, and traces — cannot connect a metric spike to its cause in logs.
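The last mistake in the list is commonly addressed by stamping every log line with the trace ID of the current request. The sketch below assumes the incoming request may carry a W3C `traceparent` header (version, trace ID, parent ID, flags, separated by hyphens); the helper names `resolveTraceId` and `logLine` and the sample field values are hypothetical. It covers only the logs-to-traces link, not metrics.

```php
<?php
declare(strict_types=1);

/**
 * Extract the trace ID from a W3C traceparent header value,
 * or generate a fresh one when the header is missing or malformed.
 */
function resolveTraceId(?string $traceparent): string
{
    if ($traceparent !== null
        && preg_match('/^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$/', $traceparent, $m) === 1
        && $m[1] !== str_repeat('0', 32) // an all-zero trace ID is invalid
    ) {
        return $m[1];
    }
    return bin2hex(random_bytes(16)); // 16 random bytes give 32 hex characters
}

/** Build one JSON log line that always carries the trace ID. */
function logLine(string $traceId, string $level, string $message, array $context = []): string
{
    // Array union keeps the left-hand keys, so context cannot overwrite trace_id.
    return json_encode([
        'timestamp' => date('c'),
        'level'     => $level,
        'message'   => $message,
        'trace_id'  => $traceId,
    ] + $context, JSON_THROW_ON_ERROR);
}

$traceId = resolveTraceId($_SERVER['HTTP_TRACEPARENT'] ?? null);
error_log(logLine($traceId, 'error', 'Payment declined', ['order_id' => 1001]));
```

With the same `trace_id` present in logs and in the tracing backend, a slow or failed trace can be looked up in the log store by a single field match.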
Code Examples
// Unstructured log — cannot be parsed or searched reliably:
error_log('User 42 failed to login from 192.168.1.1 at ' . date('Y-m-d H:i:s'));
// Structured JSON log — searchable and filterable:
error_log(json_encode([
    'event'     => 'login_failed',
    'user_id'   => 42,
    'ip'        => '192.168.1.1',
    'timestamp' => date('c'),
]));
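To illustrate why the structured form is easier to search, the following sketch filters JSON log lines by arbitrary field values without any regular expressions. The sample lines and the helper `filterLogs` are hypothetical; in practice this querying is done by a log store such as Loki or Elasticsearch, not by application code.

```php
<?php
declare(strict_types=1);

// Hypothetical sample log lines, one JSON object per line.
$lines = [
    '{"event":"login_failed","user_id":42,"ip":"192.168.1.1"}',
    '{"event":"login_ok","user_id":7,"ip":"10.0.0.5"}',
    '{"event":"login_failed","user_id":42,"ip":"192.168.1.9"}',
];

/** Return decoded records whose fields match every key/value pair in $where. */
function filterLogs(array $lines, array $where): array
{
    $matches = [];
    foreach ($lines as $line) {
        $record = json_decode($line, true);
        if (!is_array($record)) {
            continue; // skip lines that are not valid JSON objects
        }
        foreach ($where as $field => $expected) {
            if (!array_key_exists($field, $record) || $record[$field] !== $expected) {
                continue 2; // mismatch: move on to the next log line
            }
        }
        $matches[] = $record;
    }
    return $matches;
}

$failed = filterLogs($lines, ['event' => 'login_failed', 'user_id' => 42]);
echo count($failed), "\n"; // prints 2
```

The same question asked of the unstructured line would need a pattern tailored to that exact message wording, which breaks as soon as the message text changes.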
// Structured logging with context
$this->logger->info('Order placed', [
    'order_id'    => $order->id,
    'user_id'     => $order->userId,
    'total_cents' => $order->total,
    'duration_ms' => $elapsed,
]);
// Metric increment (Prometheus via StatsD)
$this->metrics->increment('orders.placed', ['status' => 'success']);
$this->metrics->histogram('orders.checkout_duration_ms', $elapsed);
// Trace span (OpenTelemetry)
$span = $tracer->spanBuilder('checkout')->startSpan();
try {
    /* work */
} finally {
    $span->end();
}