Observability terms
Running software in production without observability is flying blind. Metrics, logs, and traces (the three pillars of observability), together with dashboards and alerting, give you the insight needed to understand what your system is actually doing under real conditions. This category covers the tools, patterns, and vocabulary of production-grade visibility.
More on Observability
History
Observability emerged as a formal discipline in software engineering around the mid-2010s. It builds on decades of monitoring and logging practice but distinguishes itself through the ability to ask arbitrary questions about system behavior without having to anticipate those questions in advance. The concept was borrowed from control theory and popularized by companies like Honeycomb and by practitioners writing about microservices complexity, where traditional monitoring dashboards proved insufficient for understanding distributed systems. Key milestones include the rise of structured logging, the development of distributed tracing standards (most notably OpenTelemetry), and the proliferation of metrics collection through tools like Prometheus. The maturation of cloud-native and containerized architectures accelerated adoption, as teams needed deeper insight into ephemeral, dynamic infrastructure. Today, observability is foundational to DevOps and SRE practices, with logs, metrics, and traces serving as the three pillars for operating reliable systems at scale.
Key concepts
- APM — Application Performance Monitoring
- Four Golden Signals
- Metrics Types
- Distributed Tracing
- Structured Logging
- SLO / SLI / SLA
- OpenTelemetry
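As a concrete illustration of the Structured Logging entry above, here is a minimal sketch using only the Python standard library. It assumes "structured" means one JSON object per log line. The names `JsonFormatter`, `build_logger`, and the `context` key are hypothetical choices for this example and do not come from any particular logging product.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Anything passed as extra={"context": {...}} is attached to the
        # record as an attribute; "context" is an assumed key for this sketch.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so output is not duplicated
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout", sys.stdout)
    # Key/value fields stay machine-queryable instead of being buried in text.
    log.info(
        "payment failed",
        extra={"context": {"order_id": "A-17", "latency_ms": 412}},
    )
```

Because every field is a separate key, a log backend can filter or aggregate on `order_id` or `latency_ms` without parsing free-form message text.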
Best references
- OpenTelemetry Documentation: The canonical specification and implementation guide for the open standard in distributed tracing, metrics, and logs. Essential reference for understanding modern observability instrumentation.
- Prometheus Documentation: Authoritative guide to Prometheus metrics, alerting, and time-series concepts. Foundational for understanding metric collection, cardinality management, and the Four Golden Signals.
- Google SRE Book, "Monitoring Distributed Systems": Seminal chapter on the Four Golden Signals and alert design from Google's Site Reliability Engineering practice; SLOs, SLIs, and error budgets are covered elsewhere in the same book. Establishes best practices for observability strategy.
- Grafana Documentation: Complete reference for visualization, dashboarding, and alerting patterns. Covers integration with Prometheus, Loki, and other data sources.
- The Art of Monitoring by James Turnbull: Practical guide to designing effective monitoring and observability systems, covering alerting best practices, runbooks, and on-call culture.
- Observability Engineering by Charity Majors, Liz Fong-Jones, and George Miranda (O'Reilly): Contemporary, comprehensive treatment of observability principles, including cardinality, trace sampling, and structured logging in production systems.
Typed relationships here
Edges touching an Observability term.
- Prometheus & Grafana → Often seen in → PHP Observability (Jun 10)
- OpenTelemetry → Often seen in → PHP Observability (Jun 10)
- APM Tools → Often seen in → PHP Observability (Jun 10)
- OpenTelemetry → Contains → Metrics Types (Jun 9)
- Prometheus & Grafana → Leverages → Metrics Types (Jun 9)