
Log Aggregation (ELK/Loki)

Observability Intermediate

debt(d7/e5/b5/t5)

d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no', and tools like Loki, Elasticsearch, and Filebeat only help once aggregation is set up — the absence of log aggregation (grep-across-servers debugging) is invisible until someone tries to debug a multi-service incident. No linter or static tool flags the missing setup.

e5 Effort Remediation debt — work required to fix once spotted

Closest to 'touches multiple files / significant refactor in one component' (e5). The quick_fix describes choosing a stack (Loki or ELK), configuring shipping agents (Filebeat/Promtail), converting logs to structured JSON, and setting retention policies. This is not a one-line patch — it involves infrastructure setup, log format changes across services, and agent deployment, touching multiple files and configuration layers.

b5 Burden Structural debt — long-term weight of choosing wrong

Closest to 'persistent productivity tax' (b5). Applies to web, cli, and queue-worker contexts. Once chosen, the aggregation stack (ELK vs Loki) shapes how all teams query logs, structure their JSON fields, and set retention. Switching stacks later is painful, but the choice doesn't fundamentally rewrite the system's shape — teams can still function with the wrong stack, just inefficiently.

t5 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The misconception is explicit: 'more log storage is always better.' Teams routinely store all debug logs at full rate, creating expensive noise. The common mistakes confirm this pattern — storing everything, not using structured logs. This is a well-documented gotcha that most teams hit once before learning to sample and structure.

Scored by claude-sonnet-4-6 · 2026-05-10 · reviewed by human

TL;DR

Log aggregation collects logs from all services into a central searchable store — ELK (Elasticsearch+Logstash+Kibana) for full-text search, Loki (Prometheus-style) for cost-efficient label-based search.

Explanation

ELK stack: Filebeat/Logstash (collect and parse) → Elasticsearch (store and index) → Kibana (search and dashboards). Log content is full-text indexed, so any field is searchable, at a high storage cost.
Loki: Grafana's log store. It indexes only labels, not log content, and stores the content compressed, which makes it much cheaper than ELK. Queries use LogQL, which is modelled on PromQL. Works best with structured logs (JSON).
Alternatives: Datadog Logs, Splunk (expensive but powerful), CloudWatch Logs.
Key capabilities: full-text search, aggregation (for example, error count by service), dashboards, and alerts on log patterns.
Shipping logs: an agent such as Filebeat, Fluentd, or Promtail forwards logs to the aggregator. In PHP: Monolog with a socket or HTTP handler → Logstash/Loki.
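
A minimal sketch of what structured (JSON) logs can look like on the application side, written in Python with only the standard library. The `JsonFormatter` class, the `build_logger` helper, and the field names `service` and `correlation_id` are illustrative assumptions, not part of ELK, Loki, or Monolog; a shipping agent would then tail or receive this output.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            # Hypothetical fields, passed by the caller through `extra=`.
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(stream, name: str = "demo-app") -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    log.error(
        "payment failed",
        extra={"service": "checkout", "correlation_id": "abc-123"},
    )
```

One JSON object per line keeps the output easy for an agent to parse, and fields such as `level` can be extracted without regular expressions.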

Common Misconception

✗ More log storage is always better — logs stored but never searched are expensive waste. Store what you query; sample debug logs heavily.
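
One way to sample debug logs before they are shipped, sketched in Python with the standard `logging` module. The `DebugSampler` class and the 5% ratio are assumptions chosen for illustration; the right ratio depends on what is actually queried.

```python
import logging
import random
from typing import Optional


class DebugSampler(logging.Filter):
    """Pass every record above DEBUG; pass only a fraction of DEBUG records."""

    def __init__(self, keep_ratio: float, rng: Optional[random.Random] = None) -> None:
        super().__init__()
        self.keep_ratio = keep_ratio
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno > logging.DEBUG:
            return True  # INFO, WARNING, ERROR, CRITICAL are never sampled
        # random() is in [0.0, 1.0), so keep_ratio=0.0 drops all DEBUG records
        # and keep_ratio=1.0 keeps all of them.
        return self.rng.random() < self.keep_ratio


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(DebugSampler(keep_ratio=0.05))  # keep roughly 5% of DEBUG
    logger = logging.getLogger("sampled-demo")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    for i in range(100):
        logger.debug("cache lookup %d", i)
    logger.error("this line is always kept")
```

Filtering at the handler means the sampling decision is made before the line ever reaches the shipping agent, so dropped records cost no storage.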

Why It Matters

Centralised log aggregation transforms debugging from SSH-to-server-and-grep to sub-second search across all services — essential for microservices and autoscaling environments.

Common Mistakes

Not shipping logs to a central store — grep-across-servers debugging.
Storing all debug logs at full rate — expensive and noisy.
Not using structured logs — full-text search works, but JSON fields are essential for aggregation.
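
To illustrate why JSON fields matter for aggregation, the Python sketch below counts errors by service from a handful of log lines. In practice this grouping would run inside Elasticsearch or Loki; the `error_count_by_service` function and the sample field names are hypothetical and serve only to show that unstructured lines cannot be grouped by field.

```python
import json
from collections import Counter
from typing import Iterable


def error_count_by_service(lines: Iterable[str]) -> Counter:
    """Count ERROR-level events per service from JSON log lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured line: searchable as text, but has no fields
        if isinstance(event, dict) and event.get("level") == "ERROR":
            counts[event.get("service", "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '{"level": "ERROR", "service": "checkout", "message": "payment failed"}',
        '{"level": "INFO", "service": "checkout", "message": "order placed"}',
        '{"level": "ERROR", "service": "search", "message": "timeout"}',
        '{"level": "ERROR", "service": "checkout", "message": "card declined"}',
        "[2024-01-01] ERROR something broke in checkout",  # plain text, skipped
    ]
    for service, count in error_count_by_service(sample).most_common():
        print(service, count)
```

The plain-text line still contains the word ERROR, so a text search would find it, but it contributes nothing to the per-service count.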

Code Examples

✗ Vulnerable

# No aggregation — SSH to each server:
ssh server1 grep 'ERROR' /var/log/app.log
ssh server2 grep 'ERROR' /var/log/app.log
# 20 servers = 20 SSH sessions

✓ Fixed

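# Promtail scrape config (an entry under scrape_configs; ships logs to Loki):
- job_name: php_app
  static_configs:
    - targets: [localhost]
      labels:
        job: php-app
        env: production
        __path__: /var/log/app.log  # file to tail
  pipeline_stages:
    - json:
        expressions:
          level: level
    - labels:
        level:  # low-cardinality only; correlation_id stays in the log line, not the index

# LogQL query (string literals need double quotes or backticks):
{job="php-app"} |= "ERROR" | json | correlation_id="abc-123"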

References

https://grafana.com/docs/loki/latest/