Log Aggregation (ELK/Loki)
debt(d7/e5/b5/t5)
Closest to 'only careful code review or runtime testing' (d7). Automated detection is not available: tools like Loki, Elasticsearch, and Filebeat only help once aggregation exists, so its absence (grep-across-servers debugging) stays invisible until someone tries to debug a multi-service incident. No linter or static tool flags the missing setup.
Closest to 'touches multiple files / significant refactor in one component' (e5). The quick fix means choosing a stack (Loki or ELK), configuring shipping agents (Filebeat/Promtail), converting logs to structured JSON, and setting retention policies. This is not a one-line patch: it involves infrastructure setup, log-format changes across services, and agent deployment, touching multiple files and configuration layers.
Closest to 'persistent productivity tax' (b5). Applies to web, cli, and queue-worker contexts. Once chosen, the aggregation stack (ELK vs Loki) shapes how all teams query logs, structure their JSON fields, and set retention. Switching stacks later is painful, but the choice doesn't fundamentally rewrite the system's shape — teams can still function with the wrong stack, just inefficiently.
Closest to 'notable trap (a documented gotcha most devs eventually learn)' (t5). The misconception is explicit: 'more log storage is always better.' Teams routinely store all debug logs at full rate, creating expensive noise. The common mistakes confirm this pattern — storing everything, not using structured logs. This is a well-documented gotcha that most teams hit once before learning to sample and structure.
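TL;DR
Ship logs from every server to one central, searchable store (ELK or Loki), write them as structured JSON, and set retention and sampling so that storage cost tracks usefulness.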
Explanation
ELK stack: Filebeat (ship) and/or Logstash (parse and transform) → Elasticsearch (store and index) → Kibana (search and dashboards). Log content is full-text indexed, so any field is searchable, but storage cost is high.
Loki: Grafana's log store. It indexes only labels (not line content) and stores the content compressed, which usually makes it much cheaper to run than ELK. It is queried with LogQL, a PromQL-inspired language, and works best with structured (JSON) logs.
Alternatives: Datadog Logs, Splunk (expensive but powerful), CloudWatch Logs. Key capabilities: full-text search, aggregation (e.g. error count by service), dashboards, and alerts on log patterns. Shipping: an agent such as Filebeat, Fluentd, or Promtail tails log files and forwards them to the aggregator. In PHP: Monolog with a socket or HTTP handler → Logstash/Loki.
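To make the indexing trade-off concrete, here is a toy sketch in Python (chosen only for illustration; neither Elasticsearch nor Loki is this simple). The `ENTRIES` data and both helper functions are hypothetical. The full-text index gets one key per distinct word in the logs, while the label index gets one key per distinct label set; the price Loki pays is that finding content means scanning the stored lines of every matching stream.

```python
import re
from collections import defaultdict

# Hypothetical sample entries: (labels, JSON line). Made up for illustration.
ENTRIES = [
    ({"job": "php-app", "env": "production"},
     '{"level": "ERROR", "msg": "payment timeout", "correlation_id": "abc-123"}'),
    ({"job": "php-app", "env": "production"},
     '{"level": "INFO", "msg": "order created", "correlation_id": "abc-124"}'),
    ({"job": "worker", "env": "production"},
     '{"level": "ERROR", "msg": "queue retry", "correlation_id": "abc-125"}'),
]


def full_text_index(entries):
    """ELK-style sketch: every word of every line becomes an inverted-index key."""
    index = defaultdict(set)
    for i, (_labels, line) in enumerate(entries):
        for token in re.findall(r"\w+", line):
            index[token].add(i)
    return index


def label_index(entries):
    """Loki-style sketch: only the label set is indexed; lines are stored unindexed."""
    streams = defaultdict(list)
    for labels, line in entries:
        streams[tuple(sorted(labels.items()))].append(line)
    return streams


fts = full_text_index(ENTRIES)
streams = label_index(ENTRIES)
print(len(fts), "full-text keys vs", len(streams), "label-set keys")
# Content lookup is a direct index hit in the full-text model:
print(sorted(fts["timeout"]))
```

Even with three lines the full-text index is several times larger than the label index, and the gap widens as log volume and vocabulary grow.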
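Common Misconception
"More log storage is always better." Storing every debug line at full rate mostly buys cost and noise; sampled, structured logs are cheaper to keep and easier to query.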
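One common answer to the belief that more log storage is always better is to sample low-value logs instead of storing all of them. A minimal sketch, using Python's standard `logging` module purely for illustration (the app in this entry is PHP/Monolog): a filter that passes every INFO-or-higher record and only about 10% of DEBUG records. The `DebugSampler` and `CollectingHandler` classes, the 10% rate, and the logger name are assumptions, not part of any logging stack.

```python
import logging
import random


class DebugSampler(logging.Filter):
    """Pass every INFO-or-higher record; pass only a fraction of DEBUG records."""

    def __init__(self, rate=0.1, rng=None):
        super().__init__()
        self.rate = rate  # illustrative default, not a recommended value
        self.rng = rng or random.Random()

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True
        return self.rng.random() < self.rate


class CollectingHandler(logging.Handler):
    """Stand-in for a shipping handler: keeps records in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = CollectingHandler()
handler.addFilter(DebugSampler(rate=0.1, rng=random.Random(42)))
logger = logging.getLogger("sampling_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

for i in range(1000):
    logger.debug("cache miss %d", i)
for i in range(5):
    logger.error("payment failed %d", i)

kept_debug = sum(r.levelno == logging.DEBUG for r in handler.records)
kept_error = sum(r.levelno == logging.ERROR for r in handler.records)
print(kept_debug, "of 1000 debug records kept;", kept_error, "of 5 errors kept")
```

Uniform random sampling can drop the one debug line you needed; a variant worth considering keeps or drops all lines for a given correlation ID together, so sampled requests stay complete.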
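Why It Matters
Without a central store, debugging a multi-service incident means SSHing into each server and grepping its logs one at a time, and the gap stays invisible until that incident happens.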
Common Mistakes
- Not shipping logs to a central store — grep-across-servers debugging.
- Storing all debug logs at full rate — expensive and noisy.
- Not using structured logs — full-text search works, but JSON fields are essential for aggregation.
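A minimal sketch of structured logging, using Python's standard `logging` module for illustration (in PHP, Monolog's `JsonFormatter` plays this role). Each record becomes one JSON object per line with fixed field names, so an aggregator can group and filter by field. The `service` and `correlation_id` fields passed via `extra=` are hypothetical choices; `level` matches the field the Promtail pipeline in the code examples extracts.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line with fixed, queryable field names."""

    def format(self, record):
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            # `service` and `correlation_id` are hypothetical fields passed via `extra=`.
            "service": getattr(record, "service", None),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(event)


stream = io.StringIO()  # stands in for the log file an agent would tail
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("structured_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

logger.error("payment timeout", extra={"service": "checkout", "correlation_id": "abc-123"})
event = json.loads(stream.getvalue().strip())
print(event["level"], event["service"], event["correlation_id"])  # ERROR checkout abc-123
```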
Code Examples
# No aggregation — SSH to each server:
ssh server1 grep 'ERROR' /var/log/app.log
ssh server2 grep 'ERROR' /var/log/app.log
# 20 servers = 20 SSH sessions
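By contrast, once every server's lines live in one store, "error count by service" is a single pass. A Python sketch for illustration only: the `SERVER_LOGS` data, field names, and helper functions are hypothetical. In a real stack an agent (Filebeat, Promtail) does the merge, and a Kibana aggregation or a LogQL metric query does the count.

```python
import json
from collections import Counter

# Hypothetical per-server log lines (structured JSON).
SERVER_LOGS = {
    "server1": [
        '{"level": "ERROR", "service": "checkout", "message": "payment timeout"}',
        '{"level": "INFO", "service": "checkout", "message": "order created"}',
    ],
    "server2": [
        '{"level": "ERROR", "service": "checkout", "message": "payment timeout"}',
        '{"level": "ERROR", "service": "search", "message": "index unavailable"}',
    ],
}


def centralize(server_logs):
    """Merge every server's lines into one store, tagging each event with its host."""
    store = []
    for host, lines in server_logs.items():
        for line in lines:
            event = json.loads(line)
            event["host"] = host
            store.append(event)
    return store


def error_count_by_service(store):
    """One aggregation over the central store replaces N SSH + grep sessions."""
    return Counter(e["service"] for e in store if e["level"] == "ERROR")


store = centralize(SERVER_LOGS)
counts = error_count_by_service(store)
print(dict(counts))  # {'checkout': 2, 'search': 1}
```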
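# Promtail scrape config (Promtail tails the log file and pushes lines to Loki):
scrape_configs:
  - job_name: php_app
    static_configs:
      - targets: [localhost]
        labels:
          job: php-app
          env: production
          __path__: /var/log/app.log
    pipeline_stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
    # correlation_id is unique per request, so keep it out of labels;
    # extract it at query time with `| json` instead.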
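Which extracted fields become labels matters because Loki stores one stream per distinct label set, and its documentation advises keeping labels low-cardinality. The toy count below, in Python for illustration, compares promoting only `level` with also promoting `correlation_id`. The traffic data and `stream_count` helper are hypothetical.

```python
LEVELS = ["INFO", "WARNING", "ERROR"]


def stream_count(events, label_keys):
    """Count distinct label-value combinations: in Loki, each one is a separate stream."""
    return len({tuple(event[key] for key in label_keys) for event in events})


# Hypothetical traffic: 1,000 requests from one job, each with its own correlation ID.
events = [
    {"job": "php-app", "env": "production",
     "level": LEVELS[i % len(LEVELS)], "correlation_id": f"req-{i}"}
    for i in range(1000)
]

print(stream_count(events, ["job", "env", "level"]))                    # 3
print(stream_count(events, ["job", "env", "level", "correlation_id"]))  # 1000
```

With `correlation_id` as a label, the stream count grows with traffic rather than staying fixed, which is exactly what the label index is not designed for.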
# LogQL query:
{job="php-app"} |= "ERROR" | json | correlation_id="abc-123"
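A rough Python model of how a query of the form `{job="php-app"} |= "ERROR" | json | correlation_id="abc-123"` is evaluated, stage by stage. It is a simplification: real Loki also handles time ranges, chunk storage, and JSON parse failures (via an `__error__` label), none of which is modeled here. The `STREAMS` data and `run_query` helper are hypothetical.

```python
import json

# Hypothetical stored streams: label set -> raw log lines.
STREAMS = {
    (("env", "production"), ("job", "php-app")): [
        '{"level": "ERROR", "message": "payment timeout", "correlation_id": "abc-123"}',
        '{"level": "INFO", "message": "order created", "correlation_id": "abc-123"}',
        '{"level": "ERROR", "message": "db deadlock", "correlation_id": "def-456"}',
    ],
    (("env", "production"), ("job", "worker")): [
        '{"level": "ERROR", "message": "retry exhausted", "correlation_id": "abc-123"}',
    ],
}


def run_query(streams, selector, contains, field_filter):
    """Mimic {selector} |= contains | json | key="value" in three stages."""
    results = []
    for key, lines in streams.items():
        labels = dict(key)
        # 1. Stream selector: decided from the label index alone.
        if any(labels.get(k) != v for k, v in selector.items()):
            continue
        for line in lines:
            # 2. Line filter (|=): plain substring match on the raw line.
            if contains not in line:
                continue
            # 3. `| json` parses the line; the label filter matches an extracted field.
            fields = json.loads(line)
            if all(fields.get(k) == v for k, v in field_filter.items()):
                results.append(line)
    return results


hits = run_query(STREAMS, {"job": "php-app"}, "ERROR", {"correlation_id": "abc-123"})
print(hits)
```

The ordering is the design point: the stream selector narrows the search using only the index, and the cheap substring filter discards most lines before the more expensive JSON parse runs.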