Four Golden Signals
TL;DR
Google SRE's Four Golden Signals — Latency, Traffic, Errors, Saturation — are the four metrics that, if monitored and alerted on, cover most production reliability concerns.
Explanation
- Latency: the time it takes to serve a request. Track successful and failed requests separately: a fast error (for example an immediate HTTP 500) drags the overall figure down and hides the failure, while a slow error is worse still.
- Traffic: the demand placed on the system, such as requests/sec, concurrent users, or messages/sec.
- Errors: the rate of failed requests, such as 5xx responses, uncaught exceptions, or failed jobs.
- Saturation: how full the system is, such as CPU %, memory %, queue depth, or disk I/O, with emphasis on whichever resource runs out first.
Two related methods: USE (Utilisation, Saturation, Errors) for resources, and RED (Rate, Errors, Duration) for request-driven services. Start with these four signals before adding more metrics; a sustained bad trend in any one of them usually means something is wrong.
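To make the four signals concrete, here is a minimal Python sketch that summarises one time window of requests. It is illustrative only: the Request class, the golden_signals and percentile helpers, treating any status of 500 or above as a failure, and using a work queue's depth as the saturation measure are all assumptions for this example, not part of any monitoring library. The percentile uses the nearest-rank definition, one of several in common use.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # time taken to serve the request
    status: int        # HTTP status code returned


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100; None if there are no values."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100) without float rounding
    return ordered[rank - 1]


def golden_signals(requests, window_s, queue_depth, queue_capacity):
    """Summarise one time window of requests as the four golden signals (hypothetical helper)."""
    ok = [r.duration_s for r in requests if r.status < 500]
    failed = [r.duration_s for r in requests if r.status >= 500]
    return {
        # Latency: successful and failed requests are reported separately
        "latency_p99_ok_s": percentile(ok, 99),
        "latency_p99_error_s": percentile(failed, 99),
        # Traffic: requests per second over the window
        "traffic_rps": len(requests) / window_s,
        # Errors: fraction of requests that failed (5xx treated as failure)
        "error_ratio": len(failed) / len(requests) if requests else 0.0,
        # Saturation: how full the constrained resource is (a work queue here)
        "saturation": queue_depth / queue_capacity,
    }


# Hypothetical window: 95 successful requests and 5 fast 5xx errors in 10 seconds
sample = [Request(0.1 + i * 0.001, 200) for i in range(95)]
sample += [Request(0.02, 503) for _ in range(5)]
print(golden_signals(sample, window_s=10, queue_depth=30, queue_capacity=100))
```

Keeping error latency separate matters in this sample: the five 20 ms failures would otherwise pull the combined latency figure down and make a failing window look faster than a healthy one.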
Common Misconception
✗ More metrics are always better. In practice, start with the four golden signals: metrics that nothing alerts on just add dashboard noise.
Why It Matters
Together, the four golden signals describe system health from the user's perspective: if all four are healthy, the service is probably working correctly for its users, even though no fixed set of metrics catches every failure.
Common Mistakes
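- Only monitoring uptime, not latency, errors, or saturation.
- Monitoring p50 latency but not p99: the median hides the slow tail, and the slowest 1% of requests can still reach many users, especially those who make many requests per session.
- No saturation monitoring: running out of CPU, memory, or connections often degrades service gradually before it fails outright, and saturation metrics show that decline coming.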
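The p50 versus p99 mistake is easy to see with a small, made-up latency sample. This Python sketch assumes a hypothetical workload of 98 fast requests and 2 very slow ones, and uses a nearest-rank percentile helper written for this example. The median and even the mean look acceptable, while p99 exposes the slow tail.

```python
def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100) in integer arithmetic
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 very slow ones (milliseconds)
latencies_ms = [50] * 98 + [3000] * 2

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms  mean={mean} ms  p99={p99} ms")
# prints: p50=50 ms  mean=109.0 ms  p99=3000 ms
```

The average is pulled up by the tail but still looks harmless at 109 ms, which is why the advice is to alert on a high percentile rather than the mean.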
Code Examples
✗ Vulnerable
# Only uptime monitoring:
- alert: ServiceDown
  expr: up == 0
# Misses: slow responses, error rates, resource exhaustion
✓ Fixed
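# Latency: p99 across all instances over 5 minutes
- alert: HighLatency
  expr: histogram_quantile(0.99, sum by (le) (rate(http_duration_seconds_bucket[5m]))) > 0.5
# Errors: share of all requests that returned 5xx
- alert: HighErrorRate
  expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
# Traffic: request rate below half of the same window a week earlier (example threshold)
- alert: TrafficDrop
  expr: sum(rate(http_requests_total[5m])) < 0.5 * sum(rate(http_requests_total[5m] offset 1w))
# Saturation: host memory more than 90% in use
- alert: HighMemory
  expr: 1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes > 0.9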
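histogram_quantile never sees individual request durations. It estimates a quantile from cumulative bucket counts, interpolating linearly inside the bucket that contains the target rank. The Python sketch below re-implements that estimate in simplified form to show why bucket layout matters. The function name and bucket values are hypothetical, and the real Prometheus function handles more edge cases (q outside (0, 1], empty histograms, negative bucket bounds).

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Simplified sketch of bucket interpolation; the last bucket must be
    (math.inf, total_count), as with Prometheus's +Inf bucket.
    """
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # the first bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # Rank falls in the +Inf bucket: only the highest finite bound is known.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (upper - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = upper, count


# Hypothetical cumulative counts for request durations in seconds
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.9, buckets))   # interpolated inside the 0.25-0.5 s bucket
print(histogram_quantile(0.99, buckets))  # lands in +Inf, so reported as 1.0
```

Two practical consequences follow. A p99 that lands in the +Inf bucket is capped at the highest finite bound, however slow the tail really is. And because values between bounds are interpolated, a common practice is to place a bucket boundary exactly at the alert threshold (0.5 s in the latency rule above) so the comparison is reliable at that point.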
Added
23 Mar 2026
Tools & Severity
🟠 High
⚙ Fix effort: Medium
⚡ Quick Fix
Add alerts for all four signals: latency p99 > threshold, error rate > 1%, traffic anomaly, CPU/memory/queue saturation. Use p99, not average.
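The Quick Fix checklist can be sketched as a single evaluation function. This Python example is a hypothetical illustration, not any tool's API: Snapshot and firing_alerts are invented names, the 0.5 s latency, 1% error and 90% saturation defaults mirror the thresholds used earlier on this page, and the traffic-anomaly rule (more than 50% away from a baseline such as the same window last week, in either direction) is an assumption you would tune per service.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    p99_latency_s: float
    error_ratio: float   # failed / total requests
    traffic_rps: float
    baseline_rps: float  # e.g. the same window one week earlier (assumption)
    saturation: float    # 0.0-1.0 for the most constrained resource


def firing_alerts(s, latency_slo_s=0.5, max_error_ratio=0.01,
                  traffic_band=0.5, max_saturation=0.9):
    """Return the names of the hypothetical alerts that would fire for one snapshot."""
    alerts = []
    if s.p99_latency_s > latency_slo_s:
        alerts.append("HighLatency")
    if s.error_ratio > max_error_ratio:
        alerts.append("HighErrorRate")
    # Traffic anomaly: a large drop or spike relative to the baseline
    if s.baseline_rps > 0 and abs(s.traffic_rps - s.baseline_rps) / s.baseline_rps > traffic_band:
        alerts.append("TrafficAnomaly")
    if s.saturation > max_saturation:
        alerts.append("HighSaturation")
    return alerts


# Traffic has fallen to 40% of baseline and the busiest resource is 95% full
print(firing_alerts(Snapshot(0.3, 0.002, 40.0, 100.0, 0.95)))
```

Checking traffic in both directions catches a sudden spike (possible abuse or a retry storm) as well as a drop, which often means clients cannot reach the service at all.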
📦 Applies To
web
cli
queue-worker
🔍 Detection Hints
Auto-detectable: ✗ No
prometheus
grafana
datadog
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: Medium
Context: File