SLO / SLI / SLA
TL;DR
SLI (what you measure), SLO (your internal target), SLA (your customer commitment) — the hierarchy that turns vague 'uptime' promises into measurable operational objectives.
Explanation
SLI (Service Level Indicator): a measured metric, such as request success rate, p99 latency, or availability. SLO (Service Level Objective): your target for an SLI, for example 'p99 latency < 200ms' or '99.9% of requests succeed'. It is an internal goal: what you aim for. SLA (Service Level Agreement): a contractual commitment with consequences (refunds, penalties), for example '99.9% uptime per month'. An SLA is usually less strict than the corresponding SLO, so the gap acts as a buffer. Error budget: (1 - SLO) × time period. A 99.9% SLO allows about 43.8 minutes of unavailability per average month (30.44 days), or 43.2 minutes over 30 days. SLOs guide engineering priorities: if the error budget burns down quickly, freeze releases and investigate. The framework was popularized by Google's Site Reliability Engineering book.
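The error budget formula is plain arithmetic, so a small sketch may help when checking figures by hand. The function name `error_budget` is an illustrative choice, not something taken from a particular library, and the SLO/window pairs are example values only.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Return the allowed 'bad' time for an SLO over a window: (1 - SLO) * window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window * (1.0 - slo)


if __name__ == "__main__":
    # Example targets; substitute your own SLO and window length.
    for slo, days in [(0.999, 30), (0.995, 28), (0.99, 28)]:
        budget = error_budget(slo, timedelta(days=days))
        minutes = budget.total_seconds() / 60
        print(f"SLO {slo:.1%} over {days}d -> {minutes:.1f} min of error budget")
```

Using a fixed window length (28 or 30 days) rather than "a month" keeps the budget the same size from one period to the next.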
Common Misconception
✗ SLO and SLA are the same. In fact an SLO is an internal target, while an SLA is an external contract. The SLO is set stricter so you catch issues before breaching the SLA.
Why It Matters
SLOs replace vague reliability goals with measurable targets — making on-call decisions data-driven: 'should we deploy?' becomes 'how much error budget remains?'
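One way to turn 'how much error budget remains?' into a deploy decision is sketched below for a request-based SLI. The names `budget_remaining` and `can_deploy`, and the 20% safety margin, are assumptions made for illustration; a real policy would be agreed by the team.

```python
def budget_remaining(total: int, failed: int, slo: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent).

    total and failed are request counts observed in the SLO window so far.
    """
    if total <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures


def can_deploy(total: int, failed: int, slo: float, min_remaining: float = 0.2) -> bool:
    """Hypothetical gate: allow releases only while enough budget is left."""
    return budget_remaining(total, failed, slo) >= min_remaining


if __name__ == "__main__":
    # 1,000,000 requests at a 99.5% SLO allow roughly 5,000 failures.
    for failed in (1_000, 4_500, 6_000):
        left = budget_remaining(1_000_000, failed, 0.995)
        print(f"failed={failed}: remaining={left:.0%}, deploy={can_deploy(1_000_000, failed, 0.995)}")
```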
Common Mistakes
- Setting SLOs without measuring the current baseline — targets must be achievable.
- Confusing SLO with SLA — SLO should be stricter than SLA.
- Not tracking SLO compliance continuously — only noticing at month end.
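Continuous tracking is often expressed as a burn rate: the observed error rate divided by the error rate the SLO allows. A burn rate of 1 spends the budget exactly over the window, and higher values spend it sooner. The sketch below assumes a constant error rate, and the helper names are hypothetical.

```python
from datetime import timedelta
from typing import Optional


def burn_rate(error_rate: float, slo: float) -> float:
    """Observed error rate relative to the rate the SLO permits."""
    return error_rate / (1.0 - slo)


def time_to_exhaustion(
    error_rate: float, slo: float, window: timedelta, remaining: float = 1.0
) -> Optional[timedelta]:
    """Projected time until the budget is gone, assuming the error rate holds.

    remaining is the unspent fraction of the budget (1.0 means untouched).
    Returns None when no errors are occurring.
    """
    rate = burn_rate(error_rate, slo)
    if rate <= 0:
        return None
    return window * (remaining / rate)


if __name__ == "__main__":
    window = timedelta(days=28)
    # A 1% error rate against a 99.9% SLO burns the budget about 10x too fast.
    print(f"burn rate: {burn_rate(0.01, 0.999):.1f}")
    print(f"budget gone in: {time_to_exhaustion(0.01, 0.999, window)}")
```

Checking this figure daily, rather than at month end, gives time to react while budget is still left.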
Code Examples
✗ Vulnerable
// Vague commitment:
// 'We aim for high availability'
// No measurement, no target, no accountability
✓ Fixed
// SLI: request success rate (non-5xx / total)
// SLO: 99.5% over 28-day rolling window
// SLA: 99.0% (contractual, with refund below)
// Prometheus SLO:
// sum(rate(http_requests_total{code!~'5..'}[28d])) /
// sum(rate(http_requests_total[28d])) > 0.995
// Error budget for the window:
// (1 - 0.995) * 28d = 0.005 * 672h = 3.36h (about 3h 22min) per 28-day window
Added
23 Mar 2026
⚡
DEV INTEL
Tools & Severity
🟠 High
⚙ Fix effort: Medium
⚡ Quick Fix
Define SLIs for availability and latency. Set the SLO about 0.5 percentage points stricter than the SLA (for example, 99.5% against a 99.0% SLA). Track a 28-day rolling window. Alert when more than 50% of the error budget has been consumed.
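The quick-fix checklist can be captured as a small, validated config object. This is a sketch under stated assumptions: the class name `ReliabilityTargets`, its fields, and the default of alerting once half the budget is consumed are illustrative, not part of any monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReliabilityTargets:
    """Hypothetical container for one service's SLA/SLO settings."""

    sla: float                   # contractual commitment, e.g. 0.99
    slo: float                   # internal target, e.g. 0.995
    window_days: int = 28        # rolling window length
    alert_consumed: float = 0.5  # alert once this fraction of budget is spent

    def __post_init__(self) -> None:
        # The SLO must be stricter than the SLA so there is a buffer.
        if not 0.0 < self.sla < self.slo < 1.0:
            raise ValueError("expected 0 < SLA < SLO < 1 (SLO must be stricter than SLA)")

    def should_alert(self, consumed_fraction: float) -> bool:
        """True once the consumed share of the error budget hits the threshold."""
        return consumed_fraction >= self.alert_consumed


if __name__ == "__main__":
    targets = ReliabilityTargets(sla=0.99, slo=0.995)
    print(targets.should_alert(0.3))  # below the threshold
    print(targets.should_alert(0.6))  # above the threshold
```

Rejecting a config where the SLO is not stricter than the SLA catches the SLO/SLA mix-up at definition time instead of during an incident.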
📦 Applies To
web
cli
queue-worker
🔍 Detection Hints
Auto-detectable:
✗ No
prometheus
datadog
grafana
🤖 AI Agent
Confidence: Low
False Positives: High
✗ Manual fix
Fix: High
Context: File