Tag: reliability
Memory Pressure Detection (PHP 7.0+)
Proactively identifying when a PHP process approaches its memory limit so corrective action can be taken before a fatal error.
1w ago
performance intermediate
Error Recovery Patterns
Design strategies for gracefully handling failures and restoring system functionality without data loss or user disruption.
2w ago
general intermediate
Alerting Best Practices
Good alerts are actionable, symptom-based, and rare — page on user impact, not causes. Alert fatigue from noisy alerts is as dangerous as no alerts.
2mo ago
observability intermediate
At-Least-Once Delivery
At-least-once delivery guarantees that every message is delivered one or more times: messages are never lost, but they may arrive more than once, so consumers must be idempotent to handle duplicates safely.
2mo ago
messaging intermediate
Backpressure
Backpressure signals upstream producers to slow down when downstream consumers can't keep up — preventing queue overflow, memory exhaustion, and system collapse under load.
2mo ago
messaging intermediate
Canary Deployment & Observability
Canary deployments route a small percentage of traffic to a new version — compare its golden signals against the stable version before full rollout to catch regressions automatically.
2mo ago
observability intermediate
Dead Letter Queue
A Dead Letter Queue (DLQ) captures messages that can't be processed — expired, malformed, or repeatedly failed — enabling later inspection and replay without losing data.
2mo ago
messaging intermediate
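As a rough illustration of the Dead Letter Queue entry above, the sketch below retries a failing message a few times and then parks it in a separate list instead of dropping it. Everything here is hypothetical: the in-memory `deque`, the `MAX_ATTEMPTS` limit, and the `parse_amount` handler are stand-ins for a real broker and a real consumer.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumption: retry limit chosen only for illustration


def process_with_dlq(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Run handler over messages; park repeatedly failing ones in a DLQ."""
    main_queue = deque((msg, 0) for msg in messages)
    processed = []
    dead_letters = []
    while main_queue:
        msg, attempts = main_queue.popleft()
        try:
            processed.append(handler(msg))
        except Exception as exc:
            attempts += 1
            if attempts >= max_attempts:
                # Keep the message and the failure reason for later inspection or replay.
                dead_letters.append(
                    {"message": msg, "error": repr(exc), "attempts": attempts}
                )
            else:
                # Requeue at the back so healthy messages are not blocked.
                main_queue.append((msg, attempts))
    return processed, dead_letters


def parse_amount(msg):
    """Hypothetical handler: raises ValueError on malformed input."""
    return int(msg)


processed, dead_letters = process_with_dlq(["10", "oops", "25"], parse_amount)
```

The malformed message ends up in `dead_letters` with its error attached, while the two valid messages are processed normally.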
Error Budget
Error budget is the allowed amount of unreliability within an SLO period: a 99.9% availability SLO allows about 43.8 minutes of downtime in an average month. When the budget is exhausted, reliability work takes priority over features.
2mo ago
observability intermediate
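The downtime figure in the Error Budget entry can be checked with a few lines of arithmetic. This is a minimal sketch that assumes an availability-style SLO and takes "month" to mean an average calendar month (365.25 / 12 days); the function name is made up for the example.

```python
def error_budget_minutes(slo_percent: float, period_days: float) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a period."""
    allowed_fraction = 1 - slo_percent / 100
    return allowed_fraction * period_days * 24 * 60


# Assumption: an average calendar month, which is about 30.44 days.
AVERAGE_MONTH_DAYS = 365.25 / 12

monthly_budget = error_budget_minutes(99.9, AVERAGE_MONTH_DAYS)
thirty_day_budget = error_budget_minutes(99.9, 30)
```

With a fixed 30-day window the same SLO allows slightly less downtime (about 43.2 minutes), so it helps to state which period an SLO uses.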
Health Check Patterns
Health checks report service status to load balancers and orchestrators — /health/live (is the process running?), /health/ready (can it serve traffic?), and deep health checks for dependencies.
2mo ago
observability beginner
LLM Hallucination
When a large language model generates confident-sounding text that is factually incorrect, fabricated, or unsupported by any source — a fundamental property of how language models work.
2mo ago
ai_ml intermediate
Message Idempotency
An idempotent message handler produces the same result whether called once or many times — essential for at-least-once delivery where duplicates are expected.
2mo ago
messaging intermediate
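One common way to make a handler idempotent is to remember which message IDs have already been applied. The sketch below assumes every message carries a unique `id` field and keeps the seen IDs in memory; a real consumer would need durable storage, and the class and field names are illustrative only.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by its message ID."""

    def __init__(self):
        self.balance = 0
        self._seen_ids = set()  # assumption: in-memory only; real systems persist this

    def handle(self, message):
        """Return True if the message was applied, False if it was a duplicate."""
        if message["id"] in self._seen_ids:
            return False  # duplicate delivery: state is left unchanged
        self.balance += message["amount"]
        self._seen_ids.add(message["id"])
        return True


handler = IdempotentHandler()
deposit = {"id": "msg-1", "amount": 50}
handler.handle(deposit)
handler.handle(deposit)  # redelivered duplicate, ignored
```

After both calls the balance reflects a single deposit, which is the property at-least-once delivery relies on.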
Message Ordering Guarantees
Message ordering is guaranteed only within a single Kafka partition or a single RabbitMQ queue; spreading messages across multiple partitions, or reading one queue with competing consumers, gives up FIFO order across the full stream.
2mo ago
messaging intermediate
On-Call & Runbooks
A runbook documents how to diagnose and resolve specific alerts — on-call engineers shouldn't have to think from scratch at 3am; the runbook provides the playbook.
2mo ago
observability beginner
SLO / SLI / SLA
SLI (what you measure), SLO (your internal target), SLA (your customer commitment) — the hierarchy that turns vague 'uptime' promises into measurable operational objectives.
2mo ago
observability intermediate
API Backwards Compatibility
Rules for evolving an API without breaking existing clients: additive changes are safe, removals and renames require versioning, and deprecation needs a documented sunset period.
2mo ago
api_design intermediate
API Idempotency Keys
A client-generated unique key sent with non-idempotent requests: the server stores the response and returns it unchanged if the same key is received again, preventing duplicate operations.
2mo ago
api_design intermediate
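The store-and-replay behaviour described above might look like the following on the server side. This is a sketch under stated assumptions: `PaymentService`, `charge`, and the response fields are hypothetical, and the response cache is a plain dictionary rather than a shared, expiring store.

```python
import uuid


class PaymentService:
    """Hypothetical service that replays stored responses for repeated keys."""

    def __init__(self):
        self._responses = {}  # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key, amount):
        # Same key seen before: return the stored response, do not charge again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": self.charges_made,
            "amount": amount,
            "status": "succeeded",
        }
        self._responses[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())  # generated once by the client, reused on every retry
first = service.charge(key, 500)
retry = service.charge(key, 500)  # e.g. the client timed out and retried
```

Both calls return the same response and only one charge is recorded, so a retry after a timeout cannot bill the customer twice.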
On-Call Culture & Runbooks
Sustainable on-call practices — fair rotation, blameless postmortems, actionable alerts, and well-maintained runbooks that reduce mean time to recovery and prevent burnout.
2mo ago
devops intermediate
Queue-Based Load Levelling
Using a queue between producers and consumers to absorb traffic spikes — producers enqueue at any rate, consumers process at a sustainable rate, preventing the backend from being overwhelmed.
2mo ago
messaging intermediate
Runbooks & Playbooks
Documented step-by-step procedures for handling operational tasks and incidents — ensuring any on-call engineer can respond correctly under pressure without tribal knowledge.
2mo ago
general beginner
Secret Sharing — Shamir's Scheme
Splitting a secret into N shares where any K can reconstruct it — preventing single points of failure for root encryption keys and disaster recovery credentials.
2mo ago
cryptography advanced
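To make the K-of-N idea concrete, here is a minimal sketch of Shamir's scheme over a prime field: the secret is the constant term of a random polynomial of degree K-1, each share is a point on that polynomial, and any K points recover the constant term by Lagrange interpolation. The prime, the function names, and the integer encoding of the secret are assumptions for illustration; this is not audited code and should not protect real keys.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, k, n, prime=PRIME):
    """Return n shares (x, y); any k of them reconstruct the secret."""
    if not (1 <= k <= n < prime):
        raise ValueError("need 1 <= k <= n < prime")
    if not (0 <= secret < prime):
        raise ValueError("secret must be in [0, prime)")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


shares = split_secret(123456789, k=3, n=5)
recovered = recover_secret(shares[:3])  # any 3 of the 5 shares work
```

Fewer than K shares reveal nothing about the secret, which is why the shares can be spread across people or locations without creating a single point of failure.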