
Raft Consensus Algorithm

Architecture Advanced
debt(d7/e7/b9/t7)
d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). Misconfigurations such as even-sized clusters or nodes co-located in a single AZ aren't caught by linters; they surface during chaos testing, partition events, or operational review. No standard SAST tool detects Raft topology mistakes.

e7 Effort Remediation debt — work required to fix once spotted

Closest to 'cross-cutting refactor across the codebase' (e7). Fixing a Raft deployment mistake (e.g. relocating nodes across AZs, resizing the cluster, migrating from even to odd membership) requires coordinated cluster reconfiguration, data migration, and downtime planning — well beyond a one-line patch.

b9 Burden Structural debt — long-term weight of choosing wrong

Closest to 'defines the system's shape' (b9). Raft (via etcd) is the backbone of Kubernetes and similar control planes; the CP tradeoff, quorum sizing, and AZ topology shape every operational decision around the system. Rewrite-or-live-with-it.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). Developers commonly assume Raft guarantees availability, but it is CP, not AP: losing quorum halts writes. This contradicts the default 'distributed = highly available' intuition, and the even-versus-odd node count is also counterintuitive (4 nodes tolerate no more failures than 3, yet cost more and need a larger quorum).


Also Known As

Raft, leader election, log replication, distributed consensus

TL;DR

A consensus algorithm designed to be understandable: Raft elects a leader who coordinates all writes, replicates log entries to followers, and ensures that committed entries are never lost as long as a majority of servers survives.

Explanation

Raft decomposes consensus into three sub-problems: leader election, log replication, and safety. Each term has at most one leader, and all writes go through it. The leader appends entries to its log and replicates them to followers. An entry is 'committed' once the leader has stored it on a majority of nodes. If the leader fails, followers stop receiving heartbeats, time out, and start an election; randomised election timeouts make split votes unlikely and quick to resolve. A candidate wins only if a majority of nodes vote for it, and a node refuses to vote for a candidate whose log is less up to date than its own. Because any two majorities overlap, the new leader is guaranteed to hold every committed entry, so committed entries are never lost. Raft is used in etcd (the Kubernetes configuration store), CockroachDB, Consul, and many other distributed systems. Understanding Raft explains why these systems require a majority quorum (2 of 3, 3 of 5) to operate.
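The two rules above (vote only for a sufficiently up-to-date candidate, commit once a majority stores the entry) can be sketched in a few lines of Python. This is a minimal illustration, not code from etcd or any other implementation: `LogPosition`, `candidate_is_up_to_date`, and `majority_replicated_index` are hypothetical names, and the commit helper deliberately omits the extra Raft rule that a leader only commits entries from its own current term by counting replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogPosition:
    """Term and index of a node's last log entry (hypothetical helper type)."""
    last_term: int
    last_index: int


def candidate_is_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    """Return True if the voter may grant its vote to the candidate.

    The log with the higher last term is more up to date; if the last
    terms are equal, the longer log (higher last index) wins.
    """
    if candidate.last_term != voter.last_term:
        return candidate.last_term > voter.last_term
    return candidate.last_index >= voter.last_index


def majority_replicated_index(match_index: list[int]) -> int:
    """Return the highest log index stored on a majority of nodes.

    match_index holds one value per node (leader included): the highest
    index known to be replicated on that node.
    """
    ordered = sorted(match_index, reverse=True)
    quorum = len(ordered) // 2 + 1
    # The quorum-th largest value is present on at least `quorum` nodes.
    return ordered[quorum - 1]
```

For a three-node cluster whose nodes hold entries up to indices 7, 5, and 3, `majority_replicated_index([7, 5, 3])` returns 5: index 5 is on two of three nodes, while index 7 is only on one.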

Common Misconception

'Raft guarantees availability in all scenarios.' In reality, Raft sacrifices availability for consistency: if a majority quorum cannot be reached (network partition, too many failed nodes), the cluster stops accepting writes rather than risk inconsistency. This is intentional: CP, not AP.
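To make the CP behaviour concrete, the sketch below checks which side of a network partition, if any, can still accept writes. It assumes a partition is described as lists of node names that can reach each other; `quorum` and `writable_groups` are hypothetical helper names, not part of any Raft library.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def writable_groups(cluster_size: int, groups: list[list[str]]) -> list[list[str]]:
    """Return the partition groups that still hold a majority.

    cluster_size is the configured membership, not the number of nodes
    that happen to be reachable. At most one group can qualify.
    """
    needed = quorum(cluster_size)
    return [group for group in groups if len(group) >= needed]


# 5-node cluster split 3/2: only the 3-node side keeps accepting writes.
print(writable_groups(5, [["a", "b", "c"], ["d", "e"]]))

# 4-node cluster split 2/2: neither side has 3 nodes, so writes stop everywhere.
print(writable_groups(4, [["a", "b"], ["c", "d"]]))
```

The second case is the availability loss the misconception overlooks: every node is healthy, yet no side can form a majority.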

Why It Matters

Raft is the consensus algorithm behind etcd, which is the backbone of Kubernetes. Understanding Raft explains why Kubernetes control planes are normally run with an odd number of nodes, why losing half or more of the etcd cluster is catastrophic (no majority remains), and why network partitions cause availability loss in systems that prioritise consistency (CP systems).
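The quorum arithmetic behind the odd-number recommendation is short enough to compute directly. The following sketch uses two hypothetical helpers, `quorum` and `tolerated_failures`, and prints the values for cluster sizes 1 to 7.

```python
def quorum(n: int) -> int:
    """Majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return n - quorum(n)


for n in range(1, 8):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Sizes 3 and 4 both tolerate one failure, and sizes 5 and 6 both tolerate two, which is why the extra even node buys nothing in fault tolerance.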

Common Mistakes

  • Running Raft clusters with even numbers of nodes: an even-sized cluster tolerates no more failures than the odd-sized cluster with one node fewer.
  • Not monitoring leader elections: frequent re-elections indicate network instability or overloaded nodes; alert on election metrics.
  • Placing all Raft nodes in the same availability zone: this defeats the purpose, because a single rack failure or AZ outage takes out the entire cluster.
  • Confusing Raft with Paxos: Raft is a specific algorithm designed for understandability; Paxos is a family of protocols offering equivalent guarantees that is widely regarded as harder to understand and implement correctly.
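The availability-zone mistake can be checked mechanically before deployment. The sketch below assumes a placement map from node name to zone name (both hypothetical) and asks whether a majority survives the loss of any single zone; `survives_single_az_loss` is an illustrative helper, not an existing tool.

```python
from collections import Counter


def survives_single_az_loss(placement: dict[str, str]) -> bool:
    """Return True if quorum survives the loss of any one availability zone.

    placement maps node name -> availability zone name.
    """
    if not placement:
        return False
    total = len(placement)
    needed = total // 2 + 1
    nodes_per_az = Counter(placement.values())
    # Losing a zone removes all of its nodes; the rest must still be a majority.
    return all(total - count >= needed for count in nodes_per_az.values())


# Three nodes spread over three zones: any one zone can fail.
print(survives_single_az_loss({"n1": "az-a", "n2": "az-b", "n3": "az-c"}))

# Three nodes in two zones: losing the zone with two nodes loses quorum.
print(survives_single_az_loss({"n1": "az-a", "n2": "az-a", "n3": "az-b"}))
```

Note that spreading nodes over only two zones never passes this check, since one of the zones always holds at least half of the nodes.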

Code Examples

✗ Vulnerable
# ❌ Deploying etcd with 2 nodes: no fault tolerance
# Majority of 2 is 2, so both nodes must agree on every write
# One failure = quorum lost, cluster stops accepting writes
# Worse than a single node for availability: either of two machines failing halts the cluster

# Also: deploying 4 nodes gives the same fault tolerance as 3
# 4 nodes need 3 to agree (tolerates 1 failure); 3 nodes need 2 (tolerates 1 failure)
# The 4th node adds cost and replication traffic with no additional fault tolerance
✓ Fixed
# ✅ etcd cluster sizing: use odd numbers
# 3 nodes: tolerates 1 failure (majority = 2)
# 5 nodes: tolerates 2 failures (majority = 3)
# 7 nodes: tolerates 3 failures (majority = 4); rarely needed, adds write latency

# In Kubernetes: control plane with 3 etcd nodes
# (lb:6443 is a placeholder for your load balancer endpoint)
kubeadm init --control-plane-endpoint=lb:6443
# Then join 2 more control plane nodes for HA

# Check Raft health in etcd
etcdctl endpoint health --cluster
etcdctl endpoint status --cluster --write-out=table
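Beyond one-off health checks, frequent leader changes are worth alerting on. The sketch below assumes you already collect timestamped samples of a monotonically increasing leader-change counter (to my knowledge etcd exposes one as `etcd_server_leader_changes_seen_total`); the function names, the one-hour window, and the threshold of 3 are hypothetical values to tune for your environment. Counter resets after a process restart are not handled beyond clamping at zero.

```python
def leader_changes_in_window(
    samples: list[tuple[float, int]], window_seconds: float
) -> int:
    """Count leader changes within the trailing window.

    samples is a time-ordered list of (unix_timestamp, counter_value).
    """
    if not samples:
        return 0
    latest_time, latest_value = samples[-1]
    cutoff = latest_time - window_seconds
    # The oldest sample still inside the window is the baseline.
    values_in_window = [value for ts, value in samples if ts >= cutoff]
    return max(0, latest_value - values_in_window[0])


def should_alert(
    samples: list[tuple[float, int]],
    window_seconds: float = 3600.0,  # hypothetical default: one hour
    max_changes: int = 3,            # hypothetical threshold
) -> bool:
    """Return True if the window contains more leader changes than allowed."""
    return leader_changes_in_window(samples, window_seconds) > max_changes


samples = [(0.0, 1), (600.0, 1), (1200.0, 3), (1800.0, 6)]
print(leader_changes_in_window(samples, 1800.0))
print(should_alert(samples, window_seconds=1800.0))
```

In practice this logic usually lives in the monitoring system's own rule language; the Python version only illustrates the calculation.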

Added 23 Mar 2026
DEV INTEL Tools & Severity
⚙ Fix effort: High
⚡ Quick Fix
Deploy Raft-based systems (etcd, Consul) with an odd number of nodes (3 or 5). 3 nodes tolerate 1 failure; 5 nodes tolerate 2 failures. An even-sized cluster tolerates no more failures than the odd size just below it.

