← Back to glossary

Database Sharding

Performance PHP 5.0+ Advanced

debt(d7/e9/b9/t7)

d7 Detectability Operational debt — how invisible misuse is to your safety net

Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no' — tools like Vitess and PlanetScale are sharding platforms, not detection tools. Identifying that sharding was premature or that the wrong shard key was chosen requires careful architectural review and production performance analysis. There's no linter or SAST that flags 'you shouldn't have sharded yet'.

e9 Effort Remediation debt — work required to fix once spotted

Closest to 'architectural rework' (e9). The quick_fix says to exhaust alternatives before sharding, but if you've already sharded prematurely, reversing it means consolidating distributed data back to a single database or re-sharding with a different key. This requires migrating data across shards, rewriting application logic that was shard-aware, and handling the operational complexity of data rebalancing — a fundamental architectural rework.

b9 Burden Structural debt — long-term weight of choosing wrong

Closest to 'defines the system's shape' (b9). Sharding is the ultimate load-bearing architectural decision. Once sharded, every query must be shard-aware, cross-shard joins become scatter-gather operations, transactions become distributed, and the shard key choice permanently shapes how data is accessed. The common_mistakes list highlights ongoing operational burdens like rebalancing. This choice defines the entire system's data architecture.

t7 Trap Cognitive debt — how counter-intuitive correct behaviour is

Closest to 'serious trap' (t7). The misconception explicitly states developers believe 'sharding is the first step when a database becomes slow' when it should be a last resort. This contradicts how other scaling concepts work (where you typically add capacity when needed). The trap is that sharding sounds like a straightforward horizontal scaling solution but introduces enormous complexity that isn't obvious until you're committed.

About DEBT scoring → scored by claude-opus-4-5-20251101 · 2026-05-07 · reviewed by human

Also Known As

DB sharding horizontal scaling DB shard key

TL;DR

Horizontally partitioning data across multiple database instances by a shard key, scaling write throughput beyond a single server's limits.

Explanation

Sharding splits a dataset across multiple independent database nodes (shards) using a partition key (user_id % num_shards, geographic region, etc.). Each shard holds a subset of rows and handles reads/writes for its partition independently. Benefits: linear write scalability, smaller index sizes per shard, geographic distribution. Costs: cross-shard queries and joins become application-level operations, transactions span shard boundaries only with distributed protocols (2PC), rebalancing shards as data grows is operationally complex. For most PHP applications, read replicas, caching, and query optimisation should be exhausted before sharding.

Diagram

flowchart TD
    APP[Application] --> ROUTER[Shard Router<br/>which shard?]
    ROUTER -->|user_id 0-33pct| S1[(Shard 1<br/>users 0-9999)]
    ROUTER -->|user_id 34-66pct| S2[(Shard 2<br/>users 10k-19k)]
    ROUTER -->|user_id 67-100pct| S3[(Shard 3<br/>users 20k-30k)]
    subgraph Shard_Keys
        HASH[Hash sharding<br/>uniform distribution]
        RANGE[Range sharding<br/>range queries efficient]
        DIR[Directory sharding<br/>flexible lookup]
    end
    CROSS[Cross-shard queries<br/>expensive - avoid] -.->|design to prevent| ROUTER
style ROUTER fill:#6e40c9,color:#fff
style S1 fill:#1f6feb,color:#fff
style S2 fill:#238636,color:#fff
style S3 fill:#d29922,color:#fff
style CROSS fill:#f85149,color:#fff

Common Misconception

✗ Sharding is the first step when a database becomes slow. Sharding is a last resort — it introduces enormous complexity around cross-shard queries, transactions, and rebalancing. Read replicas, caching, query optimisation, and partitioning should all be exhausted first.

Why It Matters

Sharding distributes data across multiple database servers — each shard holds a subset of data, enabling horizontal scaling beyond what a single server can handle.

Common Mistakes

Sharding before exhausting vertical scaling and read replicas — premature sharding adds enormous complexity.
Choosing a shard key that creates hot shards — all traffic to one shard defeats the purpose.
Cross-shard queries and JOINs — they require scatter-gather operations that negate sharding benefits.
Not planning for shard rebalancing — adding a new shard requires migrating data, which is operationally painful.

Code Examples

✗ Vulnerable

// Shard key on auto-increment ID — hot shard problem:
// All new writes go to the last shard (highest IDs)
// Shard 1: ids 1-1M (cold), Shard 2: ids 1M-2M (cold), Shard 3: ids 2M+ (all writes)
// Better shard key: hash(user_id) — distributes writes evenly

✓ Fixed

// Horizontal sharding — route by user_id modulo shard count
class ShardRouter {
    private array $shards; // array of PDO connections

    public function getConnection(int $userId): PDO {
        $shardIndex = $userId % count($this->shards);
        return $this->shards[$shardIndex];
    }
}

// Usage
$pdo = $router->getConnection($userId);
$pdo->prepare('SELECT * FROM orders WHERE user_id = ?')->execute([$userId]);

// Key design decisions:
// - Choose a shard key with high cardinality and even distribution
// - Avoid cross-shard JOINs — denormalise or use an aggregation layer
// - Consistent hashing handles shard rebalancing with minimal data movement

References

https://www.postgresql.org/docs/current/ddl-partitioning.html