Database Sharding
debt(d7/e9/b9/t7)
Closest to 'only careful code review or runtime testing' (d7). The detection_hints indicate automated detection is 'no' — tools like Vitess and PlanetScale are sharding platforms, not detection tools. Identifying that sharding was premature or that the wrong shard key was chosen requires careful architectural review and production performance analysis. There's no linter or SAST that flags 'you shouldn't have sharded yet'.
Closest to 'architectural rework' (e9). The quick_fix says to exhaust alternatives before sharding, but if you've already sharded prematurely, reversing it means consolidating distributed data back to a single database or re-sharding with a different key. This requires migrating data across shards, rewriting application logic that was shard-aware, and handling the operational complexity of data rebalancing — a fundamental architectural rework.
Closest to 'defines the system's shape' (b9). Sharding is the ultimate load-bearing architectural decision. Once sharded, every query must be shard-aware, cross-shard joins become scatter-gather operations, transactions become distributed, and the shard key choice permanently shapes how data is accessed. The common_mistakes list highlights ongoing operational burdens like rebalancing. This choice defines the entire system's data architecture.
Closest to 'serious trap' (t7). The misconception explicitly states developers believe 'sharding is the first step when a database becomes slow' when it should be a last resort. This contradicts how other scaling concepts work (where you typically add capacity when needed). The trap is that sharding sounds like a straightforward horizontal scaling solution but introduces enormous complexity that isn't obvious until you're committed.
Also Known As
TL;DR
Explanation
Sharding splits a dataset across multiple independent database nodes (shards) using a partition key (user_id % num_shards, geographic region, etc.). Each shard holds a subset of rows and handles reads/writes for its partition independently. Benefits: linear write scalability, smaller index sizes per shard, geographic distribution. Costs: cross-shard queries and joins become application-level operations, transactions span shard boundaries only with distributed protocols (2PC), rebalancing shards as data grows is operationally complex. For most PHP applications, read replicas, caching, and query optimisation should be exhausted before sharding.
Diagram
flowchart TD
APP[Application] --> ROUTER[Shard Router<br/>which shard?]
ROUTER -->|user_id 0-33pct| S1[(Shard 1<br/>users 0-9999)]
ROUTER -->|user_id 34-66pct| S2[(Shard 2<br/>users 10k-19k)]
ROUTER -->|user_id 67-100pct| S3[(Shard 3<br/>users 20k-30k)]
subgraph Shard_Keys
HASH[Hash sharding<br/>uniform distribution]
RANGE[Range sharding<br/>range queries efficient]
DIR[Directory sharding<br/>flexible lookup]
end
CROSS[Cross-shard queries<br/>expensive - avoid] -.->|design to prevent| ROUTER
style ROUTER fill:#6e40c9,color:#fff
style S1 fill:#1f6feb,color:#fff
style S2 fill:#238636,color:#fff
style S3 fill:#d29922,color:#fff
style CROSS fill:#f85149,color:#fff
Common Misconception
Why It Matters
Common Mistakes
- Sharding before exhausting vertical scaling and read replicas — premature sharding adds enormous complexity.
- Choosing a shard key that creates hot shards — all traffic to one shard defeats the purpose.
- Cross-shard queries and JOINs — they require scatter-gather operations that negate sharding benefits.
- Not planning for shard rebalancing — adding a new shard requires migrating data, which is operationally painful.
Code Examples
// Shard key on auto-increment ID — hot shard problem:
// All new writes go to the last shard (highest IDs)
// Shard 1: ids 1-1M (cold), Shard 2: ids 1M-2M (cold), Shard 3: ids 2M+ (all writes)
// Better shard key: hash(user_id) — distributes writes evenly
// Horizontal sharding — route by user_id modulo shard count
class ShardRouter {
private array $shards; // array of PDO connections
public function getConnection(int $userId): PDO {
$shardIndex = $userId % count($this->shards);
return $this->shards[$shardIndex];
}
}
// Usage
$pdo = $router->getConnection($userId);
$pdo->prepare('SELECT * FROM orders WHERE user_id = ?')->execute([$userId]);
// Key design decisions:
// - Choose a shard key with high cardinality and even distribution
// - Avoid cross-shard JOINs — denormalise or use an aggregation layer
// - Consistent hashing handles shard rebalancing with minimal data movement