Database Sharding
Horizontal partitioning of a database across multiple machines to distribute load beyond a single server's capacity.
What Is Database Sharding?
Sharding solves a specific capacity ceiling: a single database node — no matter how beefy — has a finite write throughput, storage ceiling, and connection limit. When you hit those walls, you can't simply add read replicas, because replicas don't help with write volume or storage. Sharding splits the dataset horizontally across multiple independent database instances, called shards, so that each shard owns a strict subset of the rows and handles only the write and read load for that subset.
The concept is straightforward. The implementation consistently costs more than teams expect.
How It Works
The core mechanism is a shard key: a field (or composite of fields) used to route every read and write to the correct shard. The application (or a routing layer) hashes or ranges the shard key, resolves which shard owns that key, and issues the query directly to that node. No shard has visibility into another shard's data.
Three strategies dominate in practice:
Range-based sharding assigns contiguous key ranges to each shard: user IDs 1–1,000,000 on shard A, 1,000,001–2,000,000 on shard B. Range queries are efficient because they stay on a single shard. The failure mode is hotspots: if user ID growth is sequential, new writes hammer shard B while shard A sits cold.
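A minimal sketch of range routing — the boundary values and shard names below are illustrative, chosen to match the user-ID ranges in the example above:

```python
import bisect

# Shard i owns keys strictly below BOUNDARIES[i] (and at or above the
# previous boundary). Values are illustrative, mirroring the text's example.
BOUNDARIES = [1_000_001, 2_000_001, 3_000_001]
SHARDS = ["shard-a", "shard-b", "shard-c"]

def route_by_range(user_id: int) -> str:
    """Return the shard owning the contiguous ID range containing user_id."""
    idx = bisect.bisect_right(BOUNDARIES, user_id)
    if idx >= len(SHARDS):
        raise ValueError(f"no shard owns key {user_id}")
    return SHARDS[idx]
```

Because the ranges are contiguous, a query over a span of IDs resolves to one shard (or a small set of adjacent shards) — which is exactly why sequential ID growth concentrates all new writes on the last shard.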
Hash-based sharding applies a deterministic hash function to the shard key and assigns shards by hash(key) mod N. Distribution is even, but range queries fan out across all shards — a WHERE created_at BETWEEN x AND y becomes N parallel queries and a merge. At 10+ shards, that merge latency compounds.
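A hash-routing sketch, using SHA-256 rather than Python's built-in hash() (which is salted per process and so unusable for routing). The shard count is an assumption for illustration:

```python
import hashlib

N_SHARDS = 4  # illustrative shard count

def route_by_hash(key: str) -> int:
    """Map a shard key to a shard index via a stable hash, mod N."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

def range_query_targets() -> list[int]:
    """A predicate like WHERE created_at BETWEEN x AND y gives no shard
    locality, so the query scatters to every shard and the caller merges."""
    return list(range(N_SHARDS))
```

The second function is the scatter-gather cost in miniature: any query not keyed by the shard key targets all N shards.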
Directory-based sharding routes through a lookup service that maps keys to shard IDs. It's the most flexible: you can move individual key ranges between shards without rehashing everything. The cost is an additional network hop and a new critical path dependency on the lookup service.
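A directory-routing sketch — an in-process dict stands in for the lookup service here; in production it would be a replicated store sitting on the request critical path:

```python
# Hypothetical stand-in for the lookup service.
DIRECTORY: dict[str, str] = {}

def route_by_directory(key: str) -> str:
    return DIRECTORY[key]  # one extra hop before the actual query

def move_key(key: str, new_shard: str) -> None:
    """Relocate a single key without rehashing anything else."""
    DIRECTORY[key] = new_shard
```

The flexibility is visible in move_key: relocating one key is a directory update, not a cluster-wide rehash — at the price of every request depending on the directory being up.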
The Non-Obvious Part: Rebalancing
Adding a shard to an N-shard cluster that routes by hash(key) mod N remaps almost the entire keyspace: when N becomes N+1, only about 1/(N+1) of keys happen to hash to the same shard as before, so roughly N/(N+1) of them must move. At 100M rows, that's tens of millions of row migrations that must happen with zero downtime. Teams routinely underestimate this.
Consistent hashing limits the blast radius: when a new shard is added, only the keys between the new node and its predecessor on the ring move. In a 10-node cluster with 150 virtual nodes per physical node, adding one node moves roughly 10% of keys instead of all of them. See consistent hashing for the mechanism.
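The blast-radius claim can be checked empirically with a minimal ring. This sketch (node names, key format, and 64-bit hash positions are all illustrative) builds a 10-node ring with 150 virtual nodes each, adds an 11th node, and counts how many keys change owner:

```python
import bisect
import hashlib

def _pos(s: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class Ring:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=150):
        self.vnodes, self.ring = vnodes, []
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_pos(f"{node}#{i}"), node))

    def lookup(self, key):
        idx = bisect.bisect_right(self.ring, (_pos(key), "\uffff"))
        return self.ring[idx % len(self.ring)][1]  # wrap around the ring

ring = Ring([f"node-{i}" for i in range(10)])
keys = [f"user:{i}" for i in range(10_000)]
owners = {k: ring.lookup(k) for k in keys}
ring.add("node-10")
moved = sum(owners[k] != ring.lookup(k) for k in keys)
# moved / len(keys) lands near 1/11, versus ~10/11 under hash-mod-N
```

Only the keys falling into the new node's arcs change owner; everything else keeps its placement.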
Even with consistent hashing, rebalancing under live traffic requires:
- A migration job reading from the source shard and writing to the target
- Dual-write or a read-from-both phase during cutover
- Monitoring key distribution to detect hotspot re-emergence post-migration
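The dual-write phase from the list above can be sketched as a tiny router. Dicts stand in for the source and target shards, and the class and field names are illustrative:

```python
class DualWriteRouter:
    """Cutover sketch: writes mirror to both shards while the backfill
    runs; reads stay on the source until cutover flips."""
    def __init__(self, source: dict, target: dict):
        self.source, self.target = source, target
        self.cutover_done = False

    def write(self, key, value):
        self.source[key] = value   # source stays authoritative
        self.target[key] = value   # mirrored to the migration target

    def read(self, key):
        store = self.target if self.cutover_done else self.source
        return store[key]
```

The flip is a single flag in this sketch; in a real system it's gated on the backfill job catching up and key-distribution monitoring looking healthy.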
Many teams move to a fixed large shard count (256 or 1,024 logical shards) even at launch, then map logical shards to fewer physical nodes. When capacity grows, you redistribute logical shards across more physical nodes without remapping application keys. Citus, for example, uses this model: a fixed number of logical shards spread across a smaller set of worker nodes, with whole shards moved during rebalancing.
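The two-level mapping can be shown in a few lines. The shard count and hash choice are illustrative; the point is that only the second function changes when capacity grows:

```python
import hashlib

N_LOGICAL = 256  # fixed at launch; the application only ever sees these

def logical_shard(key: str) -> int:
    """Application keys map to logical shards — stable forever."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_LOGICAL

def physical_node(logical: int, n_physical: int) -> int:
    # Scaling from 4 to 8 nodes changes only this mapping; no
    # application key is ever remapped to a different logical shard.
    return logical % n_physical
```

Rebalancing becomes "move whole logical shards between nodes" — a bulk copy with a known boundary, rather than a per-key rehash.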
Cross-Shard Operations
Cross-shard joins are the most common sharding trap. A users table sharded by user_id and an orders table sharded by order_id will scatter a JOIN users ON orders.user_id = users.id across every shard. In production, this is either prohibited at the query layer or handled at the application level: fetch from shard A, then look up the related data on shard B in a second round-trip.
The correct mitigation is co-location: shard both users and orders by user_id, so all of a user's data lives on the same shard. Designing for co-location requires knowing your access patterns before you choose your shard key. Changing the shard key after launch is a full data migration.
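Co-location is one line of routing discipline. A sketch, with an illustrative shard count and dict-shaped rows:

```python
N_SHARDS = 8  # illustrative

def shard_for(user_id: int) -> int:
    return user_id % N_SHARDS  # any stable function of user_id works

def order_shard(order: dict) -> int:
    # Route orders by the owning user_id, NOT by order_id, so every
    # order lands on the same shard as its user row (join is shard-local).
    return shard_for(order["user_id"])
```

The trade-off: queries keyed only by order_id now can't be routed and must either fan out or go through a secondary order_id → user_id index.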
Cross-shard transactions require a distributed coordination protocol. Two-phase commit works but adds latency and a coordinator failure mode. In practice, most systems design around this with sagas or by accepting eventual consistency between shards.
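A saga, at its core, is forward steps paired with compensations. A minimal generic sketch (the step/compensation structure is an assumption; real sagas also persist progress so the coordinator can recover):

```python
def run_saga(steps):
    """steps is a list of (action, compensation) pairs, one per shard.
    On failure, completed steps are undone in reverse order — the result
    is eventual consistency, not cross-shard atomicity."""
    done = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(done):
                undo()
            return False
        done.append(compensate)
    return True
```

Note what this gives up versus two-phase commit: intermediate states are visible to readers between steps, and compensations must be business-level undos (refund a debit), not rollbacks.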
When to Use It
Sharding is the right call when:
- Write throughput exceeds what a single primary node and write buffering can handle (rough threshold: sustained 50k+ writes/sec on a modern instance)
- Dataset size exceeds single-node storage or causes index memory pressure (typical signal: working set no longer fits in RAM, causing disk reads)
- Connection counts from a large application fleet are saturating the database's connection limit
Start with vertical scaling (larger instance) and read replicas. Add connection pooling (PgBouncer, ProxySQL). Partition tables within a single node. Sharding multiplies operational complexity by the shard count: backups, migrations, schema changes, and monitoring all become N-times harder. Only introduce it when you have instrumented evidence of a single-node bottleneck.
Interview Tip
Interviewers assessing sharding at L5+ are checking for two things most answers miss. First: can you choose a shard key that avoids hotspots for your specific access pattern, not just recite "hash the primary key"? Demonstrate this by walking through who writes most frequently and what queries need to be co-located. Second: do you acknowledge rebalancing cost unprompted? Saying "I'd use consistent hashing to minimise data movement when adding shards" signals you've thought past the happy path. Answers that only cover range vs. hash strategies without touching rebalancing or cross-shard join avoidance stay at the L4 bar.
Related Concepts
Consistent Hashing — A distributed hashing scheme that minimizes key remapping when nodes are added or removed.
CAP Theorem — A distributed system can only guarantee two of three properties at once: Consistency, Availability, and Partition tolerance.
Distributed Caching — A shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.