Consistent Hashing
A distributed hashing scheme that minimizes key remapping when nodes are added or removed.
The Problem with Naive Hashing
The standard modulo hash assigns a key to node hash(key) % N, where N is the number of nodes. This works fine until N changes. Add one node to a 10-node cluster and you change the denominator from 10 to 11. Almost every key now maps to a different node: roughly 90% of keys remap. For a distributed cache, that means a near-total cache miss storm on the next request wave. For a database, it means a massive data migration. The problem is not correctness; it is the cost of topology changes.
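The remapping cost of modulo placement is easy to demonstrate. A minimal sketch in Python, assuming MD5 as the key hash and a synthetic key set (the node counts and key names are illustrative):

```python
import hashlib

def node_for(key: str, n: int) -> int:
    """Naive placement: hash the key, then take it modulo the node count."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"key:{i}" for i in range(10_000)]
before = {k: node_for(k, 10) for k in keys}   # 10-node cluster
after = {k: node_for(k, 11) for k in keys}    # one node added
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys remapped")
```

On this key set roughly 90% of keys change owner: a key keeps its node only when `hash % 10 == hash % 11`, which happens for about 1 in 11 keys.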
Consistent hashing solves this by ensuring that when a node is added or removed, only the keys that were assigned to that node need to move. In a well-balanced cluster of N nodes, adding one node migrates approximately 1/N of keys. Removing one node migrates only the keys that were on the failed node.
How the Ring Works
Both nodes and keys are hashed onto a fixed circular space, typically 0 to 2³² − 1 (a 32-bit integer ring that wraps around). Common hash functions here are MD5 or MurmurHash, chosen for speed and distribution quality rather than cryptographic strength.
Each node is assigned a position by hashing its identifier (IP address, hostname, or a deliberate label). A key maps to the first node found clockwise from its own hash position. If you picture a clock face, keys "fall" to the nearest node in the clockwise direction.
When a node is added, it takes ownership only of the keys between itself and its counter-clockwise predecessor. All other key-to-node assignments stay the same. When a node is removed, its keys are absorbed by its clockwise successor. No other node is affected.
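The clockwise lookup above reduces to a sorted list plus binary search. A minimal sketch, assuming MD5 ring positions and illustrative node names; it also checks the migration claim by adding a fourth node:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    """Position on the 32-bit ring; MD5 used for spread, not security."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    def __init__(self, nodes):
        # The ring is just the sorted list of (position, node) pairs.
        self._ring = sorted((ring_hash(n), n) for n in nodes)

    def lookup(self, key: str) -> str:
        """First node clockwise from the key's position, wrapping at 2**32."""
        pos = ring_hash(key)
        idx = bisect.bisect_left(self._ring, (pos,))
        return self._ring[idx % len(self._ring)][1]

keys = [f"key:{i}" for i in range(5_000)]
old = HashRing(["node-a", "node-b", "node-c"])
new = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if old.lookup(k) != new.lookup(k)]
# Every key that changed owner moved to the new node; nothing else shifted.
assert all(new.lookup(k) == "node-d" for k in moved)
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

The `idx % len(self._ring)` is the wrap-around: a key hashed past the last node position falls to the first node on the ring.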
Virtual Nodes: The Load-Balancing Correction
A naive ring with one position per physical node almost always produces uneven load. The hash function distributes positions statistically, not uniformly. With 10 nodes, one node might own 25% of the ring and another only 5%, purely by chance of where they landed.
Virtual nodes fix this. Each physical node is assigned K positions on the ring (K is typically 100–200 in production systems like Cassandra and DynamoDB). Those positions are spread across the ring, so each physical node owns many small arcs instead of one large arc. As K increases, the maximum load imbalance across nodes shrinks: the standard deviation of per-node load relative to the mean falls roughly as 1/√K, so K in the low hundreds brings the imbalance down to a few percent of the mean in typical cluster sizes.
Virtual nodes also make heterogeneous clusters straightforward. A node with 2x the RAM and CPU of its peers simply gets 2x as many virtual nodes, and naturally absorbs 2x the key space.
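Both points, many positions per node and weight-proportional positions, fit in a few lines. A sketch assuming MD5 positions and made-up node names, with one node given twice the weight:

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

class VNodeRing:
    def __init__(self, node_weights, vnodes_per_unit=150):
        # node_weights, e.g. {"big-node": 2}: heavier nodes get more vnodes.
        self._ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node, weight in node_weights.items()
            for i in range(vnodes_per_unit * weight)
        )

    def lookup(self, key: str) -> str:
        idx = bisect.bisect_left(self._ring, (ring_hash(key),))
        return self._ring[idx % len(self._ring)][1]

ring = VNodeRing({"node-a": 1, "node-b": 1, "big-node": 2})
counts = {}
for i in range(20_000):
    owner = ring.lookup(f"key:{i}")
    counts[owner] = counts.get(owner, 0) + 1
print(counts)
```

With twice the virtual nodes, big-node ends up owning roughly half the keys while the other two split the remainder.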
Replication Across the Ring
Consistent hashing pairs naturally with replication. For a replication factor of 3, a key is stored on the first 3 distinct physical nodes clockwise from its hash position. When a node fails, the key is still available on the other 2 replicas, and the new primary is the next live node in the clockwise direction.
This is exactly how Cassandra handles replication. Each token (Cassandra's term for a ring position) has a coordinator node, and replicas are the next N-1 nodes clockwise from that coordinator.
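The "first N distinct physical nodes clockwise" rule matters once virtual nodes are involved: consecutive ring positions can belong to the same physical node, so the walk must skip duplicates. A sketch of that walk (not Cassandra's actual code; node names and vnode count are illustrative):

```python
import bisect
import hashlib

def ring_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

def build_ring(nodes, vnodes=100):
    return sorted((ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def preference_list(ring, key, rf=3):
    """First `rf` distinct physical nodes clockwise from the key's position."""
    idx = bisect.bisect_left(ring, (ring_hash(key),))
    owners = []
    for step in range(len(ring)):
        node = ring[(idx + step) % len(ring)][1]
        if node not in owners:        # skip vnodes of already-chosen nodes
            owners.append(node)
            if len(owners) == rf:
                break
    return owners

ring = build_ring(["n1", "n2", "n3", "n4", "n5"])
owners = preference_list(ring, "user:42")
print(owners)  # three distinct physical nodes; the first is the primary
```

If the primary fails, the same walk naturally yields the next live node as the new primary, which is why failover requires no remapping of other keys.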
Where Consistent Hashing Is Used in Production
| System | How It Uses Consistent Hashing |
|---|---|
| Cassandra | Token ring for data partitioning and replication |
| Amazon DynamoDB | Internal partitioning across storage nodes |
| Redis Cluster | 16,384 fixed hash slots (CRC16(key) mod 16384), a slot-based variant |
| Memcached clients | Client-side consistent hashing to route keys to cache nodes |
| CDN edge routing | Geographic load spreading across PoPs |
| Chord DHT | Foundational peer-to-peer lookup protocol |
Redis Cluster uses a variant of the idea: 16,384 fixed hash slots rather than a continuous ring. A key's slot is CRC16(key) mod 16384, and nodes own explicit ranges of slots. The effect is the same: adding or removing a node migrates only the slots reassigned to it.
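The slot computation is small enough to show in full. This sketch implements the CRC-16/XMODEM variant the Redis Cluster spec defines for key-to-slot mapping; the spec's hash-tag rule (hashing only a `{...}` substring) is omitted here:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash-tag handling ({...} substrings) is omitted in this sketch.
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("123456789"))  # 12739 (CRC16 = 0x31C3, the spec's test vector)
```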
Failure Modes to Know
Hotspot keys: Consistent hashing distributes keys uniformly in expectation, but certain keys receive disproportionately high request rates regardless of which node they land on. A single viral content ID or a celebrity user ID will hammer whatever node owns it. Consistent hashing cannot fix this. The solution is application-level: cache the hot key redundantly on multiple nodes and route reads randomly among them.
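One common shape for that application-level fix is key salting: store the hot value under several aliases, read a random alias, and update all aliases on write. A sketch with illustrative names and an assumed copy count:

```python
import random

HOT_COPIES = 8  # illustrative: number of salted aliases for one hot key

def read_alias(key: str) -> str:
    """Route a read to one alias at random; each alias hashes to its own
    ring position, so the read load spreads across several nodes."""
    return f"{key}#{random.randrange(HOT_COPIES)}"

def write_aliases(key: str) -> list:
    """A write must refresh every alias so no reader sees a stale copy."""
    return [f"{key}#{i}" for i in range(HOT_COPIES)]

print(read_alias("viral:video:99"))
print(write_aliases("viral:video:99")[:3])
```

The trade-off is write amplification: every update to the hot key fans out to HOT_COPIES writes, which is acceptable precisely because hot keys are read-heavy.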
Node failure without replication: If your ring has no replication and a node goes down, all keys assigned to it are unavailable. This is a design error; consistent hashing assumes you pair it with replication factor >= 2.
Hash function collisions: If two nodes hash to the same ring position, one silently shadows the other. Always include a unique component (not just IP) in the node identifier used for hashing.
Interview Tip
Always mention virtual nodes unprompted. Interviewers specifically probe whether you understand why a naive single-position ring creates load imbalance. The follow-up they look for: "how do you handle hot keys that consistent hashing cannot solve?" If you can describe application-level key replication for hot spots and connect the replication factor design to CAP theorem trade-offs, you have covered everything expected at L5+.
Related Concepts
CAP Theorem: A distributed system can only guarantee two of three: Consistency, Availability, and Partition Tolerance.
Sharding: Horizontal partitioning of a database across multiple machines to distribute load beyond a single server's capacity.
Load Balancing: Distributes incoming traffic across multiple servers to prevent any single node from becoming a bottleneck. The mechanism that makes horizontal scaling functional in practice.