Design WhatsApp Messaging
Problem Statement
Design the core messaging infrastructure for WhatsApp. Two billion users send text, images, and voice messages. Messages must be delivered in under 500ms P99, work reliably on low-bandwidth mobile connections, and preserve end-to-end encryption across all message types.
Requirements Clarification
Functional:
- Send and receive messages in 1:1 and group chats (up to 1024 members)
- Support text, images, video, voice messages, and documents
- Delivery receipts: sent (server received), delivered (device received), read (user opened)
- Offline delivery: messages queue when recipient is offline and deliver on reconnect
- End-to-end encryption: server never holds plaintext
Non-Functional:
- 2B registered users, 500M DAU
- 100B messages/day: ~1.16M messages/second on average, with peaks around 5x that
- P99 delivery latency < 500ms for online recipients
- Works on 2G networks (200kbps, 500ms RTT)
- Messages durable: no loss after server acknowledgment
Scale Estimation
100B messages/day. Typical payload sizes: 1KB for text, 50KB for voice, 500KB for an image. Text dominates by count; media dominates by bytes. Media storage: assuming 20% of messages are images at 500KB compressed, that is 20B * 500KB = 10PB added to storage per day. Media is stored separately from message metadata: the message record holds a reference, not the bytes.
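The estimate can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch using only the figures stated in the requirements and the 20% image share assumed above; decimal units (1KB = 1,000 bytes, 1PB = 10^15 bytes) are an assumption. Note that 20% of 100B messages is 20B images, which works out to 10PB per day.

```python
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 100_000_000_000   # 100B, from the requirements
peak_multiplier = 5                  # from the requirements
image_fraction = 0.20                # assumption stated in the text
image_size_bytes = 500_000           # 500KB, decimal units (assumption)

# Message throughput
avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
peak_msgs_per_sec = avg_msgs_per_sec * peak_multiplier

# Image storage added per day
images_per_day = messages_per_day * image_fraction
image_pb_per_day = images_per_day * image_size_bytes / 1e15

print(f"average throughput: {avg_msgs_per_sec / 1e6:.2f}M msg/s")
print(f"peak throughput:    {peak_msgs_per_sec / 1e6:.2f}M msg/s")
print(f"image storage:      {image_pb_per_day:.1f} PB/day")
```

Voice and video are left out here, so the real daily storage growth would be higher than this image-only figure.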
Connection Model
Each device maintains a persistent TCP connection to a connection server. WhatsApp uses a custom binary protocol that evolved from XMPP, replacing XMPP's verbose XML with a compact encoding. Why persistent TCP: the TCP handshake alone costs a full round trip, typically 200-400ms on mobile networks and closer to 500ms at the 2G RTT assumed above, before any encryption or login handshake on top. Re-establishing a connection per message on 2G is prohibitive. The persistent connection is kept alive with periodic pings (roughly every 60 seconds).
Connection servers are stateless with respect to messages: they hold the socket but no message state. A mapping service (backed by Redis) maps each user ID to the connection server currently holding that user's socket. When a message arrives for user B, the system looks up B's connection server and routes the message there.
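A minimal sketch of that lookup-and-route step follows. Everything in it is illustrative: `SessionRegistry` is an in-memory dictionary standing in for the Redis-backed mapping service, `ConnectionServer` models a socket as a list of written frames, and the `route` function and its return strings are hypothetical names, not WhatsApp's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConnectionServer:
    """Holds open sockets; a 'socket' here is just a list of frames written."""
    name: str
    sockets: Dict[str, List[bytes]] = field(default_factory=dict)

    def attach(self, user_id: str) -> None:
        self.sockets[user_id] = []

    def push(self, user_id: str, payload: bytes) -> bool:
        sock = self.sockets.get(user_id)
        if sock is None:
            return False          # user is not connected to this server
        sock.append(payload)
        return True


class SessionRegistry:
    """In-memory stand-in for the Redis-backed user -> server mapping."""

    def __init__(self) -> None:
        self._server_by_user: Dict[str, str] = {}

    def register(self, user_id: str, server_name: str) -> None:
        self._server_by_user[user_id] = server_name

    def unregister(self, user_id: str, server_name: str) -> None:
        # Only clear the entry if it still points at this server, so a late
        # disconnect from an old server cannot erase a newer session.
        if self._server_by_user.get(user_id) == server_name:
            del self._server_by_user[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._server_by_user.get(user_id)


def route(registry: SessionRegistry, servers: Dict[str, ConnectionServer],
          user_id: str, payload: bytes) -> str:
    """Deliver to the recipient's connection server, or report 'queued'."""
    server_name = registry.lookup(user_id)
    if server_name is None:
        return "queued"           # offline: caller stores the message
    if servers[server_name].push(user_id, payload):
        return "delivered"
    registry.unregister(user_id, server_name)   # mapping was stale
    return "queued"
```

Because the connection server keeps no message state, a stale mapping only costs one failed push: the message falls back to the offline queue and the entry is cleaned up.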
Message Delivery Flow
- Sender writes message to the server. Server acknowledges with a server-side message ID. Sender displays single tick.
- Server routes message to recipient's connection server.
- Recipient's device receives the message and sends a delivery acknowledgment.
- Server marks the message as delivered, notifies sender. Sender displays double tick.
- Recipient opens the conversation. Device sends a read receipt.
- Server notifies sender. Sender displays blue double tick.
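The steps above amount to a small state machine per message. The sketch below is one way to model it, assuming receipts can arrive duplicated or out of order over flaky mobile links; the `Receipt` and `MessageStatus` names and the tick labels are illustrative, not taken from WhatsApp's implementation.

```python
from enum import IntEnum


class Receipt(IntEnum):
    PENDING = 0     # sender has not yet heard back from the server
    SENT = 1        # server acknowledged and assigned a message ID
    DELIVERED = 2   # recipient device acknowledged receipt
    READ = 3        # recipient opened the conversation


TICKS = {
    Receipt.PENDING: "clock",
    Receipt.SENT: "single tick",
    Receipt.DELIVERED: "double tick",
    Receipt.READ: "blue double tick",
}


class MessageStatus:
    """Tracks one message's receipt state; the state only moves forward."""

    def __init__(self) -> None:
        self.state = Receipt.PENDING

    def advance(self, new_state: Receipt) -> bool:
        # Ignore duplicates and late lower-ranked receipts so a retransmitted
        # DELIVERED cannot downgrade a message already marked READ.
        if new_state <= self.state:
            return False
        self.state = new_state
        return True

    @property
    def ticks(self) -> str:
        return TICKS[self.state]
```

Making the transition monotonic means a read receipt that overtakes the delivery acknowledgment still lands the message in the right final state.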
For offline recipients: messages persist in a server-side queue with a TTL of 30 days. On reconnect, the device fetches its queue and sends a delivery acknowledgment for each message as it is processed.
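A minimal in-memory sketch of such a queue is shown below. It assumes a message is removed only when the device acknowledges it (or its TTL lapses), so a connection that drops mid-fetch does not lose anything. The `OfflineQueue` class and its method names are hypothetical, and a real system would back this with durable storage.

```python
import time
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, List, Tuple

THIRTY_DAYS = 30 * 24 * 60 * 60  # TTL from the text, in seconds


class OfflineQueue:
    """Per-user pending messages, kept until acknowledged or expired."""

    def __init__(self, ttl_seconds: float = THIRTY_DAYS,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> {msg_id: (expires_at, ciphertext)}, in arrival order
        self._pending: Dict[str, "OrderedDict[str, Tuple[float, bytes]]"] = \
            defaultdict(OrderedDict)

    def enqueue(self, user_id: str, msg_id: str, ciphertext: bytes) -> None:
        self._pending[user_id][msg_id] = (self._clock() + self._ttl, ciphertext)

    def fetch(self, user_id: str) -> List[Tuple[str, bytes]]:
        """Return unexpired messages in order; nothing is removed here."""
        now = self._clock()
        queue = self._pending[user_id]
        expired = [m for m, (expires_at, _) in queue.items() if expires_at <= now]
        for msg_id in expired:
            del queue[msg_id]
        return [(m, c) for m, (_, c) in queue.items()]

    def ack(self, user_id: str, msg_id: str) -> bool:
        """Delivery acknowledgment from the device: now it is safe to delete."""
        return self._pending[user_id].pop(msg_id, None) is not None
```

Injecting the clock keeps the TTL logic testable without waiting 30 days.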
Group Messaging
Group chats with up to 1024 members introduce a fan-out problem. For a 1024-member group, one message triggers up to 1024 individual deliveries.
WhatsApp uses a fan-out-on-write approach for groups: when a message is sent, the server writes one message record and then asynchronously delivers to each member's connection server via a message queue. Members who are offline get queued delivery. This decouples the sender's acknowledgment from the delivery fan-out: the sender gets their acknowledgment after the server persists the message, not after all 1024 deliveries complete.
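The decoupling can be sketched as follows: `send` persists one record, enqueues one delivery task per member, and returns immediately, while `drain` plays the role of the asynchronous workers. The `GroupFanout` class, the in-process `deque` standing in for the message queue, and the `online` set are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Set, Tuple


class GroupFanout:
    def __init__(self, members_by_group: Dict[str, List[str]]) -> None:
        self._members = members_by_group
        self.store: Dict[int, Tuple[str, str, bytes]] = {}   # one record per message
        self.tasks: Deque[Tuple[int, str]] = deque()          # stand-in for the queue
        self.inbox: Dict[str, List[int]] = defaultdict(list)  # delivered message IDs
        self._next_id = 1

    def send(self, group_id: str, sender: str, ciphertext: bytes) -> int:
        """Persist once, enqueue fan-out, return the ID (the sender's ack)."""
        msg_id = self._next_id
        self._next_id += 1
        self.store[msg_id] = (group_id, sender, ciphertext)
        for member in self._members[group_id]:
            if member != sender:
                self.tasks.append((msg_id, member))
        return msg_id   # returned before any delivery has happened

    def drain(self, online: Set[str]) -> int:
        """Worker pass: deliver to online members, requeue the rest."""
        delivered = 0
        for _ in range(len(self.tasks)):
            msg_id, member = self.tasks.popleft()
            if member in online:
                self.inbox[member].append(msg_id)
                delivered += 1
            else:
                self.tasks.append((msg_id, member))   # stays queued for reconnect
        return delivered
```

The sender's latency here depends only on the single write in `send`; a 1024-member group simply produces more queued tasks for the workers.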
End-to-End Encryption
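WhatsApp uses the Signal Protocol. Each device generates its key pairs locally (a long-term identity key plus prekeys) and uploads only the public halves; the server stores public keys only. Two devices use those public keys to agree on a shared session secret, and each message is encrypted on the sender's device with a key derived from that session and decrypted only on the recipient's device.
The server cannot read message content. Delivery receipts are metadata, not content, and are not end-to-end encrypted. For group messages, the straightforward approach is for the sender to encrypt the message separately for each recipient over their pairwise session. At 1024 members, that is 1024 encryption operations client-side before the message is sent. These are cheap symmetric operations, and on modern hardware they should complete in under 100ms. WhatsApp's published design avoids repeating this work for every message by using Sender Keys: the sender distributes a group sender key to each member once over the pairwise sessions, then encrypts each message a single time with that key.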
Media Handling
Media is not sent through the message pipeline. The sender encrypts the media on-device with a freshly generated key, uploads the ciphertext to object storage (a distributed blob store), and receives a URL. The message record contains the URL and the key, not the bytes, and is itself end-to-end encrypted, so the server never sees the key. The recipient downloads the ciphertext directly from the blob store using the URL and decrypts it locally with the key. The server never holds unencrypted media.
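This design keeps message delivery latency independent of media size. From the messaging server's perspective, a message that references a 1GB video is no larger than one that references a thumbnail, and it is routed as quickly as a 1KB text message. The upload and download time is paid on the blob-store path instead.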
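The upload-then-reference flow can be sketched as below. Two loud caveats: the `_keystream_xor` function is a toy placeholder that keeps the example dependency-free and is NOT secure (a real client would use an authenticated cipher from a vetted library), and `BlobStore`, `MediaRef`, `send_media`, and `receive_media` are hypothetical names rather than WhatsApp's API.

```python
import hashlib
import secrets
from dataclasses import dataclass
from typing import Dict


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # PLACEHOLDER cipher for illustration only; not secure. XOR with a
    # key-derived stream is its own inverse, so it both encrypts and decrypts.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class BlobStore:
    """In-memory stand-in for object storage; it only ever sees ciphertext."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        url = "blob://" + hashlib.sha256(blob).hexdigest()
        self._blobs[url] = blob
        return url

    def get(self, url: str) -> bytes:
        return self._blobs[url]


@dataclass(frozen=True)
class MediaRef:
    """What travels in the (end-to-end encrypted) message instead of the bytes."""
    url: str
    key: bytes
    sha256: str   # digest of the ciphertext, checked after download


def send_media(store: BlobStore, plaintext: bytes) -> MediaRef:
    key = secrets.token_bytes(32)            # fresh key per media item
    ciphertext = _keystream_xor(key, plaintext)
    url = store.put(ciphertext)
    return MediaRef(url, key, hashlib.sha256(ciphertext).hexdigest())


def receive_media(store: BlobStore, ref: MediaRef) -> bytes:
    ciphertext = store.get(ref.url)
    if hashlib.sha256(ciphertext).hexdigest() != ref.sha256:
        raise ValueError("downloaded media does not match the reference")
    return _keystream_xor(ref.key, ciphertext)
```

The `MediaRef` has the same size whether it points at ten bytes or a gigabyte, which is why the message pipeline's latency does not depend on the media.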
Interview Tip
Most candidates describe a standard messaging system without addressing two WhatsApp-specific constraints: the Signal Protocol encryption model (which means the server cannot read content, and group fan-out requires per-member encryption), and the 2G mobile network constraint (which explains persistent TCP connections and compressed binary protocols). Interviewers at Meta will also probe the delivery receipt state machine: exactly when each tick appears, what server state tracks it, and how the system handles the case where the sender goes offline before receiving the read receipt back.
Key Concepts to Master
- Consistent hashing: a distributed hashing scheme that minimizes key remapping when nodes are added or removed.
- Message queue: an asynchronous communication buffer between services. It decouples producers from consumers and provides durability during traffic spikes.
- Distributed cache: a shared cache layer across multiple nodes that absorbs read traffic from the primary database and reduces latency on hot data paths. At scale, this can be the difference between a 2ms and a 200ms read.