Design Microsoft Teams Real-Time Messaging
Problem Statement
Design the real-time messaging infrastructure for Microsoft Teams. Users send messages in channels (up to 25,000 members) and 1:1 chats. Messages must be delivered with P99 latency under 500ms and with no loss after acknowledgment. The system must support message history, reactions, threading, and presence.
Requirements Clarification
Functional:
- Send and receive messages in 1:1 chats and group channels
- Persistent message history with pagination
- Threading: replies do not clutter the main channel view
- Reactions: emoji reactions visible to all channel members
- Presence: online/away/offline status for contacts
Non-Functional:
- 300M MAU, 100M DAU
- 10B messages/day (~115k messages/second average, 5x peak)
- P99 delivery latency < 500ms
- No message loss after server acknowledgment
- Presence updates within 10 seconds of status change
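The stated throughput figures follow from simple arithmetic; a quick sanity check of the numbers above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = 10_000_000_000
avg_mps = messages_per_day / SECONDS_PER_DAY  # average messages/second
peak_mps = avg_mps * 5                        # stated 5x peak factor

print(f"average: {avg_mps:,.0f} msg/s")  # ~115,741, matching the ~115k above
print(f"peak:    {peak_mps:,.0f} msg/s")
```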
Real-Time Delivery
Each client holds a persistent WebSocket connection to a connection server. Connection servers are a stateless routing layer: they fan messages out to connected clients and store no history.
1:1 chats: Message written to storage, then pushed to the recipient's connection server via internal pub-sub. Two hops, low latency.
Large channels (up to 25,000 members): Synchronous fan-out to 25,000 connection servers is too slow. The system publishes the message to a Kafka topic per channel. Connection servers subscribe to relevant channel topics and push to their connected clients. This decouples write throughput from fan-out cost and handles slow or offline members without blocking the sender.
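The decoupling above can be sketched in memory. The `Broker` class below stands in for the per-channel Kafka topics, and `ConnectionServer` is an illustrative name, not an actual Teams component; a real deployment would use Kafka consumer groups rather than direct callbacks:

```python
from collections import defaultdict

class Broker:
    """In-memory stand-in for Kafka: one topic per channel."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> connection servers

    def subscribe(self, topic, server):
        self.subscribers[topic].append(server)

    def publish(self, topic, message):
        # The sender is done once the broker accepts the message;
        # delivery to each connection server proceeds independently,
        # so a slow server never blocks the write path.
        for server in self.subscribers[topic]:
            server.deliver(topic, message)

class ConnectionServer:
    """Stateless routing layer: pushes messages to its connected clients."""
    def __init__(self):
        self.clients = defaultdict(set)  # channel -> connected client ids
        self.outbox = []                 # (client_id, message) pushed over WebSocket

    def connect(self, client_id, channel):
        self.clients[channel].add(client_id)

    def deliver(self, channel, message):
        for client_id in self.clients[channel]:
            self.outbox.append((client_id, message))

broker = Broker()
s1, s2 = ConnectionServer(), ConnectionServer()
broker.subscribe("channel-42", s1)
broker.subscribe("channel-42", s2)
s1.connect("alice", "channel-42")
s2.connect("bob", "channel-42")
broker.publish("channel-42", "hello")
```

Each connection server only fans out to its own clients, so a 25,000-member channel is split across many servers, none of which sees the full member list.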
Storage
Messages are stored in Azure Cosmos DB partitioned by channel_id. Writes are synchronous: the sender receives an acknowledgment only after the message is durably written to three replicas. Message IDs are monotonically increasing per channel via a distributed sequence service, preserving per-channel ordering without global coordination.
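A minimal sketch of per-channel ID allocation. `SequenceService` is a hypothetical name; a production service would lease blocks of IDs to each node rather than hold one counter in process memory, but the contract is the same: monotonic within a channel, independent across channels:

```python
import itertools
from collections import defaultdict

class SequenceService:
    """Per-channel message ID allocator (in-memory sketch)."""
    def __init__(self):
        # Each channel gets its own counter starting at 1;
        # no coordination is needed across channels.
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_id(self, channel_id):
        return next(self._counters[channel_id])

seq = SequenceService()
ids = [seq.next_id("general") for _ in range(3)]  # [1, 2, 3]
seq.next_id("random")  # independent counter, starts again at 1
```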
Presence at Scale
Each connected client sends a heartbeat every 5 seconds. The presence service aggregates heartbeats in Redis with a 15-second TTL. Two missed heartbeats (10 seconds) mark the client as away. At 100M DAU, 100M clients heartbeating every 5 seconds generate 20M writes/second. Redis Cluster with consistent hashing handles this; no single shard sees more than ~200k writes/second at 100 shards.
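The heartbeat logic can be sketched as follows. This is an in-memory stand-in, assuming the thresholds stated above; production would use Redis `SET` with a TTL, with the injectable clock here existing only to make the sketch testable:

```python
import time

HEARTBEAT_INTERVAL = 5  # seconds, per the design above
AWAY_AFTER = 10         # two missed heartbeats

class PresenceService:
    """In-memory sketch; production stores last-seen in Redis with a 15s TTL."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = {}  # user_id -> timestamp of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self._clock()

    def status(self, user_id):
        last = self._last_seen.get(user_id)
        if last is None:
            return "offline"
        return "online" if self._clock() - last < AWAY_AFTER else "away"

now = [0.0]
presence = PresenceService(clock=lambda: now[0])
presence.heartbeat("alice")
now[0] = 4.0   # within the 10s window
presence.status("alice")  # "online"
now[0] = 14.0  # two heartbeats missed
presence.status("alice")  # "away"
```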
Key Concepts to Master
Consistent hashing: a distributed hashing scheme that minimizes key remapping when nodes are added or removed.
Message queue: an asynchronous communication buffer between services. Decouples producers from consumers and provides durability during traffic spikes.
Distributed cache: a shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
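The consistent hashing concept above can be illustrated with a small hash ring. This is a generic sketch with virtual nodes, not Redis Cluster's actual scheme (Redis Cluster uses 16,384 fixed hash slots); the point is that removing a node remaps only the keys that lived on it:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node appears at many points on the ring,
        # which spreads its keys evenly among the survivors on removal.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        # A key maps to the first ring point at or after its hash (wrapping).
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-0", "shard-1", "shard-2"])
before = {f"user:{i}": ring.get(f"user:{i}") for i in range(200)}
ring.remove("shard-2")
after = {k: ring.get(k) for k in before}
# Keys that were not on shard-2 stay put; only shard-2's keys move.
```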