Design iCloud Photo Sync
Problem Statement
Design the sync system that keeps a user's photo library consistent across all their Apple devices. A photo taken on an iPhone must appear on the user's iPad, Mac, and Apple TV within seconds when those devices are online, and as soon as they reconnect when they are offline.
Requirements Clarification
Functional:
- Upload photos from any device to iCloud
- Sync new photos to all other devices owned by the same Apple ID
- Support offline devices: queued changes sync on reconnect
- Conflict resolution: same photo edited on two devices simultaneously
Non-Functional:
- 1B registered devices, 500M DAU
- Sync latency: photo visible on other online devices within 10 seconds of upload
- Durability: 11 nines (99.999999999%) for stored photos
- Bandwidth-efficient: upload delta changes, not full re-uploads on every edit
Core Design Challenge: Delta Sync
Uploading the full image on every edit is prohibitive. The system must track which assets have changed and what changed.
Content-addressable storage: Each blob is stored under the SHA-256 hash of its bytes. If two devices upload identical bytes, both uploads resolve to the same key and the server keeps a single copy. Uploads are therefore idempotent: when the client sends the hash before the bytes, re-uploading an existing object costs one hash lookup, not a full transfer.
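The deduplication path can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Apple's implementation: ContentStore, has, put, and upload are hypothetical names, and a dictionary stands in for the object store.

```python
import hashlib


class ContentStore:
    """In-memory stand-in for a content-addressable blob store (hypothetical)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self.bytes_transferred = 0  # counts only bytes that were actually stored

    def has(self, digest: str) -> bool:
        # The cheap existence check a client makes before sending any bytes.
        return digest in self._blobs

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical bytes share a key.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
            self.bytes_transferred += len(data)
        return digest


def upload(store: ContentStore, data: bytes) -> str:
    """Client side: hash locally and skip the transfer if the blob exists."""
    digest = hashlib.sha256(data).hexdigest()
    if store.has(digest):
        return digest  # one lookup, no transfer
    return store.put(data)
```

Calling upload twice with the same bytes returns the same key and leaves bytes_transferred unchanged after the first call, which is the idempotency property described above.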
Separate metadata from binary: Apple stores the original image blob and edit instructions (crops, filters) as separate records. Edits do not require re-uploading the original; only the edit manifest changes.
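One way to picture the split is a record that points at an immutable original and carries a small, versioned list of edit operations. The sketch below is an assumption for illustration: Asset, EditManifest, apply_edit, and the shape of each operation dictionary are invented here and do not reflect a documented format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EditManifest:
    """Ordered, non-destructive edit instructions (hypothetical format)."""
    version: int
    operations: tuple[dict, ...] = ()  # e.g. {"op": "crop", "w": 100, "h": 100}


@dataclass
class Asset:
    asset_id: str
    original_sha256: str  # key of the immutable original blob
    manifest: EditManifest = field(default_factory=lambda: EditManifest(version=0))

    def apply_edit(self, operation: dict) -> None:
        # Only the manifest is replaced; the original blob key never changes,
        # so an edit syncs a few bytes of metadata instead of the full image.
        self.manifest = EditManifest(
            version=self.manifest.version + 1,
            operations=self.manifest.operations + (operation,),
        )
```

After any number of apply_edit calls, original_sha256 is unchanged, so reverting to the original only requires discarding the manifest.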
High-Level Architecture
A device uploads a new photo to a regional ingestion endpoint. The blob goes to object storage, and a metadata record (asset ID, owner ID, device ID, version, timestamps) goes to a distributed database. A fanout service reads the metadata change stream and pushes a notification through APNs to every other device on the account. Receiving devices download the metadata first, then fetch blobs lazily when the user opens a photo.
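The fanout step reduces to "notify every device of the owner except the one that made the change". A minimal sketch, assuming a hypothetical MetadataRecord shape and a plain dictionary as the device registry; actual push delivery through APNs is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    asset_id: str
    owner_id: str
    device_id: str  # the device that produced this change
    version: int
    updated_at: float  # server-assigned timestamp


def fanout_targets(
    record: MetadataRecord, devices_by_owner: dict[str, list[str]]
) -> list[str]:
    """Return the owner's devices that still need to hear about this change."""
    return [
        device
        for device in devices_by_owner.get(record.owner_id, [])
        if device != record.device_id  # the uploader already has the change
    ]
```

A fanout worker would call fanout_targets for each record on the change stream and send one lightweight push per returned device; the push only tells the device to pull metadata, it does not carry the photo.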
Offline Queue and Conflict Resolution
Changes made offline are journaled locally. On reconnect, the device replays the journal against server state. Concurrent edits to the same photo on two devices are resolved last-write-wins: the edit with the later server-assigned timestamp replaces the earlier one. The server assigns the timestamp because device clocks can drift and cannot be trusted for ordering. Deletions are soft deletes with a 30-day retention period, so other devices have time to sync before the blob is permanently removed.
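The replay and conflict rules can be sketched as follows. Everything here is illustrative: Server, ServerAsset, replay, and the journal entry shape are hypothetical, and the server clock is passed in as now so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30-day soft-delete window


@dataclass
class ServerAsset:
    manifest: dict
    updated_at: float  # server-assigned timestamp of the last accepted write
    deleted_at: Optional[float] = None


class Server:
    def __init__(self) -> None:
        self.assets: dict[str, ServerAsset] = {}

    def apply(self, change: dict, now: float) -> None:
        """Apply one journal entry; `now` is the server clock at arrival."""
        asset_id = change["asset_id"]
        if change["kind"] == "edit":
            # Last write wins: whichever edit reaches the server later replaces
            # the earlier one. In this sketch an edit also clears a pending
            # soft delete, which a real system may not want.
            self.assets[asset_id] = ServerAsset(change["manifest"], updated_at=now)
        elif change["kind"] == "delete" and asset_id in self.assets:
            self.assets[asset_id].deleted_at = now  # soft delete only

    def purge(self, now: float) -> None:
        """Permanently drop assets whose retention window has elapsed."""
        self.assets = {
            key: asset
            for key, asset in self.assets.items()
            if asset.deleted_at is None or now - asset.deleted_at < RETENTION_SECONDS
        }


def replay(journal: list[dict], server: Server, now: float) -> None:
    """Send queued offline changes in order, then empty the local journal."""
    for change in journal:
        server.apply(change, now)
    journal.clear()
```

If an iPad and a Mac both edit the same asset while offline, the device that reconnects second wins, regardless of which edit the user made first. That is the known cost of last-write-wins: it is simple and convergent, but it silently discards the losing edit.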
Key Concepts to Master
Consistent hashing. A distributed hashing scheme that minimizes key remapping when nodes are added or removed.
Content Delivery Network. A geographically distributed network of edge servers that caches content close to end users, reducing origin server load and cutting time-to-first-byte by 50 to 200 ms depending on user location.
Object storage for unstructured binary data: images, videos, documents, ML model weights. Designed for durability and throughput at scale, not low-latency random access.
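Of the concepts above, the hashing scheme that minimizes remapping (commonly called consistent hashing) is the easiest to show in code. The sketch below is a minimal ring with virtual nodes; HashRing, node_for, the vnodes default of 100, and the node names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    # Map any string to a position on the ring using the first 8 bytes of SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each node is placed at `vnodes` positions to even out the key share.
        self._ring = sorted(
            (_point(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # A key belongs to the first node position clockwise from its own
        # position, wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._points, _point(key)) % len(self._ring)
        return self._ring[idx][1]
```

Growing a ring from three nodes to four moves only the keys that now fall just before one of the new node's positions, and every moved key lands on the new node. With plain modulo hashing (hash % node_count), the same change would remap most keys.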