Design Netflix Video Streaming Pipeline
Problem Statement
Design the end-to-end video streaming pipeline for Netflix: from content upload to playback on a subscriber's device. The system must handle adaptive bitrate streaming, global distribution, and smooth playback under variable network conditions.
Requirements Clarification
Functional:
- Serve video globally with adaptive bitrate (ABR)
- Support 4K HDR, multiple audio tracks, and subtitles
- Resume playback position across devices
Non-Functional:
- 250M subscribers, 15M concurrent streams at peak
- Startup latency: first frame < 2 seconds
- Rebuffering rate: < 0.1% of playback time
- Delivery in 190 countries
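The targets above translate into rough capacity numbers. The sketch below is a back-of-envelope estimate only: the 5,000 kbps average bitrate is an assumed figure chosen for illustration, not a measured one, and the function names are hypothetical.

```python
# Back-of-envelope sizing from the non-functional requirements.
PEAK_CONCURRENT_STREAMS = 15_000_000  # from the requirements above
ASSUMED_AVG_BITRATE_KBPS = 5_000      # assumption for illustration only
MAX_REBUFFER_RATIO = 0.001            # 0.1% of playback time


def peak_egress_tbps(streams: int, avg_bitrate_kbps: float) -> float:
    """Aggregate egress in terabits per second (1 Tbps = 1e9 kbps)."""
    return streams * avg_bitrate_kbps / 1e9


def rebuffer_budget_s(playback_s: float, max_ratio: float = MAX_REBUFFER_RATIO) -> float:
    """Maximum total stall time allowed for a session of the given length."""
    return playback_s * max_ratio


if __name__ == "__main__":
    egress = peak_egress_tbps(PEAK_CONCURRENT_STREAMS, ASSUMED_AVG_BITRATE_KBPS)
    print(f"peak egress: about {egress:.0f} Tbps")
    print(f"stall budget for a 2-hour film: about {rebuffer_budget_s(2 * 3600):.1f} s")
```

Under the assumed average bitrate, peak egress comes to roughly 75 Tbps, and the rebuffering target leaves about 7 seconds of total stall time across a 2-hour film.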
Transcoding Pipeline
Raw studio files arrive at 4K+ resolution, often 100GB+ per title. The pipeline:
- Split video into 2–4 second chunks
- Transcode each chunk independently across a distributed worker fleet (Netflix's Cosmos platform)
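- Produce 6–10 bitrate/resolution outputs per title, for example 235 kbps for mobile, around 5,800 kbps for 1080p, and 15,600 kbps for 4K HDR (exact rungs vary by title when per-title encoding is used)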
- Package into HLS and DASH manifests
- Store chunks and manifests in blob storage
Chunked transcoding enables parallelism: a 2-hour film transcodes in under 30 minutes across thousands of workers.
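The fan-out described above can be sketched as a planning step that turns one title into independent (chunk, bitrate) tasks, followed by a parallel map. This is a minimal illustration, not the Cosmos API: `plan_tasks`, `transcode`, and the object-key layout are hypothetical, the encoder call is a stub, and the intermediate ladder rungs are assumptions (only 235 and 15,600 kbps are taken from the text).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Illustrative ladder in kbps; intermediate rungs are assumptions.
LADDER_KBPS = [235, 750, 1750, 3000, 5800, 15_600]


@dataclass(frozen=True)
class Task:
    chunk_index: int
    start_s: float
    duration_s: float
    bitrate_kbps: int


def plan_tasks(total_s: float, chunk_s: float, ladder: list[int]) -> list[Task]:
    """Create one independent task per (chunk, bitrate) pair."""
    n_chunks = math.ceil(total_s / chunk_s)
    tasks = []
    for i in range(n_chunks):
        start = i * chunk_s
        duration = min(chunk_s, total_s - start)  # last chunk may be shorter
        for kbps in ladder:
            tasks.append(Task(i, start, duration, kbps))
    return tasks


def transcode(task: Task) -> str:
    """Stub for a real encoder invocation; returns a hypothetical object key."""
    return f"{task.bitrate_kbps}k/chunk-{task.chunk_index:05d}.mp4"


def run(tasks: list[Task], workers: int = 8) -> list[str]:
    """Run tasks in parallel; results come back in task order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcode, tasks))


if __name__ == "__main__":
    tasks = plan_tasks(total_s=2 * 3600, chunk_s=4, ladder=LADDER_KBPS)
    print(len(tasks), "independent tasks")
    print(run(tasks[:2]))
```

Because no task depends on another, wall-clock time is bounded by the slowest task plus scheduling overhead rather than by the length of the film, which is what makes the sub-30-minute figure plausible given enough workers.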
Content Distribution via Open Connect
Netflix operates its own CDN called Open Connect. ISPs install Open Connect Appliances (OCAs) in their data centers. Netflix pre-positions popular content onto OCAs during off-peak hours. At stream time, the client is directed to the nearest OCA that has the requested content.
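The top 1000 titles cover ~95% of stream requests from OCAs. Long-tail content is served from larger storage-class OCAs, typically placed at internet exchange points, which are filled from the S3 origin rather than from a third-party CDN.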
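The steering decision at stream time can be sketched as a filter followed by a nearest-first pick. This is a simplified model, not Netflix's steering service: the `Oca` fields, the `distance_ms` proximity metric, and the `"fallback-tier"` label are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Oca:
    oca_id: str
    distance_ms: float                      # hypothetical proximity metric
    titles: set[str] = field(default_factory=set)
    healthy: bool = True


def pick_source(title_id: str, ocas: list[Oca], fallback: str = "fallback-tier") -> str:
    """Return the nearest healthy OCA holding the title, else the fallback tier."""
    candidates = [o for o in ocas if o.healthy and title_id in o.titles]
    if not candidates:
        return fallback
    return min(candidates, key=lambda o: o.distance_ms).oca_id


if __name__ == "__main__":
    fleet = [
        Oca("isp-edge-1", 4.0, {"popular-show"}),
        Oca("isp-edge-2", 9.0, {"popular-show", "niche-doc"}),
    ]
    print(pick_source("popular-show", fleet))   # nearest holder
    print(pick_source("rare-title", fleet))     # no holder, so fallback
```

A production system would also weigh appliance load and link utilization, which this sketch leaves out.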
Adaptive Bitrate Streaming
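The client player downloads a manifest listing all chunk URLs and bitrates. Every 2–4 seconds (once per chunk), the player measures buffer health and selects the quality tier for the next chunk. Buffer occupancy is the primary signal, not raw bandwidth. Well-known algorithms of this kind include BOLA, which comes from academic research rather than from Netflix, and the buffer-based approach (BBA) evaluated on Netflix's production traffic, which was reported to reduce rebuffering by 10–20% compared to the bandwidth-estimation ABR then in use.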
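A buffer-driven selection rule can be sketched as a piecewise-linear map from buffer occupancy to a target bitrate. This is a deliberately simplified illustration, not the algorithm any particular player ships: the reservoir and cushion sizes are assumed values, the intermediate ladder rungs are assumptions, and real implementations add hysteresis to avoid oscillating between tiers.

```python
from bisect import bisect_right

# Illustrative ladder in kbps, sorted ascending; intermediate rungs are assumptions.
LADDER_KBPS = [235, 750, 1750, 3000, 5800, 15_600]


def select_bitrate(
    buffer_s: float,
    ladder: list[int] = LADDER_KBPS,
    reservoir_s: float = 5.0,   # assumed: below this, always pick the lowest tier
    cushion_s: float = 20.0,    # assumed: range over which quality ramps up
) -> int:
    """Pick the highest tier not exceeding a target that grows with buffer level."""
    lowest, highest = ladder[0], ladder[-1]
    if buffer_s <= reservoir_s:
        return lowest
    if buffer_s >= reservoir_s + cushion_s:
        return highest
    # Linear interpolation between the lowest and highest bitrate.
    target = lowest + (highest - lowest) * (buffer_s - reservoir_s) / cushion_s
    return ladder[bisect_right(ladder, target) - 1]


if __name__ == "__main__":
    for level in (2, 6, 10, 15, 30):
        print(f"buffer {level:>2}s -> {select_bitrate(level)} kbps")
```

The rule never consults a throughput estimate: a draining buffer lowers the target on its own, which is why a sudden bandwidth drop leads to a quality step-down before the buffer empties.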
Watch Position Durability
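Playback position is written to Cassandra every 30 seconds and on pause/stop, so the worst-case position loss on a crash is 30 seconds, which is treated here as within acceptable UX tolerance. The partition key is user ID and the clustering key is device ID, so a single-partition read returns the latest position from every device a user owns, which keeps lookups fast and multi-device resume simple. Tracking a separate position per title additionally requires the title ID in the key.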
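The checkpoint policy and the resume lookup can be sketched with an in-memory stand-in for the table. This is not a Cassandra client: `PositionStore` is a hypothetical dict-backed model of one partition per user with one row per device, and it tracks a single title for brevity.

```python
from dataclasses import dataclass

CHECKPOINT_INTERVAL_S = 30  # from the text above


@dataclass
class Position:
    position_s: float
    updated_at: float


class PositionStore:
    """In-memory model: outer key is the partition (user), inner key the device."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, Position]] = {}

    def write(self, user_id: str, device_id: str, position_s: float, now: float) -> None:
        self._rows.setdefault(user_id, {})[device_id] = Position(position_s, now)

    def resume(self, user_id: str) -> float:
        """Return the most recently written position across all of a user's devices."""
        devices = self._rows.get(user_id)
        if not devices:
            return 0.0
        return max(devices.values(), key=lambda p: p.updated_at).position_s


def should_checkpoint(event: str, last_write_at: float, now: float) -> bool:
    """Write on pause/stop, otherwise only once the interval has elapsed."""
    if event in ("pause", "stop"):
        return True
    return now - last_write_at >= CHECKPOINT_INTERVAL_S


if __name__ == "__main__":
    store = PositionStore()
    store.write("user-1", "tv", 100.0, now=1000.0)
    store.write("user-1", "phone", 250.0, now=2000.0)
    print(store.resume("user-1"))
```

Resuming from the most recently updated device, rather than the furthest position, avoids jumping ahead when a user rewatches an earlier scene on another device.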
Key Concepts to Master
Consistent Hashing: A distributed hashing scheme that minimizes key remapping when nodes are added or removed.
CDN: Content Delivery Network. A geographically distributed network of edge servers that caches content close to end users, reducing origin server load and cutting time-to-first-byte by 50–200ms depending on user location.
Blob Storage: Object storage for unstructured binary data: images, videos, documents, ML model weights. Designed for durability and throughput at scale, not low-latency random access.
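The hashing scheme described in the first concept above can be sketched as a sorted ring of virtual nodes. This is a minimal illustration under stated assumptions: the `HashRing` class, the 100 virtual nodes per server, and the use of SHA-256 truncated to 64 bits are choices made for the example, not a description of any specific system.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._hashes: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            h = self._hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def remove(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position clockwise from the key."""
        if not self._hashes:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owner[self._hashes[i]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"title-{i}" for i in range(1000)]
    before = {k: ring.lookup(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.lookup(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Every key that moves after `add` lands on the new node, and none shuffle between the existing nodes, which is the property that distinguishes this scheme from `hash(key) % n`.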