Netflix · System Design

Design Netflix Video Streaming Pipeline

Frequency: 92/100 · Scale: 250M subscribers, 15M concurrent streams at peak

Problem Statement

Design the end-to-end video streaming pipeline for Netflix: from content upload to playback on a subscriber's device. The system must handle adaptive bitrate streaming, global distribution, and smooth playback under variable network conditions.

Requirements Clarification

Functional:

  • Serve video globally with adaptive bitrate (ABR)
  • Support 4K HDR, multiple audio tracks, and subtitles
  • Resume playback position across devices

Non-Functional:

  • 250M subscribers, 15M concurrent streams at peak
  • Startup latency: first frame < 2 seconds
  • Rebuffering rate: < 0.1% of playback time
  • Delivery in 190 countries
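These scale numbers imply a substantial egress requirement. A back-of-envelope estimate, assuming a blended average bitrate of 5 Mbps per stream (an assumption, not a figure from the requirements above):

```python
# Peak egress estimate from the stated scale targets.
CONCURRENT_STREAMS = 15_000_000
AVG_BITRATE_MBPS = 5  # assumption: blended average across quality tiers

peak_egress_tbps = CONCURRENT_STREAMS * AVG_BITRATE_MBPS / 1_000_000
print(f"Peak egress: ~{peak_egress_tbps:.0f} Tbps")  # prints: Peak egress: ~75 Tbps
```

Tens of terabits per second of peak egress is exactly why Netflix pushes delivery out of centralized data centers and into ISP networks, as the Open Connect section below describes.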

Transcoding Pipeline

Raw studio files arrive at 4K+ resolution, often 100GB+ per title. The pipeline:

  1. Split video into 2–4 second chunks
  2. Transcode each chunk independently across a distributed worker fleet (Netflix's Cosmos platform)
  3. Produce 6–10 bitrate/resolution renditions per title, from ~235 kbps for low-bandwidth mobile up to ~15,600 kbps for 4K HDR
  4. Package into HLS and DASH manifests
  5. Store chunks and manifests in blob storage

Chunked transcoding enables parallelism: a 2-hour film transcodes in under 30 minutes across thousands of workers.
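The fan-out above can be sketched as follows. The rendition ladder, `transcode_chunk`, and the local thread pool are illustrative stand-ins for Cosmos jobs and real encoder invocations:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative bitrate ladder (kbps); real ladders are per-title optimized.
RENDITIONS = [235, 1750, 5800, 15_600]

def transcode_chunk(chunk_id: int) -> dict:
    # Stand-in for invoking an encoder (e.g., ffmpeg) once per rendition.
    return {kbps: f"chunk{chunk_id:05d}_{kbps}kbps.mp4" for kbps in RENDITIONS}

def transcode_title(num_chunks: int, workers: int = 8) -> list:
    # Each chunk is independent, so the whole title transcodes in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcode_chunk, range(num_chunks)))

# A 2-hour film at 4-second chunks is 1800 independent transcode tasks.
outputs = transcode_title(num_chunks=1800)
print(len(outputs), len(outputs[0]))  # prints: 1800 4
```

Because chunks are independent, wall-clock time scales with fleet size rather than title length, which is what makes the under-30-minute figure plausible.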

Content Distribution via Open Connect

Netflix operates its own CDN called Open Connect. ISPs install Open Connect Appliances (OCAs) in their data centers. Netflix pre-positions popular content onto OCAs during off-peak hours. At stream time, the client is directed to the nearest OCA that has the requested content.

The top 1000 titles cover ~95% of stream requests from OCAs. Long-tail content falls back to S3 via a public CDN.
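A minimal sketch of the steering decision, assuming the control plane knows each candidate OCA's round-trip time and cached catalog (the data structures and URL shapes here are hypothetical):

```python
def steer(title_id: str, ocas: list, fallback_url: str) -> str:
    """Pick a stream source for one client.

    ocas: list of (rtt_ms, oca_host, cached_title_ids) tuples.
    Returns the manifest URL on the nearest OCA holding the title,
    or the origin/public-CDN fallback for long-tail content.
    """
    for rtt_ms, host, catalog in sorted(ocas):  # nearest first
        if title_id in catalog:
            return f"https://{host}/{title_id}/manifest.m3u8"
    return fallback_url

ocas = [
    (40, "oca-b.isp.example", {"tt1"}),
    (12, "oca-a.isp.example", {"tt2"}),
]
print(steer("tt2", ocas, "https://cdn.example.com/tt2"))
print(steer("tt9", ocas, "https://cdn.example.com/tt9"))
```

In practice this decision happens at the control plane when the client requests playback, so the client never has to probe OCAs itself.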

Adaptive Bitrate Streaming

The client player downloads a manifest listing all chunk URLs and bitrates. Every 2–4 seconds, the player measures buffer health and selects the quality tier for the next chunk. Buffer occupancy, not raw bandwidth estimation, is the primary signal: buffer-based adaptation (the family that includes BOLA and Netflix's published buffer-based approach) reduces rebuffering by 10–20% compared to bandwidth-only ABR.
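A minimal sketch of buffer-driven tier selection in the spirit of buffer-based ABR; the reservoir/cushion thresholds and ladder values are illustrative, not Netflix's actual tuning:

```python
# Illustrative bitrate ladder (kbps), lowest to highest tier.
LADDER_KBPS = [235, 750, 1750, 5800, 15_600]

def next_bitrate(buffer_s: float, reservoir_s: float = 5.0,
                 cushion_s: float = 20.0) -> int:
    """Map current buffer occupancy (seconds) to a bitrate tier."""
    if buffer_s <= reservoir_s:
        return LADDER_KBPS[0]   # near-empty buffer: protect against rebuffering
    if buffer_s >= reservoir_s + cushion_s:
        return LADDER_KBPS[-1]  # buffer full: stream the top tier
    # In between, map buffer occupancy linearly onto the ladder.
    frac = (buffer_s - reservoir_s) / cushion_s
    return LADDER_KBPS[min(int(frac * len(LADDER_KBPS)), len(LADDER_KBPS) - 1)]
```

The key property is that quality drops are triggered before the buffer empties, so short throughput dips are absorbed by the buffer instead of causing a stall.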

Watch Position Durability

Playback position is written to Cassandra every 30 seconds and on pause/stop, so the worst-case loss on a crash is 30 seconds of progress, which Netflix treats as within acceptable UX tolerance. The partition key is the user ID and the clustering key is the title ID, so reading the latest position for any title is a single-partition lookup and resume works cleanly regardless of which device wrote it.
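The write cadence can be sketched client-side as a throttled checkpointer; `store` stands in for the actual Cassandra write, and the class and its API are hypothetical:

```python
import time

class PositionCheckpointer:
    """Flush playback position every `interval_s` seconds and on pause/stop."""

    def __init__(self, store, interval_s: float = 30.0):
        self.store = store          # callable(user_id, title_id, position_s)
        self.interval_s = interval_s
        self.last_flush = 0.0

    def on_tick(self, user_id, title_id, position_s, now=None):
        # Called frequently during playback; only writes once per interval.
        now = time.monotonic() if now is None else now
        if now - self.last_flush >= self.interval_s:
            self.flush(user_id, title_id, position_s, now)

    def on_pause(self, user_id, title_id, position_s, now=None):
        # Pause/stop always flushes so deliberate exits lose nothing.
        self.flush(user_id, title_id, position_s, now)

    def flush(self, user_id, title_id, position_s, now=None):
        self.store(user_id, title_id, position_s)
        self.last_flush = time.monotonic() if now is None else now
```

Throttling on the client keeps write volume at roughly one write per active stream per 30 seconds, which at 15M concurrent streams is still about 500K writes/second, a load Cassandra's write-optimized path handles well.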