Amazon · System Design

Design Amazon Prime Video Streaming

Frequency: 82/100 · Scale: 200M Prime members, 5M concurrent streams at peak

Problem Statement

Design the video streaming infrastructure for Amazon Prime Video. The system stores and delivers licensed content to 200M Prime members globally. It must handle adaptive bitrate streaming, multi-region delivery, and smooth playback under variable network conditions.

Requirements Clarification

Functional:

  • Stream video at adaptive bitrate (240p to 4K HDR)
  • Support multiple audio tracks, subtitles, and accessibility features
  • Resume playback position across devices
  • Download for offline viewing (selected titles)
  • DRM enforcement: prevent unauthorized copying

Non-Functional:

  • 200M Prime members, 5M concurrent streams at peak
  • First frame under 3 seconds (startup latency)
  • Rebuffering rate under 0.5% of total playback time
  • Global delivery: 240+ countries and territories
  • DRM must not add perceptible latency to playback

Content Ingestion Pipeline

Studio masters arrive as camera RAW or ProRes files: typically 2-4TB per film at 4K. The ingestion pipeline:

  1. Ingest to S3 (origin storage)
  2. Validate integrity and metadata (title, language, rating, license terms)
  3. Transcode: split into 2-second chunks, transcode each chunk independently across a distributed worker fleet (similar to AWS Elemental MediaConvert). Output: 8-12 bitrate/resolution profiles (240p at 400kbps through 4K HDR at 15Mbps)
  4. Package into HLS and DASH manifests
  5. Apply DRM encryption (Widevine, PlayReady, FairPlay)
  6. Store chunks and manifests in S3, metadata in DynamoDB

Chunked transcoding enables parallelism: a 2-hour film with 2-second chunks is 3600 chunks. Transcoding 3600 chunks in parallel across a worker fleet reduces total transcode time from 8 hours to under 15 minutes.
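The fan-out described above can be sketched in a few lines. This is a minimal illustration, not a real Elemental pipeline: `transcode_chunk` is a placeholder for the actual transcode call, and the profile names, key layout, and worker count are assumptions.

```python
import math
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 2

def chunk_count(duration_seconds: int) -> int:
    """Number of fixed-length chunks for a title."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)

def transcode_chunk(chunk_index: int, profile: str) -> str:
    """Placeholder for a real transcode job (e.g. an Elemental worker).
    Returns the output object key for the transcoded chunk."""
    return f"titles/tt0000001/{profile}/chunk_{chunk_index:05d}.ts"

def transcode_title(duration_seconds: int, profiles: list[str]) -> list[str]:
    """Fan out one independent job per (chunk, profile) pair."""
    n = chunk_count(duration_seconds)
    jobs = [(i, p) for p in profiles for i in range(n)]
    with ThreadPoolExecutor(max_workers=64) as pool:
        return list(pool.map(lambda job: transcode_chunk(*job), jobs))

# A 2-hour film with 3 profiles: 3 x 3600 = 10800 independent jobs.
keys = transcode_title(2 * 3600, ["240p", "1080p", "4k_hdr"])
print(len(keys))  # 10800
```

Because every (chunk, profile) job is independent, total wall-clock time approaches the duration of the slowest single chunk plus scheduling overhead, which is what collapses an 8-hour serial transcode into minutes.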

Content Distribution

Prime Video uses a two-tier CDN architecture.

Tier 1: AWS CloudFront serves as the global CDN, with PoPs in 90+ countries. Popular content is cached at PoPs closest to the viewer. Cache hit rate for top 10,000 titles exceeds 90%.

Tier 2: Direct partnerships with ISPs (similar to Netflix Open Connect). Amazon deploys cache appliances in large ISPs' data centers in high-demand markets. ISPs benefit from reduced transit costs; Amazon reduces egress fees and latency. A viewer on Comcast in a large US metro often streams from a cache appliance two hops from their router.

Long-tail content (older titles with low concurrent viewers) is not cached at ISP appliances. Those streams are served from CloudFront, which pulls from the S3 origin on a cache miss.
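The tier selection above can be sketched as a simple routing decision. The threshold and tier names here are hypothetical; real routing uses per-PoP popularity data and appliance health, not a single global rank.

```python
HOT_SET_RANK = 10_000  # assumed cutoff for the well-cached "top titles" set

def select_delivery_tier(popularity_rank: int, isp_appliance_has_copy: bool) -> str:
    """Pick the closest tier that holds a copy of the title."""
    if isp_appliance_has_copy:
        # Tier 2: cache appliance inside the viewer's ISP, fewest network hops.
        return "isp-appliance"
    if popularity_rank <= HOT_SET_RANK:
        # Tier 1: nearest CloudFront PoP, usually a cache hit for hot titles.
        return "cloudfront-pop"
    # Long tail: the PoP will likely miss and pull from the S3 origin.
    return "cloudfront-origin-pull"

print(select_delivery_tier(42, True))        # isp-appliance
print(select_delivery_tier(500_000, False))  # cloudfront-origin-pull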

Adaptive Bitrate Streaming

The player downloads a manifest listing all chunk URLs grouped by bitrate profile. Every 2-4 seconds, the player's ABR algorithm selects the quality tier for the next chunk based on buffer occupancy and network throughput estimates.

The critical metric is buffer occupancy, not instantaneous bandwidth. A player with a 30-second buffer can absorb a 5-second network dip without rebuffering. The ABR algorithm is conservative: it will drop quality before letting the buffer drain below 10 seconds. Rebuffering is the event that destroys session quality; a momentary quality reduction is invisible to most viewers.

Prime Video uses a model-predictive ABR algorithm: it estimates future bandwidth from recent measurements and selects the highest stable bitrate that the model predicts won't drain the buffer. This reduces rebuffering by ~30% compared to simple throughput-based selection.
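A buffer-first selection rule can be sketched as follows. This is a simplified throughput-plus-buffer heuristic, not Prime Video's model-predictive algorithm; the bitrate ladder, 10-second panic threshold, and 0.8 safety margin are assumptions chosen to match the behavior described above.

```python
# Hypothetical bitrate ladder, 240p at 400 kbps through 4K HDR at 15 Mbps.
BITRATE_LADDER_KBPS = [400, 1500, 3000, 8000, 15000]
PANIC_BUFFER_S = 10.0   # below this, refill the buffer at any quality cost
SAFETY_MARGIN = 0.8     # never assume more than 80% of measured throughput

def select_bitrate(buffer_seconds: float, est_throughput_kbps: float) -> int:
    """Pick the next chunk's bitrate from buffer occupancy and throughput."""
    if buffer_seconds < PANIC_BUFFER_S:
        # Buffer is draining: take the lowest tier so chunks download
        # much faster than real time and the buffer recovers.
        return BITRATE_LADDER_KBPS[0]
    # Healthy buffer: highest tier we can download faster than playback,
    # with headroom so a bandwidth dip doesn't immediately drain the buffer.
    sustainable = [b for b in BITRATE_LADDER_KBPS
                   if b <= SAFETY_MARGIN * est_throughput_kbps]
    return max(sustainable) if sustainable else BITRATE_LADDER_KBPS[0]

print(select_bitrate(25.0, 12_000))  # 8000 (healthy buffer, ~12 Mbps link)
print(select_bitrate(6.0, 12_000))   # 400  (buffer below 10 s: refill first)
```

Note that the buffer check dominates the throughput estimate: even on a fast link, a short buffer forces the low tier, which is exactly the "drop quality before rebuffering" policy described above.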

DRM Architecture

DRM (Digital Rights Management) enforces license terms: which devices can play, whether offline download is permitted, and regional restrictions.

The flow: the player requests a license from a DRM license server (Amazon's own, or Widevine/PlayReady operated by Google/Microsoft). The license server verifies the user's entitlement (did they purchase or subscribe to this title?) and returns a decryption key valid for the session. The player decrypts chunks on-device in a protected hardware enclave (TEE). The decryption key never leaves the enclave.

The license server must respond in under 500ms for the player to start playback without delay. License requests are stateless and horizontally scalable. Entitlement checks query DynamoDB (user subscription status) and ElastiCache (cached entitlement for the session duration).

Watch Position and Multi-Device Resume

Playback position writes to DynamoDB every 30 seconds and on pause/stop. The partition key is user ID; the sort key is content ID and device type. Multi-device resume reads the most recent position across all devices for the same content. Worst-case position loss (player crash with no final write): 30 seconds, which Prime Video has determined is within acceptable UX tolerance.
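The key layout and cross-device read can be sketched with an in-memory model of the table. The `content_id#device_type` sort-key encoding and field names are assumptions; DynamoDB itself would serve the prefix query natively with a `begins_with` key condition.

```python
# In-memory model of the table: partition key = user_id,
# sort key = f"{content_id}#{device_type}".
positions: dict[str, dict[str, dict]] = {}

def save_position(user_id: str, content_id: str, device_type: str,
                  seconds: int, ts: int) -> None:
    """Write on a 30-second heartbeat and on pause/stop."""
    sort_key = f"{content_id}#{device_type}"
    positions.setdefault(user_id, {})[sort_key] = {
        "position_s": seconds,
        "updated_at": ts,
    }

def resume_position(user_id: str, content_id: str):
    """Most recent position for this title across all the user's devices
    (a sort-key prefix query in DynamoDB terms)."""
    items = [v for k, v in positions.get(user_id, {}).items()
             if k.startswith(f"{content_id}#")]
    return max(items, key=lambda v: v["updated_at"], default=None)

save_position("u1", "tt0000001", "tv", seconds=1200, ts=100)
save_position("u1", "tt0000001", "phone", seconds=1450, ts=200)
print(resume_position("u1", "tt0000001")["position_s"])  # 1450 (phone is newer)
```

Keying by user makes all of a viewer's positions a single-partition read, and encoding the device into the sort key lets one prefix query return every device's position for a title.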

Interview Tip

The question that elevates answers at Amazon is the DRM architecture. Most candidates describe CDN delivery and ABR streaming (correct but standard). Interviewers want to hear how DRM interacts with the streaming pipeline: specifically, that the content is encrypted at rest in S3 and at the PoP caches, that decryption happens in a hardware TEE on the player device, and that the license server latency is in the startup path (under 500ms required). Understanding that the CDN can cache encrypted content without holding the keys (the CDN is not trusted with decryption) demonstrates a depth of understanding that separates strong candidates.