Design Amazon Prime Video Streaming
Problem Statement
Design the video streaming infrastructure for Amazon Prime Video. The system stores and delivers licensed content to 200M Prime members globally. It must handle adaptive bitrate streaming, multi-region delivery, and smooth playback under variable network conditions.
Requirements Clarification
Functional:
- Stream video at adaptive bitrate (240p to 4K HDR)
- Support multiple audio tracks, subtitles, and accessibility features
- Resume playback position across devices
- Download for offline viewing (selected titles)
- DRM enforcement: prevent unauthorized copying
Non-Functional:
- 200M Prime members, 5M concurrent streams at peak
- First frame under 3 seconds (startup latency)
- Rebuffering rate under 0.5% of total playback time
- Global delivery: 240+ countries and territories
- DRM must not add perceptible latency to playback
Content Ingestion Pipeline
Studio masters arrive as camera RAW or ProRes files: typically 2-4TB per film at 4K. The ingestion pipeline:
- Ingest to S3 (origin storage)
- Validate integrity and metadata (title, language, rating, license terms)
- Transcode: split the master into 2-second chunks and transcode each chunk independently across a distributed worker fleet (similar to AWS Elemental MediaConvert). Output: 8-12 bitrate/resolution profiles (240p at 400kbps through 4K HDR at 15Mbps)
- Package into HLS and DASH manifests
- Apply DRM encryption (Widevine, PlayReady, FairPlay)
- Store chunks and manifests in S3, metadata in DynamoDB
Chunked transcoding enables parallelism: a 2-hour film with 2-second chunks is 3600 chunks. Transcoding 3600 chunks in parallel across a worker fleet reduces total transcode time from 8 hours to under 15 minutes.
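The arithmetic behind the parallelism claim can be sketched as a back-of-the-envelope model. It rests on stated assumptions: every chunk takes the same time to transcode into all profiles (8 seconds, the figure implied by 8 serial hours spread over 3600 chunks), workers are identical, and split, stitch, and packaging overhead is ignored. The function names are hypothetical.

```python
import math

def chunk_count(duration_s: float, chunk_s: float = 2.0) -> int:
    """Number of fixed-length chunks; a final partial chunk counts as one."""
    return math.ceil(duration_s / chunk_s)

def parallel_wall_clock_min(duration_s: float, chunk_s: float,
                            sec_per_chunk: float, workers: int) -> float:
    """Wall-clock minutes when chunks are spread evenly over identical workers.

    sec_per_chunk is the assumed time to transcode one chunk into every
    profile. Split, stitch, and packaging overhead are ignored.
    """
    chunks = chunk_count(duration_s, chunk_s)
    waves = math.ceil(chunks / workers)  # rounds of work each worker processes
    return waves * sec_per_chunk / 60

film_s = 2 * 3600  # a 2-hour film
print(chunk_count(film_s))                             # 3600
print(parallel_wall_clock_min(film_s, 2.0, 8.0, 1))    # 480.0 minutes (8 hours serial)
print(parallel_wall_clock_min(film_s, 2.0, 8.0, 200))  # 18 waves * 8 s = 2.4 minutes
```

With 200 workers the model gives about 2.4 minutes, so the "under 15 minutes" figure leaves headroom for queueing, slow stragglers, and the split/merge steps the model leaves out.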
Content Distribution
Prime Video uses a two-tier CDN architecture.
Tier 1: AWS CloudFront serves as the global CDN, with PoPs in 90+ countries. Popular content is cached at PoPs closest to the viewer. Cache hit rate for top 10,000 titles exceeds 90%.
Tier 2: Direct partnerships with ISPs (similar to Netflix Open Connect). Amazon deploys cache appliances in large ISPs' data centers in high-demand markets. ISPs benefit from reduced transit costs; Amazon reduces egress fees and latency. A viewer on Comcast in a large US metro often streams from a cache appliance two hops from their router.
Long-tail content (older titles with few concurrent viewers) is not cached on ISP appliances. Those streams are served by CloudFront, which fetches chunks from S3 (the origin) on a cache miss.
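The lookup order across the two tiers and the S3 origin can be sketched as a toy model. All of it is hypothetical: ISP appliances hold only titles on a popularity list, both tiers are LRU caches filled on the request path (real appliances may instead be pre-filled off-peak), and a dict stands in for S3. The cached bytes are the DRM-encrypted chunks, so no tier needs decryption keys.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

class TieredDelivery:
    """Toy model of the lookup order: ISP appliance -> CloudFront PoP -> S3 origin."""
    def __init__(self, origin: dict, popular_titles, isp_capacity: int, pop_capacity: int):
        self.origin = origin                       # stand-in for S3: chunk key -> encrypted bytes
        self.popular_titles = set(popular_titles)  # titles eligible for ISP appliances
        self.isp = LRUCache(isp_capacity)
        self.pop = LRUCache(pop_capacity)

    def fetch(self, title_id: str, chunk_key: str):
        """Return (tier that served the request, encrypted chunk bytes)."""
        on_isp_tier = title_id in self.popular_titles
        if on_isp_tier:
            data = self.isp.get(chunk_key)
            if data is not None:
                return "isp", data
        data = self.pop.get(chunk_key)
        tier = "cloudfront"
        if data is None:                  # PoP miss: read the origin and fill the PoP
            data = self.origin[chunk_key]
            tier = "origin"
            self.pop.put(chunk_key, data)
        if on_isp_tier:                   # fill the ISP appliance for the next viewer
            self.isp.put(chunk_key, data)
        return tier, data

cdn = TieredDelivery({"hit/c0": b"enc0", "old/c0": b"enc1"},
                     popular_titles={"hit"}, isp_capacity=1000, pop_capacity=1000)
print(cdn.fetch("hit", "hit/c0")[0])  # origin (cold caches)
print(cdn.fetch("hit", "hit/c0")[0])  # isp
print(cdn.fetch("old", "old/c0")[0])  # origin
print(cdn.fetch("old", "old/c0")[0])  # cloudfront (long tail never reaches the ISP tier)
```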
Adaptive Bitrate Streaming
The player downloads a manifest listing all chunk URLs grouped by bitrate profile. Every 2-4 seconds, the player's ABR algorithm selects the quality tier for the next chunk based on buffer occupancy and network throughput estimates.
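To make "a manifest grouped by bitrate profile" concrete, here is a sketch that emits an HLS multivariant playlist pointing to one media playlist per profile, plus a media playlist listing 2-second chunks. Rendition names, URIs, and segment file names are hypothetical; DRM key tags (`#EXT-X-KEY`) and the DASH MPD equivalent are omitted.

```python
import math

def multivariant_playlist(renditions):
    """HLS multivariant (master) playlist: one entry per bitrate profile.

    renditions: list of (name, bandwidth_bps, width, height).
    """
    lines = ["#EXTM3U"]
    for name, bandwidth_bps, width, height in renditions:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps},RESOLUTION={width}x{height}")
        lines.append(f"{name}/index.m3u8")  # media playlist for this profile
    return "\n".join(lines) + "\n"

def media_playlist(n_chunks, chunk_s=2.0):
    """HLS media playlist listing every chunk of one profile (video on demand)."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",                            # allows decimal #EXTINF durations
        f"#EXT-X-TARGETDURATION:{math.ceil(chunk_s)}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(n_chunks):
        lines.append(f"#EXTINF:{chunk_s:.3f},")
        lines.append(f"chunk_{i:05d}.ts")             # resolved relative to the playlist URL
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"

print(multivariant_playlist([("240p", 400_000, 426, 240),
                             ("2160p", 15_000_000, 3840, 2160)]))
```

The player fetches the multivariant playlist once, then pulls chunks from whichever profile's media playlist the ABR algorithm selects for each 2-second slot.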
The critical metric is buffer occupancy, not instantaneous bandwidth. A player with a 30-second buffer can absorb a 5-second network dip without rebuffering. The ABR algorithm is conservative: it will drop quality before letting the buffer drain below 10 seconds. Rebuffering is the event that destroys session quality; a momentary quality reduction is invisible to most viewers.
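The buffer-floor rule can be sketched as a simple buffer-aware selector. This is an illustration, not Prime Video's algorithm: the ladder's middle rungs are hypothetical (only the 400kbps and 15Mbps endpoints come from the text), and the trace assumes the throughput estimate is already correct.

```python
# Hypothetical ladder (kbps): endpoints from the text, middle rungs assumed.
LADDER_KBPS = [400, 800, 1500, 3000, 6000, 15000]

def pick_bitrate(buffer_s, est_kbps, chunk_s=2.0, floor_s=10.0, ladder=LADDER_KBPS):
    """Highest bitrate whose next-chunk download keeps the buffer >= floor_s.

    Playback keeps draining the buffer while the chunk downloads, so the
    buffer left when the download completes is buffer_s minus download time.
    """
    for kbps in sorted(ladder, reverse=True):
        download_s = kbps * chunk_s / est_kbps
        if buffer_s - download_s >= floor_s:
            return kbps
    return min(ladder)  # emergency: lowest rung when no rung is safe

def trace(buffer_s, throughput_kbps, steps, chunk_s=2.0):
    """Run pick_bitrate for `steps` chunks at a constant, correctly estimated
    throughput; return the chosen bitrates and the lowest buffer level seen
    at any download completion (a negative value would mean a stall)."""
    picks, lowest = [], buffer_s
    for _ in range(steps):
        kbps = pick_bitrate(buffer_s, throughput_kbps, chunk_s)
        buffer_s -= kbps * chunk_s / throughput_kbps  # drains during the download
        lowest = min(lowest, buffer_s)
        buffer_s += chunk_s                           # the new chunk adds chunk_s of video
        picks.append(kbps)
    return picks, lowest

picks, lowest = trace(30.0, 1000, 4)  # throughput collapses to 1Mbps
print(picks, lowest)                  # [6000, 3000, 3000, 800] 10.0
```

Starting from a full 30-second buffer, the selector spends buffer to keep quality high for a few chunks, then steps down before the buffer would cross the 10-second floor.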
Prime Video uses a model-predictive ABR algorithm: it estimates future bandwidth from recent measurements and selects the highest stable bitrate that the model predicts won't drain the buffer. This reduces rebuffering by ~30% compared to simple throughput-based selection.
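A simplified model-predictive selector can be sketched as follows. It follows the general MPC pattern (predict throughput, simulate a few chunks ahead, apply only the first decision), but the harmonic-mean predictor, 3-chunk horizon, hypothetical ladder, and penalty weights are assumptions chosen for illustration; it does not reproduce Prime Video's implementation or the ~30% figure.

```python
from itertools import product
from statistics import harmonic_mean

LADDER_KBPS = [400, 800, 1500, 3000, 6000, 15000]  # hypothetical ladder

def mpc_pick(buffer_s, recent_kbps, last_kbps, horizon=3, chunk_s=2.0,
             rebuffer_penalty=15.0, switch_penalty=1.0, ladder=LADDER_KBPS):
    """Choose the next chunk's bitrate by scoring every bitrate sequence over
    `horizon` chunks against predicted throughput; return the best plan's first step.

    recent_kbps: recent positive throughput samples. The harmonic mean is
    pulled toward low samples, which keeps the prediction conservative.
    """
    bw = harmonic_mean(recent_kbps)
    best_score, best_first = float("-inf"), ladder[0]
    for plan in product(ladder, repeat=horizon):
        buf, prev, score = buffer_s, last_kbps, 0.0
        for kbps in plan:
            download_s = kbps * chunk_s / bw
            stall_s = max(0.0, download_s - buf)           # predicted rebuffering
            buf = max(0.0, buf - download_s) + chunk_s
            score += kbps / 1000                           # quality reward (Mbps)
            score -= rebuffer_penalty * stall_s            # rebuffering cost
            score -= switch_penalty * abs(kbps - prev) / 1000  # smoothness cost
            prev = kbps
        if score > best_score:
            best_score, best_first = score, plan[0]
    return best_first

print(mpc_pick(30.0, [20000] * 3, last_kbps=15000))  # 15000
print(mpc_pick(4.0, [1000] * 3, last_kbps=3000))     # 1500
print(mpc_pick(0.5, [1000] * 3, last_kbps=400))      # 400
```

With a 4-second buffer and 1Mbps predicted throughput it picks 1500kbps: the buffer can absorb a chunk or two above the throughput, and dropping further would cost quality and a bigger switch. With a 0.5-second buffer it falls to 400kbps because any larger chunk is predicted to stall.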
DRM Architecture
DRM (Digital Rights Management) enforces license terms: which devices can play, whether offline download is permitted, and regional restrictions.
The flow: the player requests a license from a DRM license server (Amazon's own, or a license service for Google's Widevine, Microsoft's PlayReady, or Apple's FairPlay). The license server verifies the user's entitlement (did they purchase or subscribe to this title?) and returns a decryption key valid for the session. The player decrypts chunks on-device inside a protected hardware enclave (trusted execution environment, TEE), and the decryption key never leaves the enclave.
The license server must respond in under 500ms for the player to start playback without delay. The license service is stateless, so it scales horizontally. Entitlement checks read ElastiCache first (entitlement cached for the session duration) and fall back to DynamoDB (user subscription status) on a cache miss.
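The license path can be sketched as a stateless handler with a cache-aside entitlement check (cache first, database on a miss). Everything here is a hypothetical stand-in: plain dicts play the roles of DynamoDB, ElastiCache, and the key store, and the 4-hour session TTL is an assumed value. A real license server returns the content key encrypted for the device's DRM client so only the TEE can unwrap it; raw bytes are used here only to keep the sketch short.

```python
import time
from dataclasses import dataclass

@dataclass
class License:
    content_key: bytes  # placeholder: a real server wraps this for the device's DRM client
    expires_at: float

class LicenseService:
    """Stateless handler: every piece of state lives in the injected stores."""

    def __init__(self, entitlements, key_store, cache, session_ttl_s=4 * 3600):
        self.entitlements = entitlements  # stand-in for DynamoDB: user_id -> set of title_ids
        self.key_store = key_store        # stand-in for a key store: title_id -> content key
        self.cache = cache                # stand-in for ElastiCache: (user_id, title_id) -> expiry
        self.session_ttl_s = session_ttl_s

    def is_entitled(self, user_id, title_id, now):
        expiry = self.cache.get((user_id, title_id))
        if expiry is not None and expiry > now:
            return True                   # cache hit: no database read on the startup path
        allowed = title_id in self.entitlements.get(user_id, set())
        if allowed:                       # cache positives only, so new subscribers aren't blocked
            self.cache[(user_id, title_id)] = now + self.session_ttl_s
        return allowed

    def request_license(self, user_id, title_id, now=None):
        now = time.time() if now is None else now
        if not self.is_entitled(user_id, title_id, now):
            raise PermissionError(f"{user_id} is not entitled to {title_id}")
        return License(self.key_store[title_id], now + self.session_ttl_s)

db = {"u1": {"t1"}}
svc = LicenseService(db, {"t1": b"key-t1"}, cache={}, session_ttl_s=100)
print(svc.request_license("u1", "t1", now=0).expires_at)  # 100
db["u1"].clear()                                           # subscription lapses in the database
print(svc.is_entitled("u1", "t1", now=50))                 # True: still served from the cache
print(svc.is_entitled("u1", "t1", now=150))                # False: entry expired, database consulted
```

The trade-off of caching for the session duration is visible in the last two lines: a revocation takes effect only once the cached entry expires.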
Watch Position and Multi-Device Resume
Playback position writes to DynamoDB every 30 seconds and on pause/stop. The partition key is user ID; the sort key is content ID and device type. Multi-device resume reads the most recent position across all devices for the same content. Worst-case position loss (player crash with no final write): 30 seconds, which Prime Video has determined is within acceptable UX tolerance.
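The write cadence and the multi-device resume read can be sketched with an in-memory stand-in for the DynamoDB table. The `content_id#device_type` sort-key encoding and all class names are hypothetical; in DynamoDB the resume read would be a Query on the user's partition with a `begins_with` condition on the sort key.

```python
from dataclasses import dataclass

WRITE_INTERVAL_S = 30  # periodic checkpoint interval from the text

@dataclass
class Position:
    position_s: float
    updated_at: float

class WatchPositionStore:
    """In-memory stand-in for a table with partition key user_id and
    sort key f"{content_id}#{device_type}" (content IDs must not contain '#')."""

    def __init__(self):
        self._items = {}  # user_id -> {sort_key: Position}

    def put(self, user_id, content_id, device_type, position_s, now):
        sort_key = f"{content_id}#{device_type}"
        self._items.setdefault(user_id, {})[sort_key] = Position(position_s, now)

    def resume_position(self, user_id, content_id):
        # Equivalent to a Query on the user's partition with
        # begins_with(sort_key, content_id + "#"), keeping the newest item.
        prefix = f"{content_id}#"
        items = [p for k, p in self._items.get(user_id, {}).items() if k.startswith(prefix)]
        if not items:
            return 0.0
        return max(items, key=lambda p: p.updated_at).position_s

class PlayerCheckpointer:
    """Client side: write every WRITE_INTERVAL_S seconds and on pause/stop."""

    def __init__(self, store, user_id, content_id, device_type):
        self.store = store
        self.user_id, self.content_id, self.device_type = user_id, content_id, device_type
        self._last_write = None

    def on_tick(self, position_s, now):
        if self._last_write is None or now - self._last_write >= WRITE_INTERVAL_S:
            self._write(position_s, now)

    def on_pause_or_stop(self, position_s, now):
        self._write(position_s, now)

    def _write(self, position_s, now):
        self.store.put(self.user_id, self.content_id, self.device_type, position_s, now)
        self._last_write = now

store = WatchPositionStore()
tv = PlayerCheckpointer(store, "u1", "t1", "tv")
for t in range(65):                       # play 65 seconds, then crash with no final write
    tv.on_tick(position_s=t, now=t)
print(store.resume_position("u1", "t1"))  # 60: the last periodic write, 4 s lost
phone = PlayerCheckpointer(store, "u1", "t1", "phone")
phone.on_pause_or_stop(position_s=200, now=1000)
print(store.resume_position("u1", "t1"))  # 200: the phone's write is newer
```

Keeping one item per device (rather than one per title) means concurrent sessions on two devices never overwrite each other; the resume read simply takes the newest.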
Interview Tip
The question that elevates answers at Amazon is the DRM architecture. Most candidates describe CDN delivery and ABR streaming (correct but standard). Interviewers want to hear how DRM interacts with the streaming pipeline: specifically, that the content is encrypted at rest in S3 and at the PoP caches, that decryption happens in a hardware TEE on the player device, and that the license server latency is in the startup path (under 500ms required). Understanding that the CDN can cache encrypted content without holding the keys (the CDN is not trusted with decryption) demonstrates a depth of understanding that separates strong candidates.
Key Concepts to Master
Distributed cache. A shared cache layer across multiple nodes used to absorb read traffic from the primary database and reduce latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
Content Delivery Network. A geographically distributed network of edge servers that caches content close to end users, reducing origin server load and cutting time-to-first-byte by 50-200ms depending on user location.
Object storage for unstructured binary data: images, videos, documents, ML model weights. Designed for durability and throughput at scale, not low-latency random access.