Design Netflix Recommendation Engine
Problem Statement
Design the recommendation system that drives Netflix content discovery. 80% of content watched on Netflix is found through recommendations. The system must personalise the homepage for 250M subscribers with updates reflecting user activity within 15 minutes.
Requirements Clarification
Functional:
- Personalise rows and ordering on each subscriber's homepage
- Rank titles by predicted play probability for that subscriber
- Reflect significant user activity (completed show, thumbed down a title) within 15 minutes
- Support A/B testing of recommendation algorithms
- Provide row labels: "Because you watched X"
Non-Functional:
- 250M subscribers
- Recommendation payload available within 200ms of homepage load
- Model retraining: daily offline; feature freshness: < 15 minutes
Feature Store
Recommendations require features for three entities: user (watch history, ratings, time-of-day patterns), item (genre, cast, popularity, freshness), and user-item pair (predicted affinity). Features are pre-computed and stored in a low-latency feature store (Netflix uses EVCache, a distributed Memcached layer). Feature lookup at serving time: ~5ms.
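The access pattern can be sketched with a minimal in-memory store. This is a toy stand-in for EVCache, not its actual API; the entity names and TTL value are illustrative assumptions:

```python
import time

class FeatureStore:
    """Toy in-memory feature store (stand-in for EVCache).

    Features are keyed by (entity_type, entity_id) and carry a TTL so
    stale pre-computed values eventually expire.
    """

    def __init__(self, ttl_seconds=900):  # 15-minute freshness window
        self._data = {}
        self._ttl = ttl_seconds

    def put(self, entity_type, entity_id, features):
        self._data[(entity_type, entity_id)] = (features, time.time())

    def get(self, entity_type, entity_id):
        entry = self._data.get((entity_type, entity_id))
        if entry is None:
            return None
        features, written_at = entry
        if time.time() - written_at > self._ttl:
            return None  # expired: caller falls back to defaults
        return features

store = FeatureStore()
store.put("user", "u42", {"avg_watch_minutes": 48.5, "fav_genre": "thriller"})
store.put("item", "t99", {"genre": "thriller", "popularity": 0.87})

user_features = store.get("user", "u42")
item_features = store.get("item", "t99")
```

Keying by entity type keeps user, item, and user-item features in one namespace so the ranker can fetch all three with the same client.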
Two-Stage Retrieval and Ranking
With 15,000+ titles, scoring all candidates at request time is too slow. The pipeline uses two stages:
Stage 1 (candidate generation, ~50ms): Collaborative filtering via matrix factorisation produces user and item embedding vectors. ANN search retrieves the 200–500 titles nearest to the user's embedding. Content-based and trending signals contribute additional candidates.
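The retrieval step can be sketched as a nearest-neighbour search over embeddings. A real system uses an ANN index; the brute-force cosine scan and toy embeddings below are stand-ins to show the idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def generate_candidates(user_vec, item_vecs, k):
    """Return the k titles whose embeddings are nearest to the user's.

    Brute-force scan standing in for an ANN index; the ordering is the
    same, only the lookup cost differs.
    """
    scored = sorted(item_vecs.items(),
                    key=lambda kv: cosine(user_vec, kv[1]),
                    reverse=True)
    return [title for title, _ in scored[:k]]

# Toy embeddings; in production these come from the nightly
# matrix-factorisation run.
user_vec = [0.9, 0.1, 0.3]
item_vecs = {
    "dark_thriller": [0.8, 0.2, 0.4],
    "kids_cartoon":  [0.1, 0.9, 0.1],
    "crime_doc":     [0.7, 0.1, 0.5],
    "baking_show":   [0.2, 0.8, 0.3],
}
candidates = generate_candidates(user_vec, item_vecs, k=2)
```

The user's taste vector sits near the thriller titles, so those are retrieved; cartoons and baking shows never reach the ranking stage.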
Stage 2 (ranking, ~100ms): A lightweight LightGBM model scores all candidates using pre-computed features. The top 50 go through a deep neural network that applies real-time contextual features (device type, time of day, current session length). The final ranking optimises for predicted long-term engagement: a title clicked but abandoned after 5 minutes scores lower than one watched to completion.
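The two-pass ranking can be sketched as follows. The linear first-pass scorer and the completion-weighted second pass are illustrative stand-ins for the LightGBM and DNN models, and all feature names and weights are assumptions:

```python
def light_score(features):
    """Cheap first-pass score: stand-in for the LightGBM model."""
    return 0.6 * features["affinity"] + 0.4 * features["popularity"]

def deep_score(features, context):
    """Second-pass score applying real-time context: stand-in for the DNN.

    Weighting by expected completion mirrors the long-term-engagement
    objective: a completed watch beats an abandoned click.
    """
    score = light_score(features) * features["expected_completion"]
    if context["session_minutes_left"] < features["runtime_minutes"]:
        score *= 0.5  # user is unlikely to finish this title right now
    return score

def rank(candidates, context, rerank_top=2):
    """Score everything cheaply, re-rank only the head expensively."""
    first_pass = sorted(candidates.items(),
                        key=lambda kv: light_score(kv[1]), reverse=True)
    head = sorted(first_pass[:rerank_top],
                  key=lambda kv: deep_score(kv[1], context), reverse=True)
    return [t for t, _ in head] + [t for t, _ in first_pass[rerank_top:]]

candidates = {
    "dark_thriller": {"affinity": 0.9, "popularity": 0.6,
                      "expected_completion": 0.4, "runtime_minutes": 120},
    "crime_doc":     {"affinity": 0.8, "popularity": 0.7,
                      "expected_completion": 0.9, "runtime_minutes": 45},
    "baking_show":   {"affinity": 0.3, "popularity": 0.9,
                      "expected_completion": 0.8, "runtime_minutes": 30},
}
ranking = rank(candidates, context={"session_minutes_left": 60})
```

The thriller wins the cheap first pass, but the contextual second pass demotes it: with an hour left in the session, the 45-minute documentary the user is likely to finish ranks first.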
Near-Real-Time Feature Updates
User activity events (play, complete, thumb up/down) are published to Kafka. A Flink stream processing job updates the user's feature vector in EVCache within 5–10 minutes. The next homepage load picks up the updated features.
Full model retraining runs daily offline. Real-time personalisation comes from feature freshness, not model freshness.
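The core of that stream job is folding each event into the user's feature vector. The plain function below sketches the logic only; the real pipeline runs it inside Flink, consuming from Kafka and writing back to EVCache, and the event schema and feature names are illustrative:

```python
def apply_event(features, event):
    """Fold one activity event into a user feature vector.

    Returns a new dict rather than mutating in place, since the cached
    value is replaced wholesale on write-back.
    """
    features = dict(features)
    if event["type"] == "complete":
        features["completed_titles"] = features.get("completed_titles", 0) + 1
        features.setdefault("recent_genres", []).append(event["genre"])
    elif event["type"] == "thumb_down":
        disliked = set(features.get("disliked_genres", []))
        disliked.add(event["genre"])
        features["disliked_genres"] = sorted(disliked)
    return features

features = {"completed_titles": 3}
events = [
    {"type": "complete", "genre": "thriller"},
    {"type": "thumb_down", "genre": "reality"},
]
for event in events:
    features = apply_event(features, event)
```

After these two events, the next homepage load sees one more completed title, a fresh genre signal, and a genre to downrank, without any model retraining.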
Key Concepts to Master
Consistent hashing: A distributed hashing scheme that minimises key remapping when nodes are added or removed.
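A minimal sketch of the concept, using virtual nodes to spread load; this illustrates the idea rather than any particular library's implementation:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        # Each physical node gets many points on the ring to even out load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key):
        # A key belongs to the first node clockwise from its hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner_before = ring.get("user:42")
ring.add("cache-d")
# Only keys whose ring segment cache-d now covers move; everything else
# keeps its original owner.
```

With naive modulo hashing, adding a fourth node would remap roughly three quarters of all keys; here only the keys landing on the new node's segments move.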
Message queue: Asynchronous communication buffer between services. Decouples producers from consumers and provides durability during traffic spikes.
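The decoupling idea in miniature; a real deployment would use Kafka, while this in-process sketch just shows the producer returning immediately and the consumer draining at its own pace:

```python
from collections import deque

class MessageQueue:
    """Tiny in-process queue: a buffer between producer and consumer."""

    def __init__(self):
        self._buffer = deque()

    def publish(self, event):
        self._buffer.append(event)  # producer never waits on the consumer

    def poll(self, max_events=10):
        """Consumer pulls a batch at whatever rate it can sustain."""
        batch = []
        while self._buffer and len(batch) < max_events:
            batch.append(self._buffer.popleft())
        return batch

q = MessageQueue()
for i in range(25):  # traffic spike: 25 events arrive at once
    q.publish({"user": f"u{i}", "type": "play"})
first_batch = q.poll()  # consumer drains 10 at a time, unhurried
```

The spike sits safely in the buffer instead of overwhelming the consumer; Kafka adds what this sketch lacks, namely durability on disk and replay across consumer restarts.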
Distributed cache: A shared cache layer across multiple nodes that absorbs read traffic from the primary database and reduces latency on hot data paths. The difference between a 2ms and a 200ms read at scale.
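The read-through pattern behind that 2ms-vs-200ms difference can be sketched as follows; `db_fetch` is a hypothetical loader standing in for the primary database:

```python
class ReadThroughCache:
    """Serve hot reads from memory; fall back to the primary store on a
    miss and populate the cache for the next reader."""

    def __init__(self, db_fetch):
        self._db_fetch = db_fetch
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1       # fast path: in-memory read
            return self._cache[key]
        self.misses += 1         # slow path: round trip to the database
        value = self._db_fetch(key)
        self._cache[key] = value
        return value

db = {"title:99": {"genre": "thriller"}}
cache = ReadThroughCache(db.get)
first = cache.get("title:99")   # miss: loads from the database
second = cache.get("title:99")  # hit: served from the cache
```

At homepage scale, the hit path is what keeps feature lookups around the ~5ms mark cited above; only cold keys ever pay the database round trip.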
Further Reading
Resources that cover this problem in depth.