Netflix · System Design

Design Netflix Recommendation Engine

Frequency: 85/100 Scale: 250M subscribers, 80% of watched content driven by recommendations

Problem Statement

Design the recommendation system that drives Netflix content discovery. 80% of content watched on Netflix is found through recommendations. The system must personalise the homepage for 250M subscribers with updates reflecting user activity within 15 minutes.

Requirements Clarification

Functional:

  • Personalise rows and ordering on each subscriber's homepage
  • Rank titles by predicted play probability for that subscriber
  • Reflect significant user activity (completed show, thumbed down a title) within 15 minutes
  • Support A/B testing of recommendation algorithms
  • Provide row labels: "Because you watched X"

Non-Functional:

  • 250M subscribers
  • Recommendation payload available within 200ms of homepage load
  • Model retraining: daily offline; feature freshness: < 15 minutes

Feature Store

Recommendations require features for three entities: user (watch history, ratings, time-of-day patterns), item (genre, cast, popularity, freshness), and user-item pair (predicted affinity). Features are pre-computed and stored in a low-latency feature store (Netflix uses EVCache, a distributed Memcached layer). Feature lookup at serving time: ~5ms.
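The feature-store pattern can be sketched as a simple keyed lookup. This is a minimal in-memory stand-in for EVCache, assuming a `(entity, key)` addressing scheme; the class, method names, and example feature values are illustrative, not Netflix's actual API.

```python
import time

class FeatureStore:
    """In-memory sketch of a low-latency feature store (stands in for EVCache)."""

    def __init__(self):
        self._data = {}

    def put(self, entity: str, key: str, features: dict) -> None:
        # Features are pre-computed offline or by a streaming job, then pushed here.
        self._data[(entity, key)] = (features, time.time())

    def get(self, entity: str, key: str) -> dict:
        # Serving-time read: one key lookup per entity, ~5ms against a real cache.
        features, _written_at = self._data.get((entity, key), ({}, None))
        return features

store = FeatureStore()
store.put("user", "u42", {"avg_session_min": 37, "top_genre": "thriller"})
store.put("item", "t99", {"genre": "thriller", "popularity": 0.81})

user_f = store.get("user", "u42")
item_f = store.get("item", "t99")
```

At serving time the ranker issues a handful of these lookups (one per entity plus batched item reads) rather than computing anything from raw history.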

Two-Stage Retrieval and Ranking

With 15,000+ titles, scoring all candidates at request time is too slow. The pipeline uses two stages:

Stage 1 (candidate generation, ~50ms): Collaborative filtering via matrix factorisation produces user and item embedding vectors. ANN search retrieves the 200–500 titles nearest to the user's embedding. Content-based and trending signals contribute additional candidates.
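The retrieval step above can be sketched as nearest-neighbour search over embeddings. This toy version scans the full catalogue with cosine similarity for clarity; a production system would use an ANN index (e.g. HNSW) instead. The embeddings and titles are made-up illustrations.

```python
import math
import heapq

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k_candidates(user_emb, item_embs, k):
    # Exhaustive scan for clarity; ANN indexes avoid scoring every title.
    scored = ((cosine(user_emb, emb), title) for title, emb in item_embs.items())
    return [title for _, title in heapq.nlargest(k, scored)]

# Toy item embeddings from matrix factorisation (dimension 3 for readability).
items = {
    "Dark":       [0.9, 0.1, 0.0],
    "Bridgerton": [0.1, 0.9, 0.2],
    "Mindhunter": [0.8, 0.2, 0.1],
    "Nailed It":  [0.0, 0.3, 0.9],
}

# A user whose embedding sits near the thriller cluster.
print(top_k_candidates([0.85, 0.15, 0.05], items, 2))  # → ['Dark', 'Mindhunter']
```

The user and item vectors live in the same latent space, so "nearest titles to the user" is a single similarity query rather than a per-title model evaluation.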

Stage 2 (ranking, ~100ms): A lightweight LightGBM model scores all candidates using pre-computed features. The top 50 go through a deep neural network that applies real-time contextual features (device type, time of day, current session length). The final ranking optimises for predicted long-term engagement: a title clicked but abandoned after 5 minutes scores lower than one watched to completion.
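The cascade in Stage 2 can be illustrated with placeholder scorers: a cheap function over all candidates, then an expensive re-rank of the survivors. The lambdas here merely stand in for the LightGBM and DNN models, and the feature names (`ctr`, `completion_rate`) are hypothetical.

```python
def rank(candidates, light_score, heavy_score, top_n=50):
    # Stage 2a: the cheap model scores every candidate.
    first_pass = sorted(candidates, key=light_score, reverse=True)[:top_n]
    # Stage 2b: the expensive model re-ranks only the top_n survivors.
    return sorted(first_pass, key=heavy_score, reverse=True)

# completion_rate proxies long-term engagement: a title watched to the end
# should beat one that is clicked often but abandoned after 5 minutes.
candidates = [
    {"title": "A", "ctr": 0.30, "completion_rate": 0.20},
    {"title": "B", "ctr": 0.25, "completion_rate": 0.90},
    {"title": "C", "ctr": 0.10, "completion_rate": 0.50},
]
light = lambda c: c["ctr"]
heavy = lambda c: 0.3 * c["ctr"] + 0.7 * c["completion_rate"]

ordered = rank(candidates, light, heavy, top_n=2)
print([c["title"] for c in ordered])  # → ['B', 'A']
```

Note how "A" wins the cheap click-through pass but loses the engagement-weighted re-rank, which is exactly the behaviour the final objective is meant to produce.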

Near-Real-Time Feature Updates

User activity events (play, complete, thumb up/down) are published to Kafka. A Flink stream processing job updates the user's feature vector in EVCache within 5–10 minutes. The next homepage load picks up the updated features.
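The core of that streaming job is a fold: consume each activity event and merge it into the user's feature vector before writing it back to the cache. A minimal sketch, assuming a hypothetical event schema (`type`, `genre`) and feature names; the real Flink job would consume from Kafka and write to EVCache.

```python
def apply_event(features: dict, event: dict) -> dict:
    """Fold one activity event into a user's feature vector (pure function)."""
    f = dict(features)
    if event["type"] == "complete":
        f["completed_30d"] = f.get("completed_30d", 0) + 1
        f["recent_genres"] = f.get("recent_genres", []) + [event["genre"]]
    elif event["type"] == "thumb_down":
        f["disliked_genres"] = f.get("disliked_genres", []) + [event["genre"]]
    return f

user_features = {"completed_30d": 3}
events = [
    {"type": "complete", "genre": "thriller"},
    {"type": "thumb_down", "genre": "reality"},
]
for e in events:
    user_features = apply_event(user_features, e)

print(user_features)
```

Keeping the update a pure function of (old features, event) makes the job easy to replay from the Kafka log after a failure, since reprocessing the same events yields the same feature vector.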

Full model retraining runs daily offline. Real-time personalisation comes from feature freshness, not model freshness.