Snapchat System Design & Database Design

Designing a robust and scalable database system for SnapChat involves understanding its core functionalities, user interactions, and the massive volume of data it handles daily. This case study explores the essential components and architectural decisions required to build an efficient SnapChat database system, focusing on fulfilling specific requirements through a microservices architecture.

What is SnapChat?

SnapChat is a multimedia messaging app that allows users to send ephemeral photos and videos, known as "Snaps," which disappear after being viewed. It offers features like Stories, Filters, Lenses, and direct messaging, enabling real-time, spontaneous, and visual communication among friends and followers. SnapChat emphasizes privacy and ephemeral content, making it a popular platform for sharing moments without long-term storage.

Requirements and Goals of the System

To design SnapChat's database system, we focus on fulfilling the following key requirements:

Functional Requirements

  1. User Management:

    • Sign Up/Login: Users should be able to create accounts, authenticate, and manage profiles.
    • Friend Relationships: Users can add, remove, and manage friends.
  2. Messaging:

    • Send/Receive Snaps: Users can send and receive photo and video messages that disappear after viewing.
    • Direct Messaging: Real-time text messaging between friends.
  3. Stories:

    • Post Stories: Users can create Stories visible to their friends for 24 hours.
    • View Stories: Friends can view and interact with posted Stories.
  4. Media Handling:

    • Upload/Store Media: Efficiently handle the uploading, storing, processing (applying filters/lenses), and retrieval of Snaps and Stories.
    • Ephemeral Storage: Automatically delete media after the retention period.

Non-functional Requirements

  1. Scalability: Handle millions of concurrent users and billions of Snaps and Stories daily.
  2. Low Latency: Ensure quick delivery and retrieval of messages and media.
  3. High Availability: Maintain uptime even during peak traffic times.
  4. Data Consistency: Ensure accurate delivery and deletion of Snaps and Stories.
  5. Security and Privacy: Protect user data and ensure content is ephemeral as intended.

Storage Capacity Estimation

Estimating SnapChat's storage needs involves calculating the volume of media content and user data generated daily.

Assumptions:

  • Daily Active Users (DAU): 500 million.
  • Snaps Sent per User per Day: 10.
  • Average Snap Size: 1 MB (photos) and 5 MB (videos).
  • Stories Posted per User per Day: 1.
  • Average Story Size: 10 MB.
  • Retention Period: Snaps disappear after viewing; Stories last for 24 hours.

Calculations:

  • Daily Snaps Storage:

    • Photos: Assume 80% of Snaps are photos: 500M users * 10 Snaps * 0.8 * 1 MB = 4,000 TB
    • Videos: Assume 20% of Snaps are videos: 500M users * 10 Snaps * 0.2 * 5 MB = 5,000 TB
    • Total Snaps: 9,000 TB/day
  • Daily Stories Storage:

    • 500M users * 1 Story * 10 MB = 5,000 TB/day
  • Total Daily Storage Requirement: Approximately 14,000 TB/day (about 14 PB/day)
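
The arithmetic above can be reproduced with a short script. This is a minimal sketch, not part of the original design: the function name `daily_storage_tb` and its parameters are made up for illustration, it uses decimal units (1 TB = 1,000,000 MB), and it assumes that a 20% video share means the remaining 80% of Snaps are photos, which gives 4,000 TB for photos.

```python
MB_PER_TB = 1_000_000  # decimal units: 1 TB = 10^6 MB


def daily_storage_tb(dau, snaps_per_user, video_percent,
                     photo_mb, video_mb, stories_per_user, story_mb):
    """Return the estimated media volume uploaded per day, in TB."""
    total_snaps = dau * snaps_per_user
    video_snaps = total_snaps * video_percent // 100
    photo_snaps = total_snaps - video_snaps  # photos are the remaining share

    photos_tb = photo_snaps * photo_mb / MB_PER_TB
    videos_tb = video_snaps * video_mb / MB_PER_TB
    stories_tb = dau * stories_per_user * story_mb / MB_PER_TB
    return {
        "photos_tb": photos_tb,
        "videos_tb": videos_tb,
        "stories_tb": stories_tb,
        "total_tb": photos_tb + videos_tb + stories_tb,
    }


# Inputs taken from the assumptions listed in this section.
estimate = daily_storage_tb(
    dau=500_000_000, snaps_per_user=10, video_percent=20,
    photo_mb=1, video_mb=5, stories_per_user=1, story_mb=10,
)
print(estimate)
```

Changing a single input, such as the video share or the average Story size, shows how sensitive the total is to each assumption.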

Storage Fulfillment:

To manage this storage requirement, the design combines NoSQL databases for metadata with object storage for media files. Both scale horizontally and support efficient deletion, which suits ephemeral content. Because Snaps are deleted after viewing and Stories after 24 hours, the daily figure describes ingest volume rather than permanent growth.

High Level System Design

To efficiently manage SnapChat's extensive requirements, we'll adopt a Microservices Architecture comprising four primary microservices:

  1. User Service
  2. Media Service
  3. Messaging Service
  4. Search Service

This modular approach ensures scalability, maintainability, and efficient handling of distinct functionalities while fulfilling all system requirements.

Figure: Snapchat High-level System Design

Key Components

  1. Clients

    • Mobile Apps: Native applications for iOS and Android.
    • Web Interface: Limited web functionalities for certain features.
  2. Load Balancers

    • Purpose: Distribute incoming traffic evenly across multiple instances of each microservice to prevent bottlenecks.
    • Examples: NGINX, HAProxy, AWS Elastic Load Balancer.
  3. API Gateway

    • Purpose: Acts as a single entry point for all client requests, handling routing and authentication.
    • Examples: Kong, AWS API Gateway, Zuul.
  4. Microservices: Each microservice handles a distinct set of activities. The next section describes the microservices used for the SnapChat system.

  5. Database Cluster

    • Relational Database: Stores structured data like user profiles and relationships.
    • NoSQL Databases: Manage unstructured data such as Snaps, Stories, and messages.
    • Search Engine: Indexes content to facilitate efficient search queries.
  6. File Storage

    • Purpose: Store and serve media content (images, videos) associated with Snaps and Stories.
    • Examples: Amazon S3, Hadoop Distributed File System (HDFS).
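
The routing role of the API Gateway described above can be illustrated with a small prefix table. This is only a sketch: the URL prefixes and service names in `ROUTES` are hypothetical, and a real gateway such as Kong or AWS API Gateway would be configured rather than hand-coded, with authentication handled before routing.

```python
# Hypothetical path prefixes mapped to the four services in this design.
ROUTES = {
    "/users": "user-service",
    "/friends": "user-service",
    "/media": "media-service",
    "/stories": "media-service",
    "/snaps": "messaging-service",
    "/messages": "messaging-service",
    "/search": "search-service",
}


def route(path):
    """Return the service that should handle `path`, or None if unknown."""
    for prefix, service in ROUTES.items():
        # Match the prefix itself or anything below it, but not "/usersx".
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None


print(route("/media/123"))
print(route("/unknown"))
```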

Microservices Architecture

Adopting a microservices architecture allows SnapChat to scale each service independently and maintain a clear separation of concerns. Below is an overview of the four main microservices and their interactions.

1. User Service

  • Functionality: Handles user registration, login, profile management, and managing friend relationships.
  • Interactions:
    • API Gateway: Receives authentication and user management requests.
    • Relational Database: Stores user information and relationships.
    • Messaging Service: Coordinates friend requests and connections.
  • Requirement Fulfillment:
    • User Management: Efficiently handles sign-up, login, and profile updates through a relational database ensuring data consistency.
    • Friend Relationships: Manages friend connections, leveraging relational schemas for integrity.

2. Media Service

  • Functionality: Manages the lifecycle of Snaps and Stories, including uploading, processing (filters/lenses), storage, and retrieval.
  • Interactions:
    • API Gateway: Receives media upload and retrieval requests.
    • Object Storage: Stores the raw and processed media files.
    • Messaging Service: Coordinates media delivery to recipients.
  • Requirement Fulfillment:
    • Media Handling: Uses NoSQL databases and object storage to handle high-volume, unstructured media data efficiently.
    • Ephemeral Storage: Automates deletion of media after the retention period, managing storage capacity effectively.
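
The ephemeral-storage rule can be sketched as a periodic sweep. This is an illustration under stated assumptions: the item fields (`kind`, `viewed`, `created_at`) and the function names are hypothetical, Snaps expire once viewed, Stories expire 24 hours after creation, and unopened Snaps are kept because the requirements only specify deletion after viewing.

```python
from datetime import datetime, timedelta, timezone

STORY_RETENTION = timedelta(hours=24)


def is_expired(item, now):
    """Apply the retention rule for a single media item."""
    if item["kind"] == "snap":
        return item["viewed"]  # Snaps disappear once viewed
    if item["kind"] == "story":
        return now - item["created_at"] >= STORY_RETENTION
    raise ValueError(f"unknown media kind: {item['kind']}")


def sweep(items, now):
    """Split items into (kept, deleted) according to the retention rule."""
    kept, deleted = [], []
    for item in items:
        (deleted if is_expired(item, now) else kept).append(item)
    return kept, deleted


now = datetime(2024, 4, 26, 12, 0, tzinfo=timezone.utc)
items = [
    {"id": "snap-1", "kind": "snap", "viewed": True,
     "created_at": now - timedelta(minutes=5)},
    {"id": "snap-2", "kind": "snap", "viewed": False,
     "created_at": now - timedelta(days=2)},
    {"id": "story-1", "kind": "story", "viewed": False,
     "created_at": now - timedelta(hours=25)},
    {"id": "story-2", "kind": "story", "viewed": False,
     "created_at": now - timedelta(hours=3)},
]
kept, deleted = sweep(items, now)
print([item["id"] for item in deleted])
```

In a production system the same rule would more likely be enforced by storage-level expiry (for example, object lifecycle rules or per-row TTLs) than by an application loop.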

3. Messaging Service

  • Functionality: Facilitates sending and receiving Snaps and direct messages, ensuring they are delivered and deleted appropriately.
  • Interactions:
    • API Gateway: Receives messaging requests from clients.
    • NoSQL Database: Stores message metadata and statuses.
    • User Service: Verifies user relationships before message delivery.
    • Media Service: Retrieves media associated with Snaps for delivery.
  • Requirement Fulfillment:
    • Messaging: Utilizes NoSQL databases for low-latency, high-throughput message handling, ensuring real-time communication.
    • Data Consistency: Ensures accurate delivery and deletion of messages through coordinated interactions with other services.
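
The delivery flow described above (verify the friendship, store the Snap, delete it after viewing) might look like the following in-memory sketch. The class `MessagingService`, its methods, and the use of a set of user-ID pairs in place of a call to the User Service are all assumptions made for illustration.

```python
class MessagingService:
    """In-memory stand-in for the Messaging Service's snap handling."""

    def __init__(self, friendships):
        # Set of frozenset({user_a, user_b}); stands in for a User Service call.
        self.friendships = friendships
        self.snaps = {}
        self._next_id = 1

    def send_snap(self, sender_id, receiver_id, media_url):
        # Verify the relationship before accepting the message.
        if frozenset((sender_id, receiver_id)) not in self.friendships:
            raise PermissionError("users are not friends")
        snap_id = self._next_id
        self._next_id += 1
        self.snaps[snap_id] = {
            "sender_id": sender_id,
            "receiver_id": receiver_id,
            "media_url": media_url,
        }
        return snap_id

    def view_snap(self, snap_id, viewer_id):
        """Return the media URL once, then delete the snap."""
        snap = self.snaps.get(snap_id)
        if snap is None or snap["receiver_id"] != viewer_id:
            return None
        del self.snaps[snap_id]  # ephemeral: gone after the first view
        return snap["media_url"]


service = MessagingService({frozenset((1, 2))})
snap_id = service.send_snap(1, 2, "media/snap1.jpg")
print(service.view_snap(snap_id, 2))  # first view returns the URL
print(service.view_snap(snap_id, 2))  # second view finds nothing
```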

4. Search Service

  • Functionality: Provides full-text search capabilities across users, Snaps, and Stories.
  • Interactions:
    • API Gateway: Receives search queries from users.
    • Elasticsearch Cluster: Processes and returns search results.
  • Requirement Fulfillment:
    • Search: Implements efficient content discovery and trending features using a specialized search engine.
    • Scalability: Ensures search operations remain performant as data volume grows.

Database Types

SnapChat's diverse data and access patterns necessitate the use of multiple database types, each optimized for specific use cases.

1. Relational Databases (SQL)

Use Cases:

  • User Management: Storing user profiles, authentication details, and friend relationships.
  • Transactional Operations: Ensuring data consistency for critical operations like user registrations and profile updates.

Examples:

  • PostgreSQL: Known for its robustness and advanced features.
  • MySQL: Widely used for its reliability and performance.

2. NoSQL Databases

Use Cases:

  • Media Storage: Handling large volumes of Snaps and Stories with high write and read throughput.
  • Messaging: Managing real-time message delivery and storage with low latency.

Examples:

  • Cassandra: Suitable for handling high-velocity data with excellent write performance.
  • MongoDB: Flexible schema design, useful for storing diverse media content.

3. Search Engine Databases

Use Cases:

  • Content Discovery: Enabling efficient search and discovery of users, Snaps, and Stories.
  • Trending Content: Facilitating features like trending Stories and popular Snaps.

Examples:

  • Elasticsearch: Highly scalable and offers real-time search capabilities.
  • Solr: Provides robust search features with extensive customization options.

Database Schema

Designing an effective database schema is crucial for ensuring data integrity, efficient access, and scalability. For SnapChat, leveraging both relational and NoSQL databases allows us to optimize different aspects of the platform based on their unique requirements and access patterns. Below, we explore the schema designs tailored to each database type to fulfill SnapChat's system requirements.

1. Relational Schema

Relational databases are ideal for structured data with well-defined relationships, such as user profiles and friend connections. Using a relational database ensures data consistency and supports complex queries essential for user management and relationships.

Figure: Database Schema for Snapchat

Users Table

| Column Name | Data Type | Description |
| --- | --- | --- |
| user_id (PK) | BIGINT | Unique identifier for each user |
| username | VARCHAR | Unique username |
| email | VARCHAR | User's email address |
| password_hash | VARCHAR | Hashed password for security |
| display_name | VARCHAR | User's display name |
| bio | TEXT | User's biography |
| creation_time | TIMESTAMP | Account creation timestamp |

Friends Table

| Column Name | Data Type | Description |
| --- | --- | --- |
| friendship_id (PK) | BIGINT | Unique identifier for the friendship |
| user_id (FK) | BIGINT | ID of the user |
| friend_user_id (FK) | BIGINT | ID of the friend user |
| status | ENUM | Status of the friendship (e.g., pending, accepted) |
| creation_time | TIMESTAMP | Timestamp when the friendship was established |
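
To make the two tables concrete, the sketch below creates them in an in-memory SQLite database and looks up a user's accepted friends. SQLite is used here only because it ships with Python; it has no ENUM type, so `status` is modeled as TEXT with a CHECK constraint. The sample rows, the helper `accepted_friends`, and the choice to look up friendships only from the `user_id` side are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    email         TEXT NOT NULL,
    password_hash TEXT NOT NULL,
    display_name  TEXT,
    bio           TEXT,
    creation_time TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE friends (
    friendship_id  INTEGER PRIMARY KEY,
    user_id        INTEGER NOT NULL REFERENCES users(user_id),
    friend_user_id INTEGER NOT NULL REFERENCES users(user_id),
    status         TEXT NOT NULL CHECK (status IN ('pending', 'accepted')),
    creation_time  TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

conn.executemany(
    "INSERT INTO users (user_id, username, email, password_hash) VALUES (?, ?, ?, ?)",
    [(1, "alice", "alice@example.com", "hash-a"),
     (2, "bob", "bob@example.com", "hash-b"),
     (3, "carol", "carol@example.com", "hash-c")],
)
conn.executemany(
    "INSERT INTO friends (user_id, friend_user_id, status) VALUES (?, ?, ?)",
    [(1, 2, "accepted"), (1, 3, "pending")],
)


def accepted_friends(conn, user_id):
    """Return usernames of friends whose friendship status is 'accepted'."""
    rows = conn.execute(
        """
        SELECT u.username
        FROM friends AS f
        JOIN users AS u ON u.user_id = f.friend_user_id
        WHERE f.user_id = ? AND f.status = 'accepted'
        ORDER BY u.username
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(accepted_friends(conn, 1))
```

In PostgreSQL or MySQL the `status` column could use a native enumerated type instead of the CHECK constraint.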

2. NoSQL Schema (Using Cassandra)

NoSQL databases like Cassandra are suitable for handling SnapChat's high-volume, time-series data such as Snaps, Stories, and Messages. Cassandra's distributed architecture ensures high availability and scalability, making it ideal for real-time data processing.

Snaps Table

CREATE TABLE snaps (
    snap_id        UUID PRIMARY KEY,
    sender_id      BIGINT,
    receiver_id    BIGINT,
    media_url      TEXT,
    media_type     TEXT,       -- 'photo' or 'video' (CQL has no ENUM type)
    filter_applied TEXT,
    timestamp      TIMESTAMP,
    viewed         BOOLEAN
);

Stories Table

CREATE TABLE stories (
    story_id        UUID PRIMARY KEY,
    user_id         BIGINT,
    media_url       TEXT,
    media_type      TEXT,      -- 'photo' or 'video' (CQL has no ENUM type)
    filter_applied  TEXT,
    timestamp       TIMESTAMP,
    views_count     INT,
    expiration_time TIMESTAMP
);

Messages Table

CREATE TABLE messages (
    message_id   UUID PRIMARY KEY,
    sender_id    BIGINT,
    receiver_id  BIGINT,
    message_text TEXT,
    timestamp    TIMESTAMP,
    delivered    BOOLEAN,
    read         BOOLEAN
);

3. Search Engine Schema (Using Elasticsearch)

To facilitate efficient search and discovery of users, Snaps, and Stories, integrating a search engine like Elasticsearch is essential. It allows for full-text search capabilities and quick retrieval of relevant content.

Users Index

{
  "user_id": "123456",
  "username": "johndoe",
  "display_name": "John Doe",
  "bio": "Loves photography and traveling.",
  "followers_count": 1500,
  "following_count": 300,
  "creation_time": "2023-01-01T00:00:00Z"
}

Snaps Index

{
  "snap_id": "abcdef123456",
  "sender_id": "123456",
  "receiver_id": "654321",
  "media_url": "https://s3.amazonaws.com/media/snap1.jpg",
  "media_type": "photo",
  "filter_applied": "sepia",
  "timestamp": "2024-04-25T12:00:00Z",
  "viewed": false
}

Stories Index

{
  "story_id": "ghijkl789012",
  "user_id": "123456",
  "media_url": "https://s3.amazonaws.com/media/story1.jpg",
  "media_type": "video",
  "filter_applied": "black_white",
  "timestamp": "2024-04-25T10:00:00Z",
  "views_count": 5000,
  "expiration_time": "2024-04-26T10:00:00Z"
}
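
A full-text search over the Users index could be expressed with Elasticsearch's `multi_match` query. The sketch below only builds the request body as a Python dictionary and does not call a cluster. The helper name `build_user_search`, the choice of fields, and the `^2` boost on `username` are assumptions for illustration, not settings taken from the original design.

```python
import json


def build_user_search(term, size=10):
    """Build an Elasticsearch request body that searches user documents."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": term,
                # "^2" weights username matches above display_name and bio.
                "fields": ["username^2", "display_name", "bio"],
            }
        },
    }


body = build_user_search("photography", size=5)
print(json.dumps(body, indent=2))
```

The resulting body would be sent to the search endpoint of the users index by whichever Elasticsearch client the Search Service uses.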

Considerations

  • Normalization vs. Denormalization:

    • Relational Schema: Emphasizes normalization to reduce redundancy, ensuring data integrity for user-related information.
    • NoSQL Schema: Embraces denormalization to optimize read performance for high-volume data like Snaps and Stories.
  • Indexing:

    • Relational Databases: Indexes on primary and foreign keys to speed up joins and lookups.
    • NoSQL Databases: Secondary indexes, or query-specific tables keyed by frequently queried fields (e.g., receiver_id in Snaps), to enhance retrieval efficiency.
    • Elasticsearch: Field-specific analyzers and optimized indexing strategies to improve search relevance and performance.
  • Scalability:

    • NoSQL Databases: Designed for horizontal scaling, allowing SnapChat to handle growing data volumes seamlessly.
    • Elasticsearch: Scales horizontally by adding more nodes to the cluster, ensuring efficient search operations as data grows.
  • Data Consistency:

    • Relational Databases: Ensure strong consistency for user and friendship data through ACID transactions.
    • NoSQL Databases: Utilize eventual consistency for high availability and performance in handling Snaps, Stories, and Messages.

Sharding and Partitioning

Efficient data distribution is essential to handle SnapChat's massive data volume and ensure quick access.

1. Sharding Strategies

User Service (Relational Database):

  • Sharding Key: user_id
  • Method: Range-based sharding or consistent hashing to distribute users evenly across multiple database instances.
  • Considerations: Ensure that friend relationships are manageable within shards to minimize cross-shard queries.

Media and Messaging Services (NoSQL Database - Cassandra):

  • Sharding Key: snap_id and message_id; because they are UUIDs, rows spread evenly across nodes.
  • Method: Cassandra inherently handles sharding using consistent hashing, distributing data across all nodes in the cluster.
  • Considerations: Optimize for write-heavy workloads and ensure data is replicated appropriately for fault tolerance.

Search Service (Elasticsearch):

  • Sharding Key: Automatic based on index settings.
  • Method: Elasticsearch automatically shards indices; configure the number of primary shards based on expected data volume and query load.
  • Considerations: Balance shards to optimize search performance and resource utilization.
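
Consistent hashing, mentioned above for both the User Service and Cassandra, can be sketched as a hash ring with virtual nodes. This is an illustrative toy: the class `HashRing`, the shard names, and the use of SHA-256 are assumptions, and Cassandra's own partitioner and token allocation work differently in detail.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps keys to shard names."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [position for position, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        """Return the first node clockwise from the key's ring position."""
        position = self._hash(str(key))
        index = bisect.bisect_right(self._positions, position) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.node_for(123456))  # the same user_id always maps to the same shard
```

The benefit over simple modulo hashing is that adding a shard only moves the keys that the new shard takes over, while every other key stays where it was.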

2. Partitioning Techniques

  • Horizontal Partitioning: Distribute rows of a table across multiple database instances based on shard keys (user_id, snap_id).
  • Vertical Partitioning: Separate different tables or services (e.g., User Service vs. Media Service) to isolate workloads.
  • Functional Partitioning: Assign different microservices to handle distinct functionalities, reducing interdependencies.

Conclusion

By using a microservices architecture with dedicated User, Media, Messaging, and Search services, SnapChat can manage its diverse and high-volume data requirements. The combination of relational and NoSQL databases preserves data integrity for user information while meeting the scalability demands of media content and real-time messaging. Sharding, replication, and fault tolerance mechanisms support high availability and performance, giving users a reliable experience.

