Database Fundamentals

Airbnb System Design & Database Design

Designing a robust and scalable database system for Airbnb requires understanding its core functionality, how users interact with it, and the volume of data it handles daily. This case study walks through the main components and architectural decisions behind such a system, with an emphasis on the database engineering concepts involved.

What is Airbnb?

Airbnb is a global online marketplace that connects people looking to rent out their homes with those seeking accommodations. It allows hosts to list their properties—ranging from single rooms to entire homes—and guests to book these spaces for short-term stays. Airbnb emphasizes trust and safety through user reviews, secure payments, and verification processes, catering to millions of users worldwide.

Requirements and Goals of the System

To design Airbnb's database system, we focus on fulfilling the following key requirements:

Functional Requirements

  1. User Management:

    • Sign Up/Login: Users can create accounts, authenticate, and manage profiles.
    • Host and Guest Roles: Differentiation between hosts (property owners) and guests (renters).
  2. Property Management:

    • Listings: Hosts can create, update, and delete property listings.
    • Availability Calendars: Manage property availability to prevent double bookings.
  3. Booking Management:

    • Reservations: Guests can book available listings, and hosts can accept or decline bookings.
    • Transactional Operations: Secure and atomic booking transactions to ensure consistency.
  4. Reviews and Ratings:

    • Feedback System: Guests can leave reviews and ratings for hosts and listings, and vice versa.
  5. Payments:

    • Secure Transactions: Handle payment processing securely, managing deposits, payouts, and refunds.

Non-functional Requirements

  1. Scalability: Support millions of users and handle high volumes of listings and bookings daily.
  2. Low Latency: Ensure quick response times for searches, bookings, and updates.
  3. High Availability: Maintain uptime and reliability, especially during peak travel seasons.
  4. Data Consistency: Ensure accurate and consistent data across bookings, listings, and user profiles.
  5. Security and Privacy: Protect sensitive user data and secure payment information.

Storage Capacity Estimation

Estimating Airbnb's storage needs involves calculating the volume of user data, property listings, reservations, and interactions generated daily.

Assumptions:

  • Daily Active Users (DAU): 100 million.
  • Listings Created/Updated per Day: 1 million.
  • Average Listing Size: 5 KB (metadata) + 10 images (average 1 MB each) = ~10 MB.
  • Daily Reservations: 10 million.
  • Reservation Metadata Size: 1 KB per reservation.
  • Daily Reviews: 5 million.
  • Average Review Size: 0.5 KB.
  • Payment Transactions: 10 million transactions/day, each ~1 KB.

Calculations:

  • Daily Listings Storage:

    • 1M listings * 10 MB = 10 TB/day
  • Daily Reservation Storage:

    • 10M reservations * 1 KB = 10 GB/day
  • Daily Reviews Storage:

    • 5M reviews * 0.5 KB = 2.5 GB/day
  • Daily Payment Transactions Storage:

    • 10M transactions * 1 KB = 10 GB/day
  • Total Daily Storage Requirement: Approximately 10 TB/day, dominated by listing images; reservations, reviews, and payments together add only about 22.5 GB/day.
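
The arithmetic can be checked with a short script. This is a minimal sketch that plugs in the assumptions listed above and uses decimal units (1 KB = 1,000 bytes); the function and constant names are illustrative, not part of any Airbnb system.

```python
# Back-of-the-envelope storage estimate in decimal units (1 KB = 1,000 bytes).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def daily_storage_bytes() -> dict:
    """Return the estimated bytes written per day for each data category."""
    listing_size = 5 * KB + 10 * 1 * MB  # metadata plus 10 images of ~1 MB each
    return {
        "listings": 1_000_000 * listing_size,
        "reservations": 10_000_000 * 1 * KB,
        "reviews": 5_000_000 * KB // 2,  # 0.5 KB per review
        "payments": 10_000_000 * 1 * KB,
    }


if __name__ == "__main__":
    estimate = daily_storage_bytes()
    for name, size in estimate.items():
        print(f"{name:>12}: {size / GB:,.1f} GB/day")
    print(f"{'total':>12}: {sum(estimate.values()) / TB:,.2f} TB/day")
```

Because media accounts for almost all of the volume, the estimate is most sensitive to the assumed image count and image size per listing.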

Storage Fulfillment:

To meet this storage requirement, the design combines relational databases for structured data, NoSQL databases for high-volume, unstructured data, and object storage for media. Data lifecycle management deletes or archives outdated content to keep storage usage in check.

High-Level System Design

To efficiently manage Airbnb's extensive requirements, we'll adopt a Microservices Architecture comprising four primary microservices:

  1. User Service
  2. Listing Service
  3. Booking Service
  4. Payment Service

This modular approach ensures scalability, maintainability, and efficient handling of distinct functionalities while fulfilling all system requirements.

Figure: Airbnb High-level System Design

Key Components

  1. Clients

    • Mobile Apps: Native applications for iOS and Android.
    • Web Interface: Full-featured web application for property browsing and management.
  2. Load Balancers

    • Purpose: Distribute incoming traffic evenly across multiple instances of each microservice to prevent bottlenecks.
    • Examples: NGINX, HAProxy, AWS Elastic Load Balancer.
  3. API Gateway

    • Purpose: Acts as a single entry point for all client requests, handling routing and authentication.
    • Examples: Kong, AWS API Gateway, Zuul.
  4. Microservices: Each microservice owns one area of functionality (users, listings, bookings, or payments). The Microservices Architecture section describes each of them in detail.

  5. Database Cluster

    • Relational Databases: Store structured data like user profiles, reservations, and payment transactions.
    • NoSQL Databases: Manage unstructured data such as property listings and interactions.
    • Object Storage: Store media files (images, videos) efficiently with scalability.
  6. File Storage

    • Purpose: Store and serve media content (images, videos) associated with property listings.
    • Examples: Amazon S3, Google Cloud Storage.

Microservices Architecture

Adopting a microservices architecture allows Airbnb to scale each service independently and maintain a clear separation of concerns. Below is an overview of the four main microservices and how they fulfill system requirements.

1. User Service

  • Functionality: Handles user registration, login, profile management, and role assignments (host or guest).
  • Interactions:
    • API Gateway: Receives authentication and user management requests.
    • Relational Database: Stores user information and roles.
  • Requirement Fulfillment:
    • User Management: Efficiently handles sign-up, login, and profile updates through a relational database ensuring data consistency.
    • Host and Guest Roles: Manages differentiation between hosts and guests, enabling role-specific functionalities.
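
The User Service stores a password hash rather than the password itself. The following is a minimal sketch of salted hashing with Python's standard library; the choice of PBKDF2-HMAC-SHA256, the iteration count, and the function names are assumptions made for illustration, not details given in this design.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

At sign-up the service would persist the salt and digest together; at login it recomputes the digest from the submitted password and compares the two.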

2. Listing Service

  • Functionality: Manages property listings, including creation, updates, deletions, availability calendars, and media uploads.
  • Interactions:
    • API Gateway: Receives listing management requests.
    • Object Storage: Stores images and videos associated with listings.
    • Booking Service: Coordinates availability to prevent double bookings.
  • Requirement Fulfillment:
    • Property Management: Utilizes NoSQL databases and object storage to handle high-volume, unstructured data efficiently.
    • Availability Calendars: Maintains real-time availability to ensure accurate booking capabilities.

3. Booking Service

  • Functionality: Facilitates reservations, ensuring transactional integrity to prevent double bookings and maintain data consistency.
  • Interactions:
    • API Gateway: Receives booking requests from guests.
    • Relational Database: Stores reservations and their statuses under ACID transactions, so that the availability check and the booking write happen atomically.
    • Payment Service: Coordinates payment processing for reservations.
  • Requirement Fulfillment:
    • Transactional Operations: Ensures atomicity and consistency in booking transactions to prevent data anomalies.
    • Data Consistency: Maintains accurate reservation records through coordinated updates across databases.
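
The double-booking guard can be sketched as an overlap check and an insert that run inside a single transaction. The example below uses an in-memory SQLite database as a stand-in for the production relational store; the `book` helper, the half-open date convention (checkout day is exclusive), and the simplified table are assumptions for illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a simplified reservations table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute(
        """CREATE TABLE reservations (
               reservation_id INTEGER PRIMARY KEY,
               listing_id     INTEGER NOT NULL,
               guest_id       INTEGER NOT NULL,
               start_date     TEXT NOT NULL,   -- ISO 8601, inclusive
               end_date       TEXT NOT NULL,   -- ISO 8601, exclusive (checkout day)
               status         TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def book(conn: sqlite3.Connection, listing_id: int, guest_id: int,
         start_date: str, end_date: str) -> bool:
    """Insert a reservation unless it overlaps an active one; returns success."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before checking
    try:
        # Two half-open ranges overlap when each starts before the other ends.
        clash = conn.execute(
            """SELECT 1 FROM reservations
               WHERE listing_id = ? AND status != 'cancelled'
                 AND start_date < ? AND end_date > ?
               LIMIT 1""",
            (listing_id, end_date, start_date),
        ).fetchone()
        if clash:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO reservations (listing_id, guest_id, start_date, end_date)"
            " VALUES (?, ?, ?, ?)",
            (listing_id, guest_id, start_date, end_date),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

A server database would typically get the same effect with row locks or an exclusion constraint rather than a database-wide write lock; the point here is only that the check and the write must not be separable.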

4. Payment Service

  • Functionality: Manages secure payment processing, including deposits, payouts, and refunds.
  • Interactions:
    • API Gateway: Receives payment-related requests.
    • Relational Database: Stores transaction records securely.
    • Payment Gateways: Integrates with external payment processors for handling transactions.
  • Requirement Fulfillment:
    • Secure Transactions: Ensures data security and compliance through encryption and secure payment processing.
    • Transactional Integrity: Maintains accurate financial records, supporting refunds and payouts reliably.

Database Types

Airbnb's diverse data and access patterns necessitate the use of multiple database types, each optimized for specific use cases.

1. Relational Databases (SQL)

Use Cases:

  • User Management: Storing user profiles, authentication details, and roles.
  • Booking Management: Ensuring data consistency for reservations and payments.
  • Transactional Operations: Managing atomic transactions for bookings and payments.

Examples:

  • PostgreSQL: Known for its robustness and advanced features.
  • MySQL: Widely used for its reliability and performance.

2. NoSQL Databases

Use Cases:

  • Property Listings: Handling high volumes of unstructured data such as property details and media.
  • Availability Calendars: Managing real-time availability data to prevent double bookings.
  • Interactions: Tracking user interactions like reviews and ratings.

Examples:

  • MongoDB: Flexible schema design, useful for storing diverse property listings.
  • Cassandra: Suitable for handling high-velocity data with excellent write performance.

3. Object Storage

Use Cases:

  • Media Storage: Efficiently storing and retrieving large media files like images and videos.
  • Scalability: Managing massive amounts of unstructured data with ease.

Examples:

  • Amazon S3: Highly durable and scalable object storage service.
  • Google Cloud Storage: Offers similar features with integration into Google's ecosystem.

Database Schema

Designing an effective database schema is crucial for ensuring data integrity, efficient access, and scalability. For Airbnb, leveraging both relational and NoSQL databases allows optimization of different aspects of the platform based on their unique requirements and access patterns. Below, we explore the schema designs tailored to Airbnb to fulfill the system requirements.

1. Relational Schema

Figure: Airbnb Database Schema

Relational databases are ideal for structured data with well-defined relationships, such as user profiles, listings, and reservations. They ensure data consistency and support complex queries essential for managing users and bookings.

Users Table

| Column Name | Data Type | Description |
|---|---|---|
| user_id (PK) | BIGINT | Unique identifier for each user |
| username | VARCHAR | Unique username |
| email | VARCHAR | User's email address |
| password_hash | VARCHAR | Hashed password for security |
| display_name | VARCHAR | User's display name |
| bio | TEXT | User's biography |
| role | ENUM | User role (host, guest) |
| creation_time | TIMESTAMP | Account creation timestamp |

Listings Table

| Column Name | Data Type | Description |
|---|---|---|
| listing_id (PK) | BIGINT | Unique identifier for each listing |
| host_id (FK) | BIGINT | ID of the host user |
| title | VARCHAR | Title of the listing |
| description | TEXT | Detailed description of the listing |
| address | VARCHAR | Physical address of the property |
| city | VARCHAR | City where the property is located |
| country | VARCHAR | Country where the property is located |
| price_per_night | DECIMAL | Cost per night for the listing |
| creation_time | TIMESTAMP | Listing creation timestamp |

Reservations Table

| Column Name | Data Type | Description |
|---|---|---|
| reservation_id (PK) | BIGINT | Unique identifier for each reservation |
| listing_id (FK) | BIGINT | ID of the reserved listing |
| guest_id (FK) | BIGINT | ID of the guest user |
| start_date | DATE | Reservation start date |
| end_date | DATE | Reservation end date |
| total_price | DECIMAL | Total cost of the reservation |
| status | ENUM | Reservation status (pending, confirmed, cancelled) |
| creation_time | TIMESTAMP | Reservation creation timestamp |
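
The three tables can be tried out locally with SQLite. This is a sketch rather than production DDL: SQLite has no ENUM type, so the role and status columns use CHECK constraints, BIGINT and DECIMAL are mapped to INTEGER and NUMERIC, and the NOT NULL choices and index names are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    email         TEXT NOT NULL,
    password_hash TEXT NOT NULL,
    display_name  TEXT,
    bio           TEXT,
    role          TEXT NOT NULL CHECK (role IN ('host', 'guest')),
    creation_time TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE listings (
    listing_id      INTEGER PRIMARY KEY,
    host_id         INTEGER NOT NULL REFERENCES users(user_id),
    title           TEXT NOT NULL,
    description     TEXT,
    address         TEXT,
    city            TEXT,
    country         TEXT,
    price_per_night NUMERIC NOT NULL,
    creation_time   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE reservations (
    reservation_id INTEGER PRIMARY KEY,
    listing_id     INTEGER NOT NULL REFERENCES listings(listing_id),
    guest_id       INTEGER NOT NULL REFERENCES users(user_id),
    start_date     TEXT NOT NULL,
    end_date       TEXT NOT NULL,
    total_price    NUMERIC NOT NULL,
    status         TEXT NOT NULL DEFAULT 'pending'
                   CHECK (status IN ('pending', 'confirmed', 'cancelled')),
    creation_time  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Indexes on foreign keys speed up joins and per-listing lookups.
CREATE INDEX idx_listings_host ON listings(host_id);
CREATE INDEX idx_reservations_listing ON reservations(listing_id);
"""


def create_schema() -> sqlite3.Connection:
    """Build the schema in an in-memory database with FK enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enforced, inserting a reservation whose guest_id or listing_id does not exist raises `sqlite3.IntegrityError`, which is the kind of integrity guarantee the relational side of the design relies on.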

2. NoSQL Schema (Using Cassandra)

NoSQL databases like Cassandra suit Airbnb's high-volume, write-heavy data such as reviews, user activity logs, and property image metadata. Cassandra's distributed architecture provides high availability and horizontal scalability, which fits workloads that ingest data continuously.

Reviews Table

CREATE TABLE reviews (
    review_id UUID PRIMARY KEY,
    listing_id BIGINT,
    guest_id BIGINT,
    host_id BIGINT,
    rating INT,
    comment TEXT,
    review_time TIMESTAMP
);

UserActivity Table

CREATE TABLE user_activity (
    user_id BIGINT,
    activity_time TIMESTAMP,
    activity_type TEXT,  -- login, upload, booking, etc. (CQL has no ENUM type)
    details TEXT,
    PRIMARY KEY (user_id, activity_time)
) WITH CLUSTERING ORDER BY (activity_time DESC);
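
To see what the primary key of user_activity implies, here is a toy in-memory model: user_id acts as the partition key that groups rows, and activity_time acts as the clustering column that keeps each group sorted newest first. The class and method names are hypothetical, and the model ignores distribution across nodes and Cassandra's overwrite-on-same-key behavior.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# (activity_time as epoch seconds, activity_type, details)
Row = Tuple[float, str, str]


class UserActivityModel:
    """Toy model of PRIMARY KEY (user_id, activity_time) with DESC clustering."""

    def __init__(self) -> None:
        # partition key (user_id) -> rows kept sorted by the clustering column
        self._partitions: Dict[int, List[Row]] = defaultdict(list)

    def insert(self, user_id: int, activity_time: float,
               activity_type: str, details: str = "") -> None:
        # Storing the negated timestamp keeps an ascending sort newest-first.
        bisect.insort(self._partitions[user_id],
                      (-activity_time, activity_type, details))

    def latest(self, user_id: int, limit: int) -> List[Row]:
        """Comparable in spirit to: SELECT ... WHERE user_id = ? LIMIT ?"""
        rows = self._partitions.get(user_id, [])[:limit]
        return [(-neg_ts, kind, details) for neg_ts, kind, details in rows]
```

Reading the most recent activity for one user touches a single partition and returns a prefix of already-sorted rows, which is why this layout suits "latest N events for a user" queries.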

PropertyImages Table

CREATE TABLE property_images (
    listing_id BIGINT,
    image_id UUID,
    image_url TEXT,
    upload_time TIMESTAMP,
    PRIMARY KEY (listing_id, image_id)
);

Considerations

  • Normalization vs. Denormalization:
    • Relational Schema: Emphasizes normalization to reduce redundancy, ensuring data integrity for user-related information.
    • NoSQL Schema: Embraces denormalization to optimize read performance for high-volume data like user activities and property images.
  • Indexing:
    • Relational Databases: Indexes on primary and foreign keys to speed up joins and lookups.
    • NoSQL Databases: Secondary indexes on frequently queried fields (e.g., listing_id in Reviews) to enhance retrieval efficiency.
  • Scalability:
    • NoSQL Databases: Designed for horizontal scaling, allowing Airbnb to handle growing data volumes seamlessly.
  • Data Consistency:
    • Relational Databases: Ensure strong consistency for user and reservation data through ACID transactions.
    • NoSQL Databases: Utilize eventual consistency for high availability and performance in handling reviews and user activities.

Replication and Fault Tolerance

Ensuring data availability and resilience against failures is critical for Airbnb's continuous operation.

1. Relational Database (User, Listings, Reservations Services):

  • Replication Strategy: Master-Slave (Primary-Replica) setup.
    • Master: Handles all write operations.
    • Slaves: Handle read operations to distribute load.
  • Fault Tolerance: Automatic failover mechanisms to promote a slave to master in case of master failure.
  • Geographical Replication: Distribute replicas across multiple data centers to enhance availability and reduce latency.
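
The read/write split and failover described above can be sketched as a small router. The node names and the `ReplicatedCluster` class are hypothetical; a real deployment would also account for replication lag and would rely on a coordination mechanism to decide which replica to promote.

```python
import itertools
from typing import List


class ReplicatedCluster:
    """Routes writes to the primary and spreads reads across replicas."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._reads = itertools.cycle(self.replicas)  # round-robin over replicas

    def route(self, statement: str) -> str:
        """Return the name of the node that should execute the statement."""
        is_read = statement.lstrip().upper().startswith("SELECT")
        if is_read and self.replicas:
            return next(self._reads)
        return self.primary  # writes, or reads when no replica is left

    def fail_over(self) -> str:
        """Promote the first replica after a primary failure; returns the new primary."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.primary = self.replicas.pop(0)
        self._reads = itertools.cycle(self.replicas)
        return self.primary
```

For example, with one primary and two replicas, successive SELECT statements alternate between the replicas while every INSERT or UPDATE goes to the primary; after `fail_over()`, the promoted replica takes the writes and the remaining replica serves reads.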

2. NoSQL Database (Reviews, User Activity, and Property Images):

  • Replication Strategy: Multi-Node Replication with data replicated across multiple nodes within the cluster.
  • Fault Tolerance: If a node fails, other replicas serve the data without interruption.
  • Consistency Model: Eventual consistency to balance availability and performance.
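
In Dynamo-style stores such as Cassandra, the trade-off between consistency and availability is commonly reasoned about with three numbers: the replication factor N, the replicas that must acknowledge a write W, and the replicas consulted on a read R. A read is certain to reach at least one replica holding the latest acknowledged write only when R + W > N. The helper names below are illustrative, and the specific settings are assumptions rather than values taken from this design.

```python
def replicas_overlap(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect the latest write set (R + W > N)."""
    for value in (write_acks, read_acks):
        if not 1 <= value <= replication_factor:
            raise ValueError("ack counts must be between 1 and the replication factor")
    return read_acks + write_acks > replication_factor


def tolerated_failures(replication_factor: int, acks: int) -> int:
    """Replicas that can be down while `acks` responses are still obtainable."""
    return replication_factor - acks
```

With N = 3, reading and writing at a single replica (R = W = 1) tolerates two replica failures but only gives eventual consistency, whereas R = W = 2 makes reads and writes overlap at the cost of tolerating a single failure.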

3. Analytics Service:

  • Replication Strategy: Data replicated across multiple nodes in the data warehouse to ensure high availability.
  • Fault Tolerance: Redundant data storage and processing nodes to prevent data loss and ensure continuous operation.
  • Data Backup: Regular backups to secure storage solutions to recover from potential data corruption or loss.

Conclusion

By using a microservices architecture with dedicated User, Listing, Booking, and Payment services, Airbnb can manage its diverse, high-volume data requirements. Combining relational and NoSQL databases preserves data integrity for user information and bookings while meeting the scalability demands of property listings and user interactions. Transactional booking operations, availability calendar management, and replication with automatic failover support high availability and performance, giving users a reliable experience.

