Logo

What are the options for storing hierarchical data in a relational database?

Storing hierarchical or tree-like data in a relational database can be challenging because traditional RDBMS structures (tables, rows, columns) are optimized for tabular relationships, not hierarchical ones. However, developers commonly rely on a few well-known patterns to model hierarchical data effectively. Let’s explore the most popular techniques, discuss their pros and cons, and provide some real-world examples.

1. Adjacency List

Overview

The Adjacency List model is the simplest and most intuitive approach. Each row in a table has a column (often called parent_id) that points to its parent row, forming a parent-child relationship.

Schema Example

CREATE TABLE Categories ( category_id INT PRIMARY KEY, name VARCHAR(255) NOT NULL, parent_id INT NULL, FOREIGN KEY (parent_id) REFERENCES Categories (category_id) );
  • category_id: Unique identifier for each node.
  • parent_id: Points to the category_id of the parent node. If parent_id is NULL, the node is a root.

Pros

  1. Simplicity: Easy to understand and implement.
  2. Flexibility: Great for situations where hierarchy can change frequently (insertions, deletions, re-parenting).

Cons

  1. Traversals: Finding all descendants or ancestors of a given node often involves multiple queries or self-joins.
  2. Complex Queries: Depth-based queries can be cumbersome since you must recursively traverse the tree.

Use Cases

  • Category trees, organizational charts, or any scenario where frequent updates are common and queries are primarily shallow (e.g., retrieving direct children).

2. Materialized Path

Overview

The Materialized Path model stores a “path” (often a delimited string) that represents each node’s lineage from the root to itself.

Schema Example

CREATE TABLE Categories ( category_id INT PRIMARY KEY, name VARCHAR(255), path VARCHAR(255) NOT NULL -- e.g., '/1/5/12/' );
  • path: Something like '/1/5/12/' indicating that this node (12) is a child of node 5, which is a child of node 1, the root.

Pros

  1. Easy Descendant Queries: Searching for descendants is straightforward using a LIKE pattern match:
    SELECT * FROM Categories WHERE path LIKE '/1/5/%';
  2. Simplicity: Easier to implement than nested sets and often more intuitive.

Cons

  1. String Manipulation: Updating the path requires updating all descendants when a node changes parents.
  2. Indexing: LIKE queries on paths can become costly unless carefully indexed (e.g., prefix indexing).
  3. Space Overhead: The path string grows with tree depth.

Use Cases

  • Ideal if your tree is read-heavy, the hierarchy doesn’t change frequently, and you want quick ways to get descendants or do path-based queries.

3. Nested Set

Overview

The Nested Set model uses two numeric columns—commonly called lft (left) and rgt (right)—to represent a node’s position in a conceptual containment hierarchy.

Schema Example

CREATE TABLE Categories ( category_id INT PRIMARY KEY, name VARCHAR(255), lft INT NOT NULL, rgt INT NOT NULL );
  • Each node has a lft and a rgt value. A node is contained by another if its lft and rgt values lie strictly within the parent’s lft and rgt.

Pros

  1. Fast Ancestry/Descendant Queries: One query can retrieve all descendants:
    SELECT * FROM Categories WHERE lft > parent.lft AND rgt < parent.rgt;
  2. Deterministic Structure: Great for hierarchical reports or summations.

Cons

  1. Complex Updates: Inserting or moving nodes can be expensive, as many rows must be updated when lft and rgt values shift.
  2. Implementation Complexity: Harder to grasp and maintain compared to adjacency lists or materialized paths.

Use Cases

  • Highly read-oriented hierarchies, such as product categories or organizational charts that don’t change often.
  • Advanced queries (e.g., aggregate calculations up and down the hierarchy).

4. Closure Table (Recursive Join Table)

Overview

The Closure Table pattern stores a separate table that explicitly tracks every ancestor-descendant relationship (i.e., it “closes” the transitive relationship between nodes).

Schema Example

You’d have:

  1. Main Table: Categories
  2. Closure Table: CategoryPaths with columns (ancestor_id, descendant_id, depth)
CREATE TABLE Categories ( category_id INT PRIMARY KEY, name VARCHAR(255) NOT NULL ); CREATE TABLE CategoryPaths ( ancestor_id INT NOT NULL, descendant_id INT NOT NULL, depth INT NOT NULL, -- distance from ancestor to descendant PRIMARY KEY (ancestor_id, descendant_id), FOREIGN KEY (ancestor_id) REFERENCES Categories(category_id), FOREIGN KEY (descendant_id) REFERENCES Categories(category_id) );
  • For each node, there are as many rows in CategoryPaths as ancestors it has (including itself if you store reflexive relations).

Pros

  1. Efficient Queries: Retrieving ancestors or descendants is straightforward with a simple join on CategoryPaths.
  2. Flexible Updates: Re-parenting requires adjusting the closure table but can still be simpler than updating an entire nested set.

Cons

  1. Extra Storage: Potentially large table if the hierarchy is deep or wide.
  2. Complexity: Must handle the logic for inserting/updating/deleting closure rows carefully.

Use Cases

  • Complex hierarchies needing fast read queries for any level (ancestors, descendants, siblings).
  • Systems where denormalized relationships speed up queries significantly (e.g., user group permissions, advanced category structures).

5. Using Recursive CTEs (SQL Server, PostgreSQL, etc.)

Many modern SQL dialects (SQL Server, PostgreSQL, Oracle, etc.) support Recursive Common Table Expressions (CTEs), which can help you traverse hierarchical data, typically modeled as an Adjacency List. Although this is not a storage pattern itself, it’s useful for querying:

WITH RecursiveCTE AS ( SELECT category_id, name, parent_id, 0 AS level FROM Categories WHERE parent_id IS NULL -- root nodes UNION ALL SELECT c.category_id, c.name, c.parent_id, r.level + 1 FROM Categories c INNER JOIN RecursiveCTE r ON c.parent_id = r.category_id ) SELECT * FROM RecursiveCTE;
  • This approach helps you retrieve hierarchical data without resorting to complicated user-defined functions or cursors.

6. Choosing the Right Model

  1. Frequent Updates?

    • Adjacency List or Closure Table might be better. Nested Set can be costly for updates. Materialized Path can also be expensive if you move nodes often.
  2. Heavy Read-Only Queries?

    • Nested Set or Materialized Path if re-parenting is rare. Closure Table works well too, but it may store more data.
  3. Complex Queries (Aggregates, Multi-Level Traversals)?

    • Nested Set excels for large read queries. Closure Table also provides straightforward ancestor/descendant lookups.
  4. Simplicity & Maintainability?

    • Adjacency List is simplest to implement but might require multiple queries or recursion for deeper traversals.
    • Materialized Path is slightly more advanced but still relatively straightforward.
  5. Querying Support:

    • Many RDBMSs now provide recursive CTEs, making Adjacency List more powerful for hierarchical queries than it once was.

Conclusion

No single approach is universally “best.” Your choice depends on how often the hierarchy changes, how frequently (and deeply) you need to query up or down the tree, and what your performance requirements are. Below is a quick summary:

  • Adjacency List: Easiest to understand, best for frequent updates, but can complicate deep traversals.
  • Materialized Path: Straightforward for read operations on descendants, but expensive to update paths for re-parenting.
  • Nested Set: Excellent for read-heavy, stable hierarchies; complex for updates.
  • Closure Table: Faster reads for complex ancestor/descendant queries, but requires more storage and update logic.

Further Reading

To enhance your understanding of relational design and best practices for performance, check out these resources from DesignGurus.io:

  1. Grokking Database Fundamentals for Tech Interviews

    • Learn about indexing, normalization, transactions, and more—key to building performant relational schemas.
  2. Relational Database Design and Modeling for Software Engineers

    • Dive deeper into schema design, advanced modeling techniques, and real-world scenarios—such as hierarchical data.
  3. Grokking System Design Fundamentals

    • Explore how relational and non-relational databases fit into larger systems, including distributed designs.

Mock Interviews and BootCamps

  • Mock Interviews: Practice with ex-FAANG engineers on data modeling, coding, and system design.
  • Interview BootCamp: A structured 12-week plan covering coding, system design, and more—perfect for tech interview prep.

By mastering these hierarchical data storage patterns, you can tailor your design to your application’s needs—whether it’s rapid updates, heavy read queries, or complex parent-child relationships.

CONTRIBUTOR
TechGrind