What are the options for storing hierarchical data in a relational database?

Storing hierarchical or tree-like data in a relational database can be challenging because traditional RDBMS structures (tables, rows, columns) are optimized for tabular relationships, not hierarchical ones. However, developers commonly rely on a few well-known patterns to model hierarchical data effectively. Let’s explore the most popular techniques, discuss their pros and cons, and provide some real-world examples.

1. Adjacency List

Overview

The Adjacency List model is the simplest and most intuitive approach. Each row in a table has a column (often called parent_id) that points to its parent row, forming a parent-child relationship.

Schema Example

CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    parent_id INT NULL,
    FOREIGN KEY (parent_id) REFERENCES Categories (category_id)
);

category_id: Unique identifier for each node.
parent_id: Points to the category_id of the parent node. If parent_id is NULL, the node is a root.

Pros

Simplicity: Easy to understand and implement.
Flexibility: Great for situations where hierarchy can change frequently (insertions, deletions, re-parenting).

Cons

Traversals: Finding all descendants or ancestors of a given node often involves multiple queries or self-joins.
Complex Queries: Depth-based queries can be cumbersome since you must recursively traverse the tree.

Use Cases

Category trees, organizational charts, or any scenario where frequent updates are common and queries are primarily shallow (e.g., retrieving direct children).

2. Materialized Path

Overview

The Materialized Path model stores a “path” (often a delimited string) that represents each node’s lineage from the root to itself.

Schema Example

CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(255),
    path VARCHAR(255) NOT NULL  -- e.g., '/1/5/12/'
);

path: Something like '/1/5/12/' indicating that this node (12) is a child of node 5, which is a child of node 1, the root.

Pros

Easy Descendant Queries: Searching for descendants is straightforward using a LIKE pattern match:
```
SELECT * FROM Categories
WHERE path LIKE '/1/5/%';
```
Simplicity: Easier to implement than nested sets and often more intuitive.

Cons

String Manipulation: Updating the path requires updating all descendants when a node changes parents.
Indexing: LIKE queries on paths can become costly unless carefully indexed (e.g., prefix indexing).
Space Overhead: The path string grows with tree depth.

Use Cases

Ideal if your tree is read-heavy, the hierarchy doesn’t change frequently, and you want quick ways to get descendants or do path-based queries.

3. Nested Set

Overview

The Nested Set model uses two numeric columns—commonly called lft (left) and rgt (right)—to represent a node’s position in a conceptual containment hierarchy.

Schema Example

CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(255),
    lft INT NOT NULL,
    rgt INT NOT NULL
);

Each node has a lft and a rgt value. A node is contained by another if its lft and rgt values lie strictly within the parent’s lft and rgt.

Pros

Fast Ancestry/Descendant Queries: One query can retrieve all descendants:

SELECT * FROM Categories
WHERE lft > parent.lft AND rgt < parent.rgt;

Deterministic Structure: Great for hierarchical reports or summations.

Cons

Complex Updates: Inserting or moving nodes can be expensive, as many rows must be updated when lft and rgt values shift.
Implementation Complexity: Harder to grasp and maintain compared to adjacency lists or materialized paths.

Use Cases

Highly read-oriented hierarchies, such as product categories or organizational charts that don’t change often.
Advanced queries (e.g., aggregate calculations up and down the hierarchy).

4. Closure Table (Recursive Join Table)

Overview

The Closure Table pattern stores a separate table that explicitly tracks every ancestor-descendant relationship (i.e., it “closes” the transitive relationship between nodes).

Schema Example

You’d have:

Main Table: Categories
Closure Table: CategoryPaths with columns (ancestor_id, descendant_id, depth)

CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE CategoryPaths (
    ancestor_id INT NOT NULL,
    descendant_id INT NOT NULL,
    depth INT NOT NULL, -- distance from ancestor to descendant
    PRIMARY KEY (ancestor_id, descendant_id),
    FOREIGN KEY (ancestor_id) REFERENCES Categories(category_id),
    FOREIGN KEY (descendant_id) REFERENCES Categories(category_id)
);

For each node, there are as many rows in CategoryPaths as ancestors it has (including itself if you store reflexive relations).

Pros

Efficient Queries: Retrieving ancestors or descendants is straightforward with a simple join on CategoryPaths.
Flexible Updates: Re-parenting requires adjusting the closure table but can still be simpler than updating an entire nested set.

Cons

Extra Storage: Potentially large table if the hierarchy is deep or wide.
Complexity: Must handle the logic for inserting/updating/deleting closure rows carefully.

Use Cases

Complex hierarchies needing fast read queries for any level (ancestors, descendants, siblings).
Systems where denormalized relationships speed up queries significantly (e.g., user group permissions, advanced category structures).

5. Using Recursive CTEs (SQL Server, PostgreSQL, etc.)

Many modern SQL dialects (SQL Server, PostgreSQL, Oracle, etc.) support Recursive Common Table Expressions (CTEs), which can help you traverse hierarchical data, typically modeled as an Adjacency List. Although this is not a storage pattern itself, it’s useful for querying:

WITH RecursiveCTE AS (
    SELECT category_id, name, parent_id, 0 AS level
    FROM Categories
    WHERE parent_id IS NULL -- root nodes

    UNION ALL

    SELECT c.category_id, c.name, c.parent_id, r.level + 1
    FROM Categories c
    INNER JOIN RecursiveCTE r ON c.parent_id = r.category_id
)
SELECT * FROM RecursiveCTE;

This approach helps you retrieve hierarchical data without resorting to complicated user-defined functions or cursors.

6. Choosing the Right Model

Frequent Updates?
- Adjacency List or Closure Table might be better. Nested Set can be costly for updates. Materialized Path can also be expensive if you move nodes often.
Heavy Read-Only Queries?
- Nested Set or Materialized Path if re-parenting is rare. Closure Table works well too, but it may store more data.
Complex Queries (Aggregates, Multi-Level Traversals)?
- Nested Set excels for large read queries. Closure Table also provides straightforward ancestor/descendant lookups.
Simplicity & Maintainability?
- Adjacency List is simplest to implement but might require multiple queries or recursion for deeper traversals.
- Materialized Path is slightly more advanced but still relatively straightforward.
Querying Support:
- Many RDBMSs now provide recursive CTEs, making Adjacency List more powerful for hierarchical queries than it once was.

Conclusion

No single approach is universally “best.” Your choice depends on how often the hierarchy changes, how frequently (and deeply) you need to query up or down the tree, and what your performance requirements are. Below is a quick summary:

Adjacency List: Easiest to understand, best for frequent updates, but can complicate deep traversals.
Materialized Path: Straightforward for read operations on descendants, but expensive to update paths for re-parenting.
Nested Set: Excellent for read-heavy, stable hierarchies; complex for updates.
Closure Table: Faster reads for complex ancestor/descendant queries, but requires more storage and update logic.

What are the options for storing hierarchical data in a relational database?

1. Adjacency List

Overview

Schema Example

Pros

Cons

Use Cases

2. Materialized Path

Overview

Schema Example

Pros

Cons

Use Cases

3. Nested Set

Overview

Schema Example

Pros

Cons

Use Cases

4. Closure Table (Recursive Join Table)

Overview

Schema Example

Pros

Cons

Use Cases

5. Using Recursive CTEs (SQL Server, PostgreSQL, etc.)

6. Choosing the Right Model

Conclusion

Further Reading