Hash-Based Indexing

Database Fundamentals

0% completed

Hash-Based Indexing

Hash-based indexing is a technique used to store and retrieve records efficiently by mapping keys to specific locations using a hash function. Unlike tree-based indexing, which organizes data hierarchically, hash-based indexing directly maps keys to buckets or slots, making it particularly efficient for point queries.

What Is Hash-Based Indexing?

Hash-based indexing employs a hash function to compute a hash value for each key, determining the bucket or slot where the corresponding record will be stored. This eliminates the need for sequential searching, offering constant-time complexity $O(1)$ in ideal cases.

Key Features

Fast and efficient for equality searches (e.g., "Find the record where ID = 102").
The hash function ensures even distribution of records across buckets.
Not suitable for range queries (e.g., finding all records with IDs between 100 and 200).

Static Hashing

In static hashing, the number of buckets is fixed when the hash table is created. Each record is mapped to a bucket based on its hash value. This structure works well when the size of the dataset is known and remains relatively constant.

How Static Hashing Works

Consider the image below.

Static Hashing

Data Mapping:
- Consider the keys 98, 106, 104, and 102.
- The hash function used is $h(key) = key \mod 5$ . The result of the hash function determines the bucket index.
Hash Calculation:
- $98 \mod 5 = 3$ → Bucket 3
- $106 \mod 5 = 1$ → Bucket 1
- $104 \mod 5 = 4$ → Bucket 4
- $102 \mod 5 = 2$ → Bucket 2
Bucket Representation:
- Each bucket stores the actual data corresponding to the hashed key. For example:
  - Bucket 3 contains data for key 98 (e.g., John, Delhi).
  - Bucket 1 contains data for key 106.
  - Bucket 4 contains data for key 104.
  - Bucket 2 contains data for key 102.

This setup ensures efficient lookups because the hash function directly maps keys to their corresponding buckets.

Problem: What Happens When a Collision Occurs?

Now, suppose we want to add the key 103 to the hash table:

$103 \mod 5 = 3$
Bucket 3 is already occupied by key 98, leading to a collision.

Solution: Chaining

Chaining is a common method to resolve collisions. When two or more keys map to the same bucket, a linked list (or chain) is created to store multiple records in that bucket.

Chaining Example

Consider the image below.

Solving the Collision in Hashing

Collision Handling:
- Keys 98 and 103 both map to bucket 3. Instead of overwriting data, a linked list is created within bucket 3 to store both records.
How It Works:
- The hash table is updated to include a chain for bucket 3.
- To search for a key (e.g., 103), the hash function maps it to bucket 3, and the linked list is traversed to locate the specific record.

Chaining ensures that no data is lost due to collisions, and the hash table remains efficient.

Dynamic Hashing

Dynamic hashing, also known as extendible hashing, overcomes the limitations of static hashing by allowing the number of buckets to grow or shrink dynamically. This ensures better handling of collisions and adapts to changes in the dataset size.

How Dynamic Hashing Works

Consider the image below.

Dynamic Hashing

Global Depth:
- The global depth k represents the number of bits used from the hash value to determine the directory ID. For example, if ( k = 2 ), the possible directory IDs are 00, 01, 10, and 11.
Mapping Keys to Buckets:
- A hash function generates a binary hash value for each key.
- The last ( k ) bits of the hash value determine the directory ID.
- The directory ID points to the corresponding bucket where the record is stored.
Handling Bucket Overflow:
- If a bucket overflows (e.g., due to multiple keys hashing to the same bucket), bucket splitting occurs:
  - The overflowing bucket is split into two new buckets.
  - The global depth k increases, and the directory is updated to reflect the new buckets.
- For example, if ( B_2 ) overflows, the directory IDs are updated to 100 and 101, and the bucket splits to redistribute the records.
- Keys are redistributed between the new buckets based on their updated hash values.

Properties of Dynamic Hashing

Bucket Splitting: When a bucket overflows, it is split into two, and the keys are redistributed.
Dynamic Directory: The directory grows or shrinks as needed, ensuring efficient data storage and retrieval.
Flexibility: The hash function dynamically adjusts to prevent collisions and ensure balanced bucket utilization.

Dynamic hashing provides a scalable and efficient solution for managing large and dynamic datasets.

Comparison of Static and Dynamic Hashing

Feature	Static Hashing	Dynamic Hashing
Number of Buckets	Fixed	Grows or shrinks dynamically
Collision Handling	Chaining or open addressing	Bucket splitting with updated directories
Flexibility	Limited	Highly flexible
Use Cases	Small, stable datasets	Large, dynamic datasets

.....

Like the course? Get enrolled and start learning!