Database Fundamentals

0% completed

Previous
Next
Isolation

What is Isolation?

Isolation ensures that the intermediate state of a transaction is invisible to other concurrent transactions. It prevents interference, ensuring that transactions execute as if they are happening sequentially, even in a multi-user environment.

Key Points:

  • Isolation defines how data accessed or modified by one transaction is visible to other concurrent transactions.
  • The level of isolation affects performance, consistency, and potential issues like dirty reads or phantom reads.

Transactions work independently, unaffected by others happening at the same time.

Isolation Problems and Examples

1. Dirty Reads

A dirty read occurs when one transaction reads uncommitted changes made by another transaction, which may later be rolled back. This leads to potential inconsistencies.

Example:

Image
  1. T1: Transaction A starts.
  2. T2: Transaction A deducts $100 from Account A (Balance temporarily becomes $400).
  3. T2: Transaction B starts.
  4. T3: Transaction B reads the balance of Account A as $400.
  5. T4: Transaction A rolls back, reverting the balance of Account A to $500.

Problem: Transaction B reads the uncommitted value of $400. Once Transaction A rolls back, the balance is restored to $500, but Transaction B has already processed the incorrect value, leading to dirty read inconsistency.

2. Dirty Writes

A dirty write happens when two concurrent transactions overwrite each other’s uncommitted changes, leading to data inconsistency.

Example:

Image
  1. T1: Transaction A begins.
  2. T2: Transaction A updates the balance by adding $100 (Balance becomes $600 temporarily).
  3. T2: Transaction B begins.
  4. T3: Transaction B updates the balance again by adding $100 (Balance becomes $700 temporarily).
  5. T4: Transaction A aborts, reverting its changes. The balance should have been $500 (initial state + changes from Transaction B), but now it is incorrectly recorded as $700 due to the uncommitted write from Transaction A.

Problem: Transaction B overwrites the changes made by Transaction A before Transaction A is committed. When Transaction A aborts, it leaves the database in an inconsistent state because Transaction B's update used uncommitted data.

3. Nonrepeatable Reads (Read Skew)

A nonrepeatable read occurs when a transaction reads the same row multiple times and gets different values due to modifications by another transaction.

Example:

Image
  1. T1: Transaction A begins.
  2. T1: Transaction B begins concurrently with Transaction A.
  3. T2: Transaction A reads the balance of Account A as $500.
  4. T3: Transaction B deducts $100 from Account A, updating the balance to $400, and commits.
  5. T4: Transaction A reads the balance of Account A again and gets $400.
  6. T5: Transaction A commits.

Problem Transaction A reads the same data twice (balance of Account A) but gets different values each time ($500 in T2 and $400 in T4). This inconsistency occurs because Transaction B committed its changes while Transaction A was still active, causing a non-repeatable read.

4. Phantom Reads

A phantom read happens when a transaction executes the same query multiple times and gets a different set of rows due to modifications by another transaction.

Example:

Image
  1. T1: Transaction A begins.
  2. T1: Transaction B begins concurrently with Transaction A.
  3. T2: Transaction A reads the number of rows in the table as 1.
  4. T3: Transaction B inserts a new row into the table and commits the transaction.
  5. T4: Transaction A reads the number of rows in the table again and now sees 2 rows.
  6. T5: Transaction A commits.

Problem: Transaction A reads the same query twice (number of rows in the table) but gets different results each time. In T2, it reads 1 row, but in T4, it reads 2 rows because Transaction B inserted a new row and committed its changes during Transaction A’s execution.

Isolation Levels in Databases

Isolation levels define the degree to which one transaction is isolated from other concurrent transactions. The higher the isolation level, the greater the guarantee of data consistency, but with a potential trade-off in performance.

Image

1. Read Uncommitted

  • Definition: The weakest isolation level, where transactions can read uncommitted changes made by other transactions.
  • Behavior:
    • Allows dirty reads.
    • Does not prevent dirty writes, nonrepeatable reads, or phantom reads.
  • Example:
    • Transaction A writes a value but has not committed yet.
    • Transaction B reads this uncommitted value.
    • If Transaction A rolls back, Transaction B ends up with inconsistent data.
  • Use Cases:
    • Rarely used due to data inconsistencies.
    • Suitable for scenarios where performance is critical and data integrity is not a concern.

2. Read Committed

  • Definition: Ensures that transactions can only read committed data. Uncommitted changes from other transactions are not visible.
  • Behavior:
    • Prevents dirty reads.
    • Allows nonrepeatable reads and phantom reads.
  • Mechanism:
    • The database keeps two copies of data:
      • An old, committed copy.
      • A new, uncommitted copy used by the ongoing transaction.
    • Reads always access the committed copy.
  • Example:
    • Transaction A writes a value but does not commit.
    • Transaction B tries to read the value and is blocked until Transaction A commits or rolls back.
  • Use Cases:
    • Default isolation level in most databases, such as PostgreSQL.
    • Balances consistency and performance.

3. Repeatable Read

  • Definition: Extends Read Committed by ensuring that if a transaction reads a row once, it will always read the same data for that row during the transaction, even if other transactions modify it.
  • Behavior:
    • Prevents dirty reads and nonrepeatable reads.
    • Allows phantom reads.
  • Mechanism:
    • Uses snapshot isolation:
      • Each transaction gets a consistent snapshot of the database at the start of the transaction.
      • Subsequent reads use this snapshot, ignoring changes made by other transactions.
  • Example:
    • Transaction A reads a row, and Transaction B modifies the same row.
    • If Transaction A rereads the row, it will still see the old value, even though Transaction B has committed.
  • Use Cases:
    • Default isolation level in MySQL.
    • Suitable for scenarios where consistent reads are important.

4. Serializable

  • Definition: The strictest isolation level, where transactions are executed in a way that produces results as if they were executed sequentially.
  • Behavior:
    • Prevents dirty reads, nonrepeatable reads, and phantom reads.
    • Transactions are fully isolated from each other.
  • Mechanism:
    • Achieved using either:
      • Two-Phase Locking (2PL): Locks resources until the transaction is completed.
      • Serializable Snapshot Isolation (SSI): Uses conflict detection to ensure serializable execution.
  • Example:
    • Two transactions that try to modify overlapping rows will be serialized, either by queuing one transaction or rolling it back.
  • Use Cases:
    • Suitable for financial applications or scenarios where correctness is more critical than performance.
    • Rarely used as the default due to performance overhead.

The Lost Update Problem

The Lost Update Problem occurs when two or more transactions modify the same data concurrently, and the updates of one transaction are overwritten by another, leading to data loss.

Example:

Image
  1. T1: Transaction A starts and reads a value X = 5.
  2. T2: Transaction B starts and also reads the same value X = 1.
  3. T3: Transaction A increments X by 1 and writes X = 6.
  4. T4: Transaction B increments X by 1 and writes X = 6.
  • Problem: The final value of X should be 7, but because Transaction B overwrites the value updated by Transaction A, the final value is incorrectly recorded as 6.

How to Avoid Lost Updates:

  • Use locking mechanisms to ensure that only one transaction can access and modify the data at a time.
  • Use atomic operations, like Compare-And-Swap (CAS), to ensure updates are made safely.

CAS (Compare-And-Swap)

CAS (Compare-And-Swap) is an atomic operation used to address the Lost Update Problem by ensuring that updates are applied only if the original value matches the expected value.

How CAS Works:

  1. A transaction reads a value and stores it as the expected value.
  2. Before updating the value, the database checks if the current value matches the expected value.
  3. If they match, the update is applied.
  4. If they don’t match, the update is aborted, and the transaction retries.

Example:

  1. Transaction A reads X = 5 and intends to update it to X = 6.
  2. Transaction B reads the same X = 5 and intends to update it to X = 7.
  3. Both transactions use CAS:
    • Transaction A succeeds because the current value matches the expected value (X = 5).
    • Transaction B fails because the current value has changed (X = 6), so it retries with the updated value.

Advantages of CAS:

  • Eliminates the need for explicit locks, improving performance.
  • Reduces the risk of deadlocks in high-concurrency scenarios.

Use Cases:

  • Used in databases and systems requiring high concurrency, like Redis and MySQL.
  • Common in operations like incrementing counters or modifying shared variables.

.....

.....

.....

Like the course? Get enrolled and start learning!
Previous
Next