Database Transactions & ACID Properties | Complete Guide to Data Consistency & Reliability

Transaction Process

A transaction can be defined as a group of tasks. A single task is the minimum processing unit which can't be divided further.

Or, In Other words we can say:

Transaction is an unit of work done against the DB in a logical sequence. Here, the sequence is very important in transaction.
It is a logical unit of work that contains one or more SQL statements. The result of all these statements in a transaction either gets completed successfully (all the changes made to the database are permanent) or if at any point any failure happens it gets rollbacked (all the changes being done are undone.)

Lets take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A’s account to B’s account. This very simple and small transaction involves several low-level tasks.

A’s Account

Open_Account(A)
   Old = A.balance # Read A's Balance
   New = Old - 500 # Process A's Balance
   A.balance = New # Write Balance if Transaction Sucessfull
Close_Account(A)

B’s Account

Open_Account(B)
   Old = B.balance # Read B's Balance
   New = Old + 500 # Process B's Balance
   B.balance = New # Write Balance if Transaction Sucessfull
Close_Account(B)

ACID properties

A transaction is a very small unit of a program and it may contain several lowlevel tasks. A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.

Atomicity: Either all operations of transaction are reflected properly in the DB, or none are.
Consistency: Integrity constraints must be maintained before and after transaction. DB must be consistent after transaction happens.
Isolation: Even though multiple transactions may execute concurrently, the system guarantees that, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system. Multiple transactions can happen in the system in isolation, without interfering each other.
Durability: After transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Transaction states

Active State: The very first state of the life cycle of the transaction, all the read and write operations are being performed. If they execute without any error the T comes to Partially committed state. Although if any error occurs then it leads to a Failed state.
Partially committed state: After transaction is executed the changes are saved in the buffer in the main memory. If the changes made are permanent on the DB then the state will transfer to the committed state and if there is any failure, the T will go to Failed state.
Committed state: When updates are made permanent on the DB. Then the T is said to be in the committed state. Rollback can’t be done from the committed states. New consistent state is achieved at this stage.
Failed state: When T is being executed and some failure occurs. Due to this it is impossible to continue the execution of the T
Aborted state: When T reaches the failed state, all the changes made in the buffer are reversed. After that the T rollback completely. T reaches abort state after rollback. DB’s state prior to the T is achieved.
Terminated state: A transaction is said to have terminated if has either committed or aborted.

Concurrency Control

Definition: Concurrency control in DBMS ensures that when multiple transactions execute simultaneously, the consistency and isolation of the database are maintained. It prevents conflicts like lost updates, temporary inconsistency, and uncommitted data access.

Purpose: To coordinate concurrent transactions so that the result is equivalent to executing them serially, i.e., one after another.

Common Problems in Concurrent Transactions:

Problem	Description
Lost Update	Two transactions overwrite each other’s changes.
Temporary Inconsistency	One transaction reads uncommitted changes from another.
Unrepeatable Read	A value read twice by a transaction differs because another transaction modified it in between.
Deadlock	Two or more transactions wait indefinitely for each other to release locks.

Techniques for Concurrency Control:

Lock-based Protocols: Use locks to control access to data items.
Timestamp-based Protocols: Assign timestamps to transactions for ordering.
Optimistic Concurrency Control: Transactions execute freely and check for conflicts at commit time.
Validation-based Schemes: Ensure data consistency through validation before commit.

Serializability of Scheduling

Definition: A schedule is the sequence in which transactions are executed. A schedule is said to be serializable if its outcome is the same as that of some serial execution of those transactions.

Types of Serializability:

Conflict Serializability: If a non-serial schedule can be transformed into a serial one by swapping non-conflicting operations, it is conflict serializable.
- Two operations conflict if they belong to different transactions, act on the same data item, and one of them is a write.
View Serializability: If a schedule preserves the read and write dependencies of transactions in the same way as a serial schedule, it is view serializable.

Importance: Serializability ensures isolation in ACID properties, maintaining the logical correctness of concurrent transactions.

Locking and Timestamp-based Schedulers

(a) Lock-based Schedulers

These control access to data items using locks:

Shared Lock (S): For read operations; multiple transactions can share it.
Exclusive Lock (X): For write operations; no other lock is allowed.

Common Locking Protocols:

Two-Phase Locking (2PL):
- Growing phase: Transaction acquires locks.
- Shrinking phase: Once a lock is released, no new lock can be acquired. Ensures conflict serializability but may cause deadlocks.
Strict 2PL: Locks are released only after commit or abort, preventing cascading rollbacks.

(b) Timestamp-based Schedulers

Each transaction is assigned a unique timestamp when it starts.

Each data item maintains Read_TS(X) and Write_TS(X).
A transaction’s operations are executed in timestamp order.

Rules:

If a transaction tries to read a value written by a later transaction, it is rolled back (to maintain order).
If a transaction tries to write after a later read/write, it is rolled back.

Advantages:

Deadlock-free.
Ensures serializability based on timestamps.

Database Recovery

Definition: Database recovery is the process of restoring the database to a consistent state after a failure. Failures can be due to system crash, transaction error, or disk damage.

Types of Failures:

Transaction Failure – Logical or input errors within a transaction.
System Crash – Power failure or OS crash.
Disk Failure – Physical damage to storage media.

Recovery Techniques:

1. Shadow Copy Scheme:

A db-pointer points to the current database copy.
Before updating, a new copy (shadow copy) is created.
If the transaction succeeds, the db-pointer is updated to the new copy.
If it fails, the new copy is discarded.
Ensures Atomicity and Durability, but is inefficient for large databases.

2. Log-based Recovery:

Every transaction operation is logged before execution in stable storage.
Each log record contains transaction ID, data item, old value, and new value.

Types:

Deferred Update (Deferred DB Modification): All updates are recorded in the log first and written to the database only after a transaction commits.
Immediate Update (Immediate DB Modification): Updates occur as the transaction executes, but old values are logged to support undo if failure occurs.

Recovery Actions:

Situation	Action
Transaction fails before commit	UNDO using old values in the log.
System crashes after commit	REDO using new values in the log.

These mechanisms ensure the Atomicity and Durability components of the ACID properties.