CS145 Lecture Notes (9) -- Transactions



Transactions are motivated by two of the properties of DBMS's discussed way back in our first lecture:

Multi-user database access

Most database systems run as servers where either:
  1. multiple clients are simultaneously operating on the same database, or
  2. one or more middle-tier application servers are maintaining multiple concurrent connections to the database
(pictures)


















Problems created by concurrency => need concurrency control


Example 1: Attribute-level inconsistency

  Client 1 - UPDATE Student 
             SET address = CONCAT(address,:zip) WHERE ID = 123

  Client 2 - UPDATE Student
             SET address = CONCAT(address,:phone) WHERE ID = 123






For each client, DBMS reads address value, updates it, and writes it back. Possible outcomes without concurrency control: one change or both.

Example 2: Relation-level inconsistency

  Client 1 - UPDATE Apply SET decision = 'Y'
             WHERE location = 'SB'
             AND ID IN (SELECT ID FROM Student WHERE GPA > 3.2)

  Client 2 - UPDATE Student
             SET GPA = 1.2 * GPA
             WHERE HSname = 'Paly'

Possible outcomes without concurrency control: some Paly students get into SB on scaled GPA, others don't.


Example 3: Multiple-statement level inconsistency

  Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
             DELETE FROM Apply WHERE decision = 'N'

  Client 2 - SELECT COUNT(*) FROM Apply
             SELECT COUNT(*) FROM Archive







Overall goal: Question: Why not just execute everything in sequence? What system, database, or application features give us inherent concurrency that we want to exploit?




Safe from system crashes

Need crash recovery


Solution: Transactions

A transaction is a sequence of one or more SQL operations treated as a unit. SQL standard (and Oracle): (picture)





Transaction properties

Transactions obey the ACID properties: Atomicity, Consistency, Isolation, Durability

(1) Isolation

(picture of multiple clients issuing streams of transactions)











Isolation obtained through serializability: operations within transactions may be interleaved but execution must be equivalent to some sequential (serial) order.

Question: How is this guarantee achieved?



Solves Examples 1,2,3 above

Example 4 (variant on 3):

  Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
             DELETE FROM Apply WHERE decision = 'N'

  Client 2 - UPDATE Apply SET decision = 'U' WHERE Campus = 'Irvine'
Serialization order can make a big difference. This is the application's problem to solve, not the DBMS.

(2) Durability

If system crashes after transaction commits, all effects of transaction remain in database.

Question: Seems obvious, but all DBMS's manipulate the data in memory, so how is this guarantee achieved?



(3) Atomicity

Each transaction's operations are executed all-or-nothing, never left "half done." Question: How is this guarantee achieved?


"Transaction rollback" = "transaction abort" Robust application wraps every transaction with exception for system-initiated rollback.
  Client-initiated rollback:

    BEGIN TRANSACTION;
    <get input from user>
    SQL commands based on input
    <confirm results with user>
    IF input = confirm-OK THEN COMMIT; ELSE ROLLBACK;
Note: Rollback only undoes database changes, not other changes (e.g., program variables) or side-effects (e.g., printing to screen, delivering cash).

Question: No self-respecting database programmer would write the above transaction. Why?



(4) Consistency

Not really a property, more a good application of the other properties.

Idea: Assume all constraints are true at the start of every transaction. Clients are to guarantee, under this assumption and isolation, that all constraints are still true at the end of every transaction. (Similar to program invariants)

Read-only transactions

Can tell system a transaction will not perform writes, system will optimize accordingly.
  "SET TRANSACTION READ ONLY"
Many, many transactions and applications fall into this category.

Question: If there are five read-only transactions and no other transactions, what does the system need to do to guarantee serializability?




Weaker properties

There's a lot of overhead and concurrency reduction to guaranteeing the ACID properties. Sometimes full isolation (i.e., full serializability) is not required.

Three weaker isolation levels:

Note: An isolation level is in the eye of the beholder. Specifically, the reads performed by a transaction must adhere to its own isolation level.

Dirty reads

A data item is "dirty" if it has been written by an uncommitted transaction.

Example 5:

  Client 1 - BEGIN TRANSACTION;
             ...
             UPDATE Student SET GPA = .99 * GPA
             ...
             COMMIT;

  Client 2 - BEGIN TRANSACTION;
             ...
             SELECT AVG(GPA) FROM Student
             ...
             COMMIT;
Client 2 may only care about approximate average - dirty reads okay. Use:
  "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"
Note: Isolation level of Client 1 transaction is irrelevant unless another transaction updates GPA.

Committed reads

  "SET TRANSACTION ISOLATION LEVEL READ COMMITTED"

Modified Example 5:

  Client 1 - BEGIN TRANSACTION; // serializable
             ...
             UPDATE Student SET GPA = .99 * GPA
             ...
             COMMIT;


  Client 2 - SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
             BEGIN TRANSACTION;
             ...
             SELECT AVG(GPA) FROM Student  // executes before Client 1
             ...
             SELECT MAX(GPA) FROM Student  // executes after Client 1
             ...
             COMMIT;

Repeatable read

  "SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"
Modified Example 5: get same GPAs both times

=> But still doesn't guarantee global serializability!

Example 6:

  Client 1 - BEGIN TRANSACTION; // serializable
             ...
             UPDATE Student SET GPA = .99 * GPA
             UPDATE Student SET SAT = 1.01 * SAT
             ...
             COMMIT;

  Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
             BEGIN TRANSACTION;
             ...
             SELECT AVG(GPA) FROM Student  // executes before Client 1
             ...
             SELECT AVG(SAT) FROM Student  // executes after Client 1
             ...
             COMMIT;
The following example is more realistic for repeatable read but not globally serializable, and is based on the fact that repeatable read does not apply to inserted tuples.

Example 7:

  Client 1 - BEGIN TRANSACTION; // serializable
             ...
             INSERT INTO Student <100 new students>
             ...
             COMMIT;

  Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
             BEGIN TRANSACTION;
            ...
             SELECT AVG(GPA) FROM Student  // executes before Client 1
             ...
             SELECT MAX(GPA) FROM Student  // executes after Client 1
             ...
             COMMIT;
New inserted tuples are called phantoms.


Question: What isolation level do you think Oracle supports as a default?







Summary

Standard default: transactions are serializable

Weaker isolation levels increase performance by eliminating overhead and increasing concurrency. From weakest to strongest and the read behaviors they permit:

isolation leveldirty readsnonrepeatable readsphantoms
read uncommitted Y Y Y
read committed N Y Y
repeatable read N N Y
serializable N N N

Remember that the isolation level is in the eye of the beholding transaction: For true global serializability, every transaction must have isolation level SERIALIZABLE.

=> Much, much more on transactions in CS245 and CS347