The Transaction Concept: Virtues and Limitations
Jim Gray
One line summary:
This papers presents an overview of the general
properties of a transaction. It also talks about some problems which need to be
solved. (in 1981)
What is a transaction?
- A transaction is "a collection of actions which comprise a consistent
transformation of the state. It satisfies the system consistency constraints
to transform the system from one consistent state to another.
- System consistency constraints: assertions about the values of records and
the transformations allowed.
- Properties: (ACID)
- Atomicity: the transaction either happens or it does not. No "leaving in
the middle: is allowed.
- Consistency: obey legal protocols.
- Isolated: (not in the paper but I think was mentioned by Jim Gray's 262
lecture) transactions can execute concurrently, but it must appears to each
successful transaction that it has executed in a serial schedule with the
others.
- Durability: once a transaction is committed, it cannot be back out.
Committed: point of no going back.
Aborted: all of the effects
of a transaction need to be back out.
Relaxed these constraints so that
some actions can be ignored:
- unprotected: the action need not be undone or redone if the transaction
must be aborted. e.g. actions on scratch spaces.
- protected: the action can and must be undone or redone if the
transaction must be aborted. e.g. conventional database operations, insert,
delete.
- real: this action cannot be undone once done. e.g. committed actions.
NonStop:
How to get atomicity and durability? Build a perfect system!
But only need an "almost perfect" system:
- A system that fails once every many years is acceptable.
- Human errors can cause failures, and this cannot be prevented by a perfect
system.
How to build a very reliable system?
- Use many many unreliable parts to provide redundancy on a grand scale. For
example, each process in NonStop has a backup process which can continue to
work in case the first one died.
Update in place:
According to accounting principles, update-in-place is
bad because there is no history on the changes made.
Solutions:
Time-domain Addressing: (also called version oriented system)
Each
objects is not updated but "evolved" to have some additional info. Object
addresses become , so that it is possible to retrieve the state of an
object at any time in the past.
- Locking issues: Writing the header of the object implicitly lock the
object from others.
- Problems: Every read becomes a write because needs to update the time
stamp. Waits are aborts. Timestamps forces a single granularity. Cannot have
shared read access.
Locking and logging:
Allows undo and redo of
actions by keeping an entry for each action in a log. The log should be kept in
stable storage to avoid losing log entries in case of a system crash.
Some
terminology:
- Restartability: handles failures during undo and redo phases.
- Repeatability: to ensure a transaction reads the same value repeatedly ,
i.e., it cannot read some uncommitted values of another transaction. To
accomplish this, use exclusive write locks and shared read locks.
- Two phase commit protocol: A coordinator tells the nodes to commit, if a
node said "yes, I will commit", it cannot say no later. The transaction
commits if all nodes agreed to commit. Otherwise the transaction aborts. This
is used for distributed transactions.
Implementations of locking:
- Predicate locking: a predicate can covers the exact type of records that a
transaction wants to lock. This is expensive.
- Hierarchical locking; pick a fixed set of predicates and group them into
an acyclic graph and lock from root to leaf. Loses generality.
Limitations:
- Nested transactions: E.g. travel agent booking you a vacation. The whole
process is a transaction to the customer. Each action (book tickets, car
rental) is also a transaction to the agent, which can be aborted if for
example, no mid-size cars are available.
- Long-lived transactions: Frequency of deadlock goes up with the square of
the multiprogramming level, and fourth power of the transaction size. Partial
soln: allows only "active" transaction to hold locks.
- Integration with programming languages
I think these limitations
have been investigated by various research groups.