Solution to Problem Set 7
Problem 1.
(i) In the case of scheme A, each query has to be sent to each of the
5 computers. The cost of a query is 2 disk IOs at each computer, for
a total of 10 IOs. The entire server has a capacity of 100 IOs/sec, so
the query throughput is 10 queries/sec.
In the case of scheme B, the cost of the query is 2 disk
IOs (one each at the two computers) and 2 network message. To compute
the overall query throughput of the system, we look at its IO and
network capacities. It has 100 IOs/sec and 100 messages/sec capacities.
Thus, with respect to the IO capacity, the system can support 50
queries/sec, and with respect to the network capacity the system can
support 100/2 = 50 queries/sec. Assuming that the system can perform
disk IOs and network transfer in parallel, the overall throughput of
the system is 50 queries/sec.
ii) In order to alter the scenario such that scheme A has better
throughput than scheme B, we note that scheme A is independent of
network bandwidth while scheme B can suffer if the network bandwidth is
poor. Consider a network bandwidth of 10 messages/sec, while all other
parameters remain the same. The query throughput of scheme A remains
the same (10 queries/sec), while that of scheme B becomes 5. Thus,
scheme A can be better than scheme B.
Problem 2.
(i) Action Time
---------------------------------------------------------
Send R to site 5 20000
Send S to site 5 30000
Send T to site 5 40000
Send V to site 5 50000
Compute R NJ S 60000
Compute (R NJ S) NJ T 8000
Compute (R NJ S NJ T) NJ V 1000
----------------------------------------------------------
Execution time = 209000
(ii) Action Time
----------------------------------------------------------
Send R to size 2 20000
Compute R NJ S 60000
Send result to site 3 4000
Compute R NJ S NJ T 8000
Send result to site 4 600
Compute R NJ S NJ T NJ V 1000
Send result to site 5 80
----------------------------------------------------------
Execution time = 93680
(iii) Action Time
----------------------------------------------------------
Send R to site 2 20000
Compute R NJ S 60000
Send result to site 5 4000
Send T to site 4 40000
Compute T NJ V 200000
Send result to site 5 8000
Compute R NJ S NJ T NJ V 800
----------------------------------------------------------
Execution time = 248800. Note that the first three actions and the
next three actions happen in parallel.
(iv) Action Time
---------------------------------------------------------
Compute P(A,R) 200
Send P(A,R) to 2 2000
Compute S NJ P(A,R) 60000
Compute P(B,(S NJ P(A,R))) 20
Send P(B,(S NJ P(A,R))) to site 3 200
Compute T NJ P(B,(S NJ P(A,R))) 8000
Compute P(C,(T NJ P(B,(S NJ P(A,R))))) 2
Send P(C,(T NJ P(B,(S NJ P(A,R)))) to site 4 20
Compute V NJ P(C,(T NJ P(B,(S NJ P(A,R))))) 1000
Send V NJ P(C,(T NJ P(B,(S NJ P(A,R)))) to 5 20
Send R from 1 to 5 20000*
Send S NJ P(A,R) from 2 to 5 2000*
Send T NJ P(B,(S NJ P(A,R))) from 3 to 5 200*
Compute the final join at site 5 4
----------------------------------------------------------
Execution time = 71466
Note that the * costs are not counted because they can be accomplished in
parallel with other tasks (#3, #6, and #9, for instance). Also note that
the estimate of the time for the final join is very fuzzy, but it should be
small, in any case, compared with the rest of the major costs of the plan.
Problem 3.
a) There are two right answers:
(i) If we consider all possible time-stamp assignments,
we can find an assignment of timestamps for TO schedule,
corresponding to any non-deadlocking 2PL schedule.
Proof)
Consider a schedule over transactions {T1, T2, ..., Tn}
produced by a 2PL scheduler. Then there exists a TO scheduler
that assigns a timestamp to T1,...,Tn as follows.
ts(Ti) = time at which the transaction Ti starts releasing its
first lock in 2PL.
Note that this timestamp assignment is "purely logical" in that
it tries to capture the relative timestamps of T1,T2,...,Tn.
Now consider any two conflicting actions Ai(x) and Aj(x)
of transactions Ti and Tj. Let ts(Ti) < ts(Tj).
So Ti starts releasing its locks before Tj does.
Just when Ti starts releasing its locks, it held the lock for x.
So Tj cannot have held the lock for x at that point. This implies
that Tj lock x after Ti releasd it. (Tj could not have locked x
before Ti because ts(Ti) < ts(Tj).) So Ai(x) < Aj(x).
So the schedule follows TO rule.
(ii) When we consider a paticular timestamp assignment,
some schedule may be allowed in 2PL but not in TO.
For example, when ts(T1) < ts(T2), the schedule
w2(x) w1(x)
is possible in 2PL, but not allowed in TO.
b) Consider the schedule
r1(x) w2(x) w3(y) r1(y)
It is possible in timestamp ordering when ts(T3) < ts(T1) < ts(T2),
but it cannot be produced by a 2 PL scheduler.
This is because T1 has to release the lock on x (for T2 to write x),
but T1 can only acqure lock on y (after T3 writes y).
This violates 2PL.
Problem 4.
a) i) Both N1 and N2 may have been in the following states: W, PC, PA, C
To see why PA is possible, consider the following scenario:
ii) We can commit T1, because N1 and N2 cannot have aborted.
b) i) N1 and N2 may have been in the following states: W, PC, PA, C, A
ii) We cannot proceed with T2, because N1 and N2
may have either aborted or committed.
Problem 5.
votes for a = 2
votes for b = 1
votes for c = 1
votes for d = 1
Total votes = 5
Majority = 3