Solution to Problem Set 7


Problem 1. 

   (i) In the case of scheme A, each query has to be sent to each of the
   5 computers.  The cost of a query is 2 disk IOs at each computer, for
   a total of 10 IOs.  The entire server has a capacity of 100 IOs/sec, so
   the query throughput is 10 queries/sec.

   In the case of scheme B, the cost of the query is 2 disk
   IOs (one each at the two computers) and 2 network message. To compute
   the overall query throughput of the system, we look at its IO and
   network capacities.  It has 100 IOs/sec and 100 messages/sec capacities.
   Thus, with respect to the IO capacity, the system can support 50
   queries/sec, and with respect to the network capacity the system can
   support 100/2 = 50 queries/sec. Assuming that the system can perform
   disk IOs and network transfer in parallel, the overall throughput of
   the system is 50 queries/sec.

  ii) In order to alter the scenario such that scheme A has better
  throughput than scheme B, we note that scheme A is independent of
  network bandwidth while scheme B can suffer if the network bandwidth is
  poor.  Consider a network bandwidth of 10 messages/sec, while all other
  parameters remain the same.  The query throughput of scheme A remains
  the same (10 queries/sec), while that of scheme B becomes 5.  Thus,
  scheme A can be better than scheme B.

Problem 2.

  (i)  Action					Time
  ---------------------------------------------------------
	Send R to site 5                         20000
	Send S to site 5		         30000
	Send T to site 5		         40000
	Send V to site 5		         50000
	Compute R NJ S                           60000
	Compute (R NJ S) NJ T                     8000
	Compute (R NJ S NJ T) NJ V		  1000
  ----------------------------------------------------------
  Execution time =                              209000

  (ii) Action					Time
  ----------------------------------------------------------
	Send R to size 2			20000
	Compute R NJ S				60000
	Send result to site 3			 4000
	Compute R NJ S NJ T			 8000
	Send result to site 4			  600
	Compute R NJ S NJ T NJ V		 1000
	Send result to site 5			   80
  ----------------------------------------------------------
  Execution time =                               93680
 (iii)  Action					Time
  ----------------------------------------------------------
	Send R to site 2		        20000
	Compute R NJ S			        60000
	Send result to site 5		         4000
	Send T to site 4			40000
	Compute T NJ V			       200000
	Send result to site 5			 8000
	Compute R NJ S NJ T NJ V		  800 
  ----------------------------------------------------------
  Execution time = 248800.  Note that the first three actions and the
  next three actions happen in parallel.

 (iv)  Action					  Time
  ---------------------------------------------------------
	Compute P(A,R)				    200
	Send P(A,R) to 2			   2000
	Compute S NJ P(A,R)			  60000
	Compute P(B,(S NJ P(A,R)))		     20
	Send P(B,(S NJ P(A,R))) to site 3	    200
	Compute T NJ P(B,(S NJ P(A,R)))		   8000
	Compute P(C,(T NJ P(B,(S NJ P(A,R)))))	      2
	Send P(C,(T NJ P(B,(S NJ P(A,R)))) to site 4	  20
	Compute V NJ P(C,(T NJ P(B,(S NJ P(A,R)))))  	1000
	Send V NJ P(C,(T NJ P(B,(S NJ P(A,R)))) to 5	  20
	Send R from 1 to 5			       20000*
	Send S NJ P(A,R) from 2 to 5			2000*
	Send T NJ P(B,(S NJ P(A,R))) from 3 to 5	 200*
	Compute the final join at site 5		   4
  ----------------------------------------------------------
  Execution time = 71466

  Note that the * costs are not counted because they can be accomplished in
  parallel with other tasks (#3, #6, and #9, for instance).  Also note that
  the estimate of the time for the final join is very fuzzy, but it should be
  small, in any case, compared with the rest of the major costs of the plan.

Problem 3.

a) There are two right answers:

 (i) If we consider all possible time-stamp assignments,
  we can find an assignment of timestamps for TO schedule,
  corresponding to any non-deadlocking 2PL schedule.

  Proof)

  Consider a schedule over transactions {T1, T2, ..., Tn}
  produced by a 2PL scheduler. Then there exists a TO scheduler
  that assigns a timestamp to T1,...,Tn as follows.
   
   ts(Ti) = time at which the transaction Ti starts releasing its 
            first lock in 2PL.

  Note that this timestamp assignment is "purely logical" in that
  it tries to capture the relative timestamps of T1,T2,...,Tn.

  Now consider any two conflicting actions Ai(x) and Aj(x)
  of transactions Ti and Tj. Let ts(Ti) < ts(Tj).
  So Ti starts releasing its locks before Tj does. 
  Just when Ti starts releasing its locks, it held the lock for x.
  So Tj cannot have held the lock for x at that point. This implies
  that Tj lock x after Ti releasd it. (Tj could not have locked x
  before Ti because ts(Ti) < ts(Tj).) So Ai(x) < Aj(x). 
  So the schedule follows TO rule.
 
 (ii) When we consider a paticular timestamp assignment,
   some schedule may be allowed in 2PL but not in TO.

   For example, when ts(T1) < ts(T2), the schedule 

    w2(x) w1(x)

   is possible in 2PL, but not allowed in TO.


b) Consider the schedule

    r1(x) w2(x) w3(y) r1(y)

    It is possible in timestamp ordering when ts(T3) < ts(T1) < ts(T2),
    but it cannot be produced by a 2 PL scheduler.
    This is because T1 has to release the lock on x (for T2 to write x),
    but T1 can only acqure lock on y (after T3 writes y).
    This violates 2PL.

Problem 4.

 a) i) Both N1 and N2 may have been in the following states: W, PC, PA, C
       To see why PA is possible, consider the following scenario:
       
    ii) We can commit T1, because N1 and N2 cannot have aborted.

 b) i) N1 and N2 may have been in the following states: W, PC, PA, C, A
    ii) We cannot proceed with T2, because N1 and N2 
        may have either aborted or committed.

Problem 5.

    votes for a = 2
    votes for b = 1
    votes for c = 1
    votes for d = 1

    Total votes = 5
    Majority = 3