# CS145 Final Exam Solutions

### Index

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

### Problem 1: (d)

The fact that R is many-one and onto (i.e., each A is associated with exactly one B does not constrain B, except that it must have at least one entity. All the A's could map to 1 B or 100 different B's, or anything in between, and there could be any number of B's that are not associated with any A.

### Problem 2: (b)

The schema for R is (a,b,c), including the full key for B.

### Problem 3: (a)

Many many people got this wrong. The problem is that people confuse the idea of where the key comes from in a weak entity set with what are the entities themselves. For example, if B were ``crews'' as in the text, and A was movies, while R indicated which crew worked on which movie, a movie would be related to a crew, i.e., a group of people. The fact that we couldn't uniquely refer to that crew without knowing the studio (entity set C) they worked for is irrelevant; R still connects movies to crews and not to studios.

### Problem 4: (c)

The tuples are (1,2,5), (1,2,6), (3,4,NULL), and (NULL,7,8).

### Problem 5: (d)

The fourth step (the revoke) effectively cancels the second and third steps. However, the first step gave B exactly the privileges described in choice (d), so that correctly describes the situation after step 4.

### Problem 6: (d)

You get 2 tuples if T1 is either completely before or completely after T2. The interleaving 2a, 1a, 1b, 2b causes 1b to return only one tuple. The interleaving 1a, 2a, 2b, 1b causes 1b to return 3 tuples: (3,4) because it is always there, (5,6) because it was inserted already, and (1,2), because even though it is no longer there, 1a saw it so 1b must see it. Note that T1 has no effect on what T2 sees, so the fact that T2 is serializable doesn't restrict the order of steps in any way.

Added later: A number of people argued with me that it is impossible for T1 to see one tuple, because the deletion of (1,2) is not committed until after (5,6) is inserted. I sort of sympathize, but the problem is that the standard talks about tuples, as if they were the elementary units of information. Thus, when step 1a asks for committed tuples, it does not have to see (1,2). There is no notion of a ``committed absence of a tuple.''

In practice, DBMS's use disk blocks as the elementary unit of information, e.g., for locking. Here, the reasoning that ``the absence of (1,2) is uncommitted'' makes perfectly good sense. That is, after step 2a, the block containing tuple (1,2) has been updated --- rather than deleted --- and its new value is uncommitted. Therefore, at that time T1 could not see that block and must see tuple (1,2), probably on another version of the same disk block (unless T1 gets deferred --- a likely possibility). As a result, answer (b) was also accepted for credit.

### Problem 7: (d)

Actually, the instructor and most of the class had answer (b), that the update (I) succeeds in getting tuple (2) out of R, but the deletion doesn't.

The deletion (II) will surely be rejected, since it must cause at least one violation of the foreign-key constraint. Thus, it will make no change to either R or S, assuring that (2) remains in R.

However, the following reasoning was shown to me later by one of the students. Since S.b is referenced by a foreign key, it must be the primary key of S. Therefore, whenever S has more than one tuple, the update will violate this unseen primary-key declaration and will be rejected. Thus, in these cases, it will fail to rid R of (2).

### Problem 8: (c)

What can I say? The subquery produces those departments with an average salary of at least \$50K, and the assertion checks that the Toy Dept. is one of these departments.

### Problem 9: (a)

View V(d,c) consists of the tuples (2,3), (12,2), and (12,1) currently, although the view isn't materialized. Therefore, W(d,e) has the tuples (2,3) and (12,3). The query itself groups both tuples, yielding only (7,3).

### Problem 10: (c)

There is a ``standard'' trick for getting MAX or MIN into (core) relational algebra or Datalog, where you take the product of a relation with itself, and then do a selection that requires one copy of an attribute like salary to be strictly less than the second copy. A projection onto the first copy gives us those that are not maximum, and a set-difference lets us find those that are maximum.

### Problem 11: (d)

This was a very hard problem for many people, especially those who skipped the Tuesday review session and didn't see me do a similar problem involving inference of MVD's. The key is to verify that A->B does follow from A->->B and D->B. To prove A->B, start by assuming there are two tuples in R that agree in A. We may refer to these tuples as (a,b1,c1,d1) and (a,b2,c2,d2), since we know nothing about components other than A.

Because A->->B is given, we know that (a,b2,c1,d1) is also in R. Now, we can apply D->B to (a,b1,c1,d1) and (a,b2,c1,d1) to infer that b1=b2. That says that if any two tuples agree in A, they must also agree in B; i.e., A->B.

Now, C->A and A->B tell us C->B. Also, C->A and C->B tell us C->->AB, and the latter MVD tells us C->->D. Thus, all of (a), (b), and (c) are true, and the answer must be (d).

### Problem 12: (b)

The only thing the constraints will allow us to do is insert (4) into R or (0) into S. Choose the former for the first step. Then, we can add (3) to S, then (6) to R, (5) to S, and (10) to R, taking a total of 5 steps. Technically, we need to create a table, using a ``dynamic programming'' algorithm, in which we record for each tuple in R or S, the least number of steps needed to put it there, but this exploration doesn't go too far before you discover how to add (10) to R.

### Problem 13: (a)

Choice (a) replaces the first (and only, in this case) question-mark by the value 5. You need to do this step first, in order to have the update make sense when you execute it. Choice (b) makes no sense because there isn't a second question-mark, (c) is nonsense, and (d) is inappropriate because you aren't trying to execute a query, but rather an update.

### Problem 14: (a)

You need to do all of (b), (c), and (d), but observer methods only read values and do not create or insert them.

### Problem 15: (d)

Dots are appropriate here; -> is used only to follow a REF.

### Problem 16: (c)

If neither R.a nor R.b are NULL, then the expression is a tautology of 2-valued logic and must be true in either 2-valued or 3-valued logic. However, if either or both values are NULL, then the 3-valued truth value is at least unknown, and therefore cannot be false. There are, however, values that make this expression unknown, e.g., R.a = NULL and R.b = -10.

### Problem 17: (b)

The update replaces (1,2) by (1,3), and then the trigger applies and insert (1,4). Since this insert doesn't wake up the update-trigger, that's all that happens.

### Problem 18: (b)

A has to be in any key because it is not on the right of any FD. If we add E, the second FD immediately gives us all attributes, so AE is a key. To use the first FD, we must add BC, and then the first FD immediately gives us all attributes, so ABC is another key. However, there are no other keys, since any other key would have to enable us to use at least one of the FD's, and we already have considered the minimum sets needed to get one of their left sides.

Shame on people who are still confusing ``key'' with ``superkey.''

### Problem 19: (a)

E->D is a 3NF violation. That is, E is not a superkey, and D is not prime (in some key).

### Problem 20: (d)

The only way a set could not be closed is if it includes the left side of one of the FD's but not the right. The sets that include AB but not C are AB and ABD. The sets that include BC but not D are BC and ABC. That's it; all 12 of the other subsets of ABCD must be closed.

### Problem 21: (c)

In this case, with Answer(x,y) as the head in each case, a rule is safe only if both x and y appear in the body in nonnegated, relational subgoals. Only (c) qualifies.

### Problem 22: (d)

In (d), the selection equates the attributes of P, giving us exactly those tuples that match the subgoal P(x,x). The projection and renaming gives us those values of x, but renamed attribute C. Thus, the natural join gives us those tuples in Q whose first compenents appear ``doubled'' (i.e., a single value, twice) in P.

Choice (a) is incorrect; it never forces the tuple from P to be ``doubled.'' Choice (b) equates the first component of P to the second component of Q. Choice (c) fails to equate the components of P to the first component of Q.

### Problem 23: (c)

The join equates the second component of P to the first component of Q. If we think of P as P(x,y,z), as in each of the answer choices, then Q should be Q(y,w) to reflect the join. The selection then says that x<y, and the projection is onto x and w. Choice (c) fits all these requirements.

### Problem 24: (b)

Let us represent a model by (R,N), where R is the set for which Reach is true and N is the set for which NoReach is true. Reach is in stratum 0, and NoReach is in stratum 1. We compute the former using the first two rules: R = {a,c}. Then, using this value of R in the third rule, we have N = {d}.

### Problem 25: (b)

Any other model has to have a and c in R, because the EDB and the first two rules force these IDB facts to be true. Thus, we need to consider the following four cases: R = {a,c}, {a,b,c}, {a,c,d}, {a,b,c,d}.

In the first two cases, the third rule forces d to be in N. Other nodes could also be in N, but all these models are not minimal; they contain ({a,c}, {d}), which is the stratified model.

The third case gives us the minimal model ({a,c,d}, {}). In the fourth case, we get ({a,b,c,d}, {}), and models with some nodes in N, but none of these are minimal; they all contain ({a,c,d}, {}).

### Problem 26: (c)

Strange but true: the RETURN statement in PSM sets the return value but doesn't actually return.

### Problem 27: (a)

The rule for what attributes can appear unaggregated in a HAVING clause (outside any subqueries) is the same as for a SELECT clause: only those attributes that appear in the GROUP BY list. Attribute d is not one of those.

### Problem 28: (b)

As sets, both produce the intersection of R and S. However, SQL is a bag language. Suppose a tuple t appears m times in R and n times in S. Then T appears min(m,n) times in the answer to Q1. However, for the join, we pair all tuples of R with all tuples of S that agree in the common attributes (i.e., all attributes in this case), and we produce of copy for each successful pairing. Thus, Q2 produces mn copies of t. It is easy to verify that as long as m and n are nonnegative integers, min(m,n) <= mn.

### Problem 29: (a)

Each produces exactly the set of tuples (a,c,d), where c is the maximum b paired with a in R, and d is the minimum such b.

### Problem 30: (c)

If we think of Arc as arcs in a directed graph, then Q1 produces all (x,y) such that there is a path from node x to node y of length 1 or more. Q2 produces all and only the odd-length paths, as can be proved by an easy induction on the number of times the recursive rule is applied. Thus, surely Q2 is contained in Q1.

To check that these queries are not equal, consider a graph with only the arcs 1->2 and 2->3. Q1 produces (1,3), while Q2 doesn't.

### Problem 31: (d)

Q1 produces the names of the A's that are connected only to B's that are connected to the C-object ``Joe.'' Q2 looks at all the B's that are connected to C-object ``Joe'' and produces the names of their associated A-object.

Q2 will produce an A-object that is connected to some B's that are connected to ``Joe'' and to at least one B that is not connected to ``Joe.'' However, Q1 will not produce such an A. Thus, Q2 is not contained in Q1.

On the other hand, Q1 will produce an A that is connected to no B's at all, because the FOR ALL is always true when it is applied to an empty collection. However, Q2 will not produce such an A. Thus, neither query is contained in the other.

### Problem 32: (b)

Q1 produces the names of those B's that are related to some C and also related to the A named ``Joe.'' Note that since no B is related to more than one C, no B-object has its name produced more than once, but since a B might not be related to any C at all (see p. 141 of the text, or several examples in the notes), a B might not affect the output, even though it is related to the A named ``Joe.''

On the other hand, Q2 produces all the B's that are related to the A named ``Joe,'' regardless of whether those B's are related to any C. Thus, Q2 is properly contained in Q1.

### Problem 33: (c)

Q1 produces the a-components of those tuples that are in R but not S. Q2 produces those a's that are the first component of some tuple in R, but not of any tuple in S. Surely, if a is a first component of a tuple (a,b) of R but not the first component of any tuple of S at all, then (a,b) is not in S, and therefore a is in the result of Q1. It follows that Q2 is contained in Q1.

To check that the queries are not equal, let R = {(1,2)} and S = {(1,3)}. Then Q1 produces (1), but Q2 does not.

### Problem 34: (a)

Both queries produce those tuples of R whose b-components are equal to the d-component of some tuple of S.

### Problem 35: (b)

Q1 produces those a-components from R that are bigger than all b-components, while Q2 produces those a-components from R that are different from at least b-component. Our first instinct is that therefore Q1 is contained in Q2, since if a value is bigger than all elements of a set, then it is ``surely'' different from at least one of those elements.

However, if we remember a similar question from last year's final, we think ``what if the set is empty.'' Then a value can be bigger than all, but not different from any. But in this case, when R is empty, both queries produce nothing. Thus, here our first thought turns out to be correct, and Q1 really is contained in Q2.