## Solutions for CS145 Final

 ```Problem #1: Relational algebra expressions over R(a,b) and S(b,c) ```
 ```Correct: c) The answer to Q2 is always contained in the answer to Q1. Note that both Q1 and Q2 result in a subset of R. A tuple R(a,b) is selected by Q1 if there exists *some* tuple in S such that S.b = R.b. This is equivalent to the following SQL query: Q1: SELECT * FROM R WHERE R.b = ANY (SELECT S.b FROM S) A tuple R(a,b) is produced as output by Q2 if (there exists some tuple S(b,c) with S.c = R.a) and (there exists some tuple S(b,c) such that S.b = R.b). This is equivalent to the following SQL query: Q2: SELECT * FROM R WHERE ((R.b = ANY (SELECT S.b FROM S) AND (R.a = ANY (SELECT S.c FROM S)) The two SQL queries make it clear that Q2 produces a subset of tuples produced by Q1. ```

 ```Problem #2: SELECT ALL or ANY with subquery ```
 ```Correct: d) Q1 and Q2 produce different answers. There are two cases to consider: - Subquery returns empty set ALL (empty set) returns TRUE while ANY (empty set) returns FALSE. So Q1 produces all of R.a while Q2 returns empty bag. - Subquery returns non-empty set Usually, the condition b >= ALL (some_non_empty_set) will select fewer tuples than the condition b >= ANY (some_non_empty_set). ```

 ```Problem #3: SELECT w/ GROUP BY vs SELECT DISTINCT ```
 ```Correct: a) Q1 and Q2 produce the same answer. Q1 returns the _set_ (not a _bag_) of "a" values whose corresponding "b" value is positive. Q2 first groups selects tuples with positive "b" values, groups them by "a" values and then selects the "a" values of these groups. Note that the GROUP BY produces a _set_ of groups whose key is the set of attributes over which we created groups. ```

 ```Problem #4 Team/Player ODL Problem #1 ```
 ```Correct: a) Q1 and Q2 produce the same answer. Both of them print names of players who play for team with name "49ers". Note that there is a unique team with that name because Team has key "name". ```

 ```Problem #5: Team/Player ODL Problem #2 ```
 ```Correct: d) Q1 and Q2 produce different answers. The Types of the answer set of the two queries are different. Q1 produces output of type Bag(Struct(string)) whereas Q2 produces output of type Bag(Struct(Set(class Player))). ```

 ```Problem #6: Datalog Cousin/Sibling/GrandParent ```
 ```Correct: d) Q1 and Q2 produce different answers. We can rewrite Q1 by substituting Sibling into the second rule: Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Par(xp,g) AND Par(yp,g) AND xp <> yp We can rewrite Q2 by substituting GrandParent into the second rule: Cousin(x,y) <- Par(x,xp) AND Par(xp,g) AND Par(y,yp) AND Par(yp,g) AND x <> y The two expressions are identical except for the last condition. Here is how the last inequality affects the answers: Consider Par() relation with the following tuples: { (x, a), (x, b), (a, g), (b, g), (u, c), (v, c), (c, h) } Then, Q1 produces Cousin(x, x) Q2 produces Cousin(u, v) ```

 ```Problem #7 Arc(x,y) Problem ```
 ```Correct: a) Q1 and Q2 produce the same answer. V(x, degree) is a _set_ of tuples where the second attribute "degree" is the out-degree of node "x" in the multi-graph (a graph with "parallel"/"multiple" edges). Let neighbour(x) denote the _bag_ of neighbours of x (i.e., there is one occurence of y in neighbour(x) for each occurence of Arc(x,y)). Q1 computes the sum of out-degrees of nodes in the _bag_ neighbour(0). Q2 computes the cross product of a1 with a2 and selects those tuples where (a1.x = 0 AND a1.y = a2.x). Finally, the count of these tuples is produced. Effectively, Q2 iterates over the _bag_ neighbour(0). Each node y in neighbour(0) contributes out-degree(y) to the final sum. ```

 ```Problem #8: COUNT vs COUNT DISTINCT ```
 ```Correct: d) Q1 and Q2 produce different answers. If R is the bag <(a,b,c), (a,b,c)>, Q1 produces 1 whereas Q2 produces 2. ```

 ```Problem #9: UPDATE with INSERT/SELECT ```
 ```Correct: d) Q1 and Q2 produce different answers. This question is slightly tricky. (a) If R has no tuple with a = 20, then Q1 produces an empty bag but Q2 produces a single tuple (10, 20). (b) If R has several tuples with a = 20, then Q1 produces a bag with as many occurences of (10, 20) as there were tuples with a = 20 earlier (Note that a is not the key for R. So there could easily be duplicates). However, Q2 produces only a single tuple (10, 20). ```

 ```Problem #10: SELECT with "a IS NULL" vs "a NOT LIKE '%'" ```
 ```Correct: c) The answer to Q2 is always contained in the answer to Q1 The '%' regular expression matches all NOT NULL strings (including the empty string with length zero). It is like "*" when using "ls *" in UNIX. Q2 always produces an empty bag because - "a NOT LIKE '%'" does not match any NOT NULL string R.a. - If R.a IS NULL, the outcome is UNKNOWN and the corresponding row doesnt get selected. Q1 produces all those rows where R.a IS NULL. ```

 ```Problem #11: R(a, b) and S(b, c) ```
 ```Correct: c) The answer to Q2 is always contained in the answer to Q1. Q1 natural-joins R and S and then selects the first column. The size of the answer is equal to the size of (R NATURAL JOIN S). Q2 selects those tuples in R whose b attribute exists among the bag of b attributes of S. The size of the answer is never more than the size of R itself. Example: R = {(1,2)} and S = {(2,3), (2,4), (2,5)}. Then, Q1 = <(1), (1), (1)> whereas Q2 = <(1)>. ```

 ```Problem #12: R(x, y, value) problem ```
 ```Correct: b) The answer to Q1 is always contained in the answer to Q2. Q1 is equivalent to: SELECT S.value FROM ((SELECT * FROM R) MINUS (SELECT R1.x, R1.y, R1.value FROM R R1, R R2 WHERE (R1.x = R2.x OR R1.y = R2.y) AND (R1.value < R2.value))) S ; Q2 is equivalent to: (SELECT MAX(R.value) FROM R GROUP BY R.x) UNION (SELECT MAX(R.value) FROM R GROUP BY R.y) ; Consider R = {(0,0,0), (0,2,200), (5,2,500)}. Then, Q1 = {(500)} whereas Q2 = {(0), (200), (500)}. Try them in Oracle. ```

 ```Problem #13: Relational algebra expressions ```
 ```Correct: c) The answer to Q2 is always contained in the answer to Q1. Q1 is the cross product of all values in R.a with all values in R.b. It is easy to see that Q2 cannot possibly produce a tuple that is not contained in Q1. What remains to be determined is whether there is some tuple in the cross product that Q2 cannot produce. There indeed are such tuples. For example, if R = {(5,5)}, Q1 = {(5,5)} whereas Q2 = {}. Another example is R = {(1,2), (2,3)} where Q1 = {(1,2),(1,3),(2,2),(2,3)} whereas Q2 = {(1,2), (2,3), (2,2)} ```

 ```Problem #14: XPath problem ```
 ```Correct: d) Q1 and Q2 produce different answers. Q1 allows tag patterns like that Q2 does not. Q2 allows that Q1 does not. ```

 ```Problem #15: Two SQL queries ```
 ```Correct: b) The answer to Q1 is always contained in the answer to Q2. In Q1, there is one tuple per unique value of attribute R.a (due to the GROUP BY). Each of these tuples is produced by Q2. However, neither a nor b nor (a,b) is a key for R. Therefore, there could be multiple rows that share a and b attributes. Thus, Q2 might produce the same row more than once. One such relation where this would occur is R = {(1, 2, 3), (1, 2, 4)}. ```

 ```Problem #16: E/R Diagram ```
 ```Correct: a) I only R has two FD's: ABC -> D and ABD -> C. I ABC -> D is violated. II Neither FD is violated. The tuples do not share ABC or ABD attributes. III Neither FD is violated. The tuples do not share ABC or ABD attributes. ```

 ```Problem #17: Keys of Projected Relation ```
 ```Correct: c) Only AB and AC are keys There are four FD's: A -> D, B -> C, D -> E and CE -> B. A is not on the right hand side of any FD. So A must be part of any key for R. This leaves us with four possible keys: A, AB, AC, and ABC. It is easy to see that: A -> ADE AB -> ABCDE AC -> ABCDE ABC -> ABCDE Thus, the keys for R are AB and AC. ABC is not a key because it properly contains a key. It is important to note that when considering the projected relation R, we do not "project out" FD's. All of them need be considered when reasoning about FD's in R. ```

 ```Problem #18 MVD's ```
 ```Correct: c) I and II only We know that A ->-> BC and D -> B. Claim: A ->-> BC and D -> B imply that A -> B. Proof: Consider two tuples that share the A attribute: (a, b1, c1, d1, e1) and (a, b2, c2, d2, e2). From A ->-> BC, we deduce that (a, b2, c2, d1, e1) must also belong to R. Now, consider the tuples (a, b1, c1, d1, e1) and (a, b2, c2, d1, e1) along with the FD D -> B. It follows that b1 = b2. This implies that A -> B holds. Claim: A ->-> BC and A -> B imply that A ->-> C Proof: Consider two tuples satisfying A -> B: (a, b, c1, d1, e1) and (a, b, c2, d2, e2) From A ->-> BC, we deduce that (a, b, c2, d1, e1) and (a, b, c1, d2, e2) must also belong to R. This implies that A ->-> C holds. We cannot assert that A ->-> D holds. In particular, a relation consisting of the two tuples (a, b, c, d1, e1) and (a, b, c, d2, e2) satisfies the given dependencies, but not A ->-> D. ```

 ```Problem #19 E/R Diagram and ODL ```
 ```Correct: b) class A {relationship B R inverse B::R} ; class B {relationship set R inverse A::R ; relationship C S inverse C::S} ; class C {relationship set S inverse B::S} ; This schema declares class A with A::R being a many-one relationship from A to B. Class B has a one-many relationship from B to A (a single value of B can be related to a set of values in A). The relationships B::S and C::S are defined analogously. ```

 ```Problem #20 /\, \/ and - with bag interpretations ```
 ```Correct: b) R \/ (S /\ T) = (R \/ S) /\ (R \/ T) The four options were: (a) R \/ (S - T) = (R \/ S) - T (b) R \/ (S /\ T) = (R \/ S) /\ (R \/ T) (c) R /\ (S \/ T) = (R /\ S) \/ (S /\ T) (d) None of the above Consider a tuple x that occurs r, s and t times in R, S and T respectively. The question is equivalent to identifying as to which of the following identities is really true: (a) r + max(s-t, 0) = max ((r + s) - t, 0) (b) r + min(s, t) = min (r+s, r+t) (c) min(r, s+t) = min(r, s) + min(r, t) (d) None of the above Note that terms like R \/ S, R /\ S and R - S have been replaced by r+s, min(r, s) and max(r-s, 0) respectively because they faithfully capture bag semantics. Only option (b) is correct for integers, in general. ```

 ```Problem #21: FD's and Lossless Joins ```
 ```Correct: b) AD, BDE, and ABC A join of two relations is lossless whenever the intersection of their attributes functionally determines one of the relations. This condition applies whenever we decompose in the BCNF-decomposition algorithm, incidentally. In decomposition (b), we can join AD with ABC first, since the intersection, A, functionally determines ABC. That gives us ABCD, which must be joined with BDE. Now, the intersection is BD, and a given FD tells us BD -> BDE, so this join too is lossless. To check that none of the other choices has a lossless join, we construct counterexamples. In each case, let's start with three hypothetical tuples that when projected and joined give us (a,b,c,d,e). For instance, in choice (a), these tuples would have the form (a,b,c1,d1,e1), (a2,b,c,d2,e2), and (a3,b,c3,d,e). Apply the given dependencies to equate certain symbols, if necessary. For answer (a), we would have to admit that c1 and c3 are both c, but then we have three tuples that satisfy the given FD's: (a,b,c,d1,e1), (a2,b,c,d2,e2), and (a3,b,c,d,e). Since none of these are (a,b,c,d,e), we know that the relation consisting of these three tuples, when projected and joined, yields a tuple not in the original relation; i.e., this decomposition is not lossless. Note: if you don't trust the use of abstract symbols, replace each of the symbols a, a2, etc., by a distinct integer. ```

 ```Problem #22: Oracle TYPEs, TABLEs and SELECT ```
 ```Correct: a) Only I I SELECT rr.d.a FROM R rr ; is okay as R has an alias "rr" whose members rr.d.a are SELECTed. II SELECT d.a FROM R ; will cause an error in Oracle because it expects an alias for R. III SELECT a FROM THE(SELECT d FROM R) ; will produce errors for two reasons: - the sub-query (SELECT d FROM R) requires that R be given some alias, e.g., SELECT rr.d FROM R rr. - the overall SELECT also requires that THE(...) be aliased, e.g., SELECT xx.a FROM THE(...) xx. Thus, only I produces no error in Oracle. ```

 ```Problem #23: E1, E2, ... E10 isa Hierarchy ```
 ```Correct: b) 10 Let us denote the entity sets by E(1), E(2), ... E(10). For 1 <= i <= 10, if e belongs to E(i), then e also belongs to each entity set in the set {E(1), E(2), ... E(i-1), E(i)}. Since i ranges from 1 thru 10, there can only be ten such sets. ```

 ```Problem #24: OQL query over Team and Player ```
 ```Correct: a) (b) is incorrect because p. playsFor is a team object, not a collection, and therefore cannot appear in the FROM clause. (c) is incorrect because t.players is a collection, and thus cannot be extended by a dot to t.players.playsFor. (d) is incorrect for a similar reason. ```

 ```Problem #25: E/R diagram with A(50), B(20), C(200) and R ```
 ```Correct: c) 1000 R has two functional dependancies: AB -> C and AC -> B. Consider tuples in R with the first attribute A = a. If (a, b, c) is a tuple, - there cannot be another tuple (a, b', c) with b <> b' - there cannot be anotehr tuple (a, b, c') with c <> c' Effectively, there is a "pairing" of b and c. Since B has 20 entities while C has 200, there can be at most 20 such pairs (minimum of 20 and 200 is 20). Thus, any particular value of A = a can be associated with only 20 pairs of the form (b, c). Since there are 50 distinct values of A, the total size of R can be at most 50 x 20 = 1000 tuples. ```

 ```Problem #26: Three manager-queries in Datalog ```
 ```Correct: d) I, II and III All three queries can be expressed in Datalog: I Highest (m, e) <-- Emps (e, s, m) AND MaxSalary (s, m) Max_Salary (s, m) <-- Emps (_, s, m) AND NOT Less_Than_Max_Salary (s, m) Less_Than_Max_Salary(s, m) <-- Emps (_, s, m) AND Emps(_, t, m) and s < t II Reports (m, e) <-- Emps (e, _, m) Reports (m, e) <-- Reports (m, f) AND Emps (e, _, f) III MgrOfMgrOfMgrOf (e, m) <-- Emps (e, _, x) AND Emps (x, _, y) AND Emps (y, _, m) ```

 ```Problem #27: Three manager-queries in SQL ```
 ```Correct: c) I and III only I Less_Than_Max_Salary (m, s) = PROJECT_(E1.salary, E1.manager) (SELECT_(E1.manager = E2.maanger AND E1.salary < E2.salary) (E1 x E2)) Max_Salary (s, m) = PROJECT_(Emps.salary, Emps.manager) (SELECT_(Emps.manager = Less_Than_Max_Salary.manager AND Emps.salary <> Less_Than_Max_Salary.salary) (Emps x Less_Than_Max_Salary)) Highest (m, e) = PROJECT_(Emps.manager, Emps.employee) (SELECT_(Emps.salary = Max_Salary.salary AND Emps.manager = Max_Salary.manager) (Emps x Max_Salary)) where E1 and E2 are the same Emps relation twice. II Cannot be expressed in core relational algebra. Intuitively, we need "recursion" to reach the (manager of)^* of an employee. III PROJECT_(E1.eName, E3.manager) (SELECT_((E1.manager = E2.eName) AND (E2.manager = E3.eName)) (E1 x E2 x E3)) where E1, E2 and E3 are the same Emps relation three times. ```

 ```Problem #28: Privileges for INSERT with REFERENCES ```
 ```Correct: b) REFERENCES on Manfs We need REFERENCES on Manfs (to be able to check the referential integrity constraint) and INSERT on Beers (to be able to insert a new tuple into Beers). Note that SELECT on Beers or Manfs is not necessary as we are not evaluating any (sub)-query that involves either table. UPDATE on Beers is also not needed as that privelege enables "update of existing tuples"; our need is "insertion of new tuples". ```

 ```Problem #29: GRANT/REVOKE privileges ```
 ```Correct: b) Following the convention in the DB textbook, the GRANT/REVOKE diagram at the end of the first four GRANT statements has four nodes: (a) A(INSERT on R, **) (b) B(INSERT ON R, *) (c) C(INSERT ON R, *) (d) D(INSERT ON R, *) There are four edges: (i) A(INSERT on R, **) ---> B(INSERT ON R *) (ii) B(INSERT on R, *) ---> C(INSERT ON R *) (iii) C(INSERT on R, *) ---> D(INSERT ON R *) (iv) D(INSERT on R, *) ---> B(INSERT ON R *) The double star denotes ownership of relation. Single stars denote "WITH GRANT OPTION". When B revokes privilege from C, edge (ii) gets deleted. This means node (c) and edge (iii) also get deleted. Since CASCADE option is specified, any other node not reachable from A (the owner of relation R, are deleted. This means that node (d) and edge (iv) also get deleted. This leaves us with two nodes A(INSERT on R, **) and B(INSERT on R, *) with an edge from A to B. ```

 ```Problem #30: JDBC statement ```
 ```Correct: a) myStat.setInt(1, 2) ; To replace tuple (1, 1) by (2, 1), we would like to execute the SQL statement "UPDATE R SET a = 2 WHERE b = 1". We already have PreparedStatement myState with one parameter (question mark placeholder) denoted by a "?". To be able to execute this statement, we first need to set the question mark placeholder by "myStat.setInt (1, 2) ;". After setting this value, we would write "myStat.executeUpdate() ;" to actually carry out the update. A tutorial covering JDBC PreparedStatement is available here). ```

 ```Problem #31: Bag vs set semantics ```
 ```Correct: c) R U S (a) T = R - S Irrespective of whether we employ bag or set semantics, (i) If a tuple occurs in both R and S, it is not part of T, (ii) if there is a tuple in R but not in S, exactly one copy is part of T (iii) if there is a tuple in S but not in R, it is not part of T. The resultant T is the same in both cases. (b) T = (R JOIN S) - (S JOIN R) T would be empty irrespective of whether we employ bag or set semantics. This is because (R JOIN S) and (S JOIN R) are identical sets/bags. (c) T = R U S If a tuple occurs in both R and S, then it occurs twice in T if we use bag semantics. Only one occurence occurs in T if we employ set semantics. (d) T = SELECT (a = 5) R Selection would produce the same set of tuples since R is a set. ```

 ```Problem #32 SQL Expression with 3-Valued Logic ```
 ```Correct: c) TRUE or UNKNOWN, but never FALSE Note that (X < Y) OR (X >= Y) is a tautology that evaluates to TRUE. However, SQL does not arrive at such an answer by algebraic manipulation. SQL would first evaluate each sub-expression individually and then evaluate the truth value of the entire expression. Thus, if X and Z are both UNKNOWN, the entire expression evaluates to UNKNOWN. Note that there is no way to make the entire expression FALSE. Thus, the expression can evaluate to either TRUE or UNKNOWN, but never FALSE. ```

 ```Problem #33: Answer with Red and Blue ```
 ```Correct: c) (1, 3) This is a problem that several people got wrong. Let's evaluate the two relations first. It is easy to see that P(x, y) = {(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)} Q(x, y) = {(2, 4), (1, 3)} We are left with the evaluation of Answer(x, y) <- P(x, y) AND NOT Q(y, x) Now comes the tricky part: Notice the order of x and y in the rule "Answer(x,y) <- P(x,y) AND NOT Q(y,x)". It turns out that (1, 3) is not in Answer(x,y). ```

 ```Problem #34: Which rule is safe? ```
 ```Correct: d) P(x, y) <- Q(x) AND R(y) AND x < y AND NOT S(y) A Datalog rule is safe if every variable in a rule appears as a parameter to to some non-negated relational subgoal. In (a), y does not appear in any relation. In (b), x does not appear in any relation. In (c), z does not appear in any non-negated relation. In (d), all variables: x and y, appear in non-negated relations: Q(x) and R(y). ```

 ```Problem #35: Meaning of || ```
 ```Correct: a) Concatenation In SQL, || denotes the string concatenation operator. In some languages for expressing parallel programs, || denotes parallel execution. Example: Occam? In programming languages like C, || denotes lazy OR. In some older programming languages, || denoted eager OR. Example: Pascal? ```