Questions 25-26 are graded by Erik, Questions 27-29 by Mark, Question 30-31 by Mike, and Question 32 by Jun. If you want to discuss the grading of these questions, you should give your exam to the relevant person, together with a note explaining why you think the grading was questionable.
|
|
|
|
|
A | B | C | D | |
t1 | 1 | 1 | 1 | 1 |
t2 | 1 | 2 | 2 | 2 |
t3 | 1 | 2 | 1 | 1 |
t4 | 1 | 1 | 2 | 2 |
So, the spade is either C or D. (It could include an extra A but that wouldn't change the meaning of the FD.) Without loss of generality, let's suppose the spade is C (C and D are symmetric). The keys of R are {A, B, D} and {B, C, D}. Therefore, R is not in BCNF (both A->C and CD->A are BCNF violations). On the other hand, R is in 3NF (A->C doesn't violate 3NF because C is in the key {B, C, D}, and CD->A doesn't violate 3NF because A is in the key {A, B, D}).
Also, II won't violate any constraint; it's even more conservative than I.
I avoids this problem by processing one S tuple at a time; but II cannot avoid this problem.
A | B | C | D |
1 | 1 | 1 | 1 |
1 | 2 | 2 | 2 |
1 | 1 | 1 | 2 |
1 | 2 | 2 | 1 |
2200 is the result of the execution sequence T1.step1, T2, T1.step2.
2300 is the result of the execution sequence T2, T1.step1, T1.step2.
-- T3: INSERT INTO Enroll VALUES(123, 'CS145', 'Spring 1999', 4.0); |
|
-- T2: SELECT MIN(grade) FROM Enroll WHERE SID = 123; -- QMIN -- returns 4.0 |
|
ROLLBACK; -- T4: INSERT INTO Enroll VALUES(123, 'CS145', 'Spring 1999', 3.0); COMMIT; |
|
SELECT MAX(grade) FROM Enroll WHERE SID = 123; -- QMAX -- returns 3.0 |
Although II uses the difference operator, it is still monotone because II is actually equivalent to I!
III is not monotone. For example, suppose that Enroll contains a student 007 who has not taken CS145. The current result of III should contain 007. Now, suppose we insert into Enroll the fact that 007 has just taken CS145. Then, 007 would disappear from the result.
SuperStudent ---+ | A | - V | | SuperStat -----> SuperCourse | | | | | - V | +-------------> Enroll <-----+SuperStat is not monotone w.r.t. SuperCourse because adding a super course may cause the COUNT(*) field of certain SuperStat tuples to be incremented (updated). SuperStat is not monotone w.r.t. Enroll because adding an enrollment record may also cause the COUNT(*) field of a certain SuperStat tuple to be updated.
SuperStudent is monotone w.r.t. SuperCourse because adding a super course can only make more students become super; students who are already super will remain super. For the same reason, SuperStudent is monotone w.r.t. Enroll.
Similarly, SuperCourse is monotone w.r.t. SuperStudent and Enroll.
PROJ_{SID} (SELECT_{grade>4.0} (Enroll))For each student whose max grade is above 4.0, we could also say that they have at least one grade above 4.0. This query just selects all the tuples with grades above 4.0, and then projects out the SID's associated with those tuples.
Grading by Erik:
An alternate solution that received full credit (although it's not as elegant) was a brute force method that found all maximum scores and then selected the ones above 4.0: PROJ_{SID} (SELECT_{grade>4.0} ( PROJ_{SID,grade} (Enroll) - PROJ_{e1.SID,e1.grade} (RENAME_{e1}(Enroll) JOIN_{e1.SID=e2.SID AND e1.grade<e2.grade} RENAME_{e2}(Enroll)) ))-1 if the solution was way more complicated than need be --- for example, lots of extra stuff to get rid of "duplicates" (relational algebra uses set semantics, so we don't need to worry about duplicates). -2 if the query wouldn't work, and more it had bigger problems. 0 point was generally reserved for solutions that were missing substantial functionality (for example, if no attempt was made to represent significant parts of the query).
PROJ_{SID} (Enroll) - PROJ_{SID} (SELECT_{grade<=3.0} (Enroll))Grading by Erik:
Again, full credit was given to the less elegant, brute force submission, which was almost identical to the alternate solution for Question 25. -1 if the query contained lots of extra stuff that didn't do anything. -2 or more if the query was non-functional. In general, grading was harsher if the less elegant solution was attempted. For example, solutions that tried to incorrectly subtract the result of a natural join of two Enroll tables from one Enroll table received 2 points.
Grading by Mark:
2 points were given for the answer of "No" and basically for any reason that mentioned deletions. no points were given for the answer of "Yes". 1 point was given the answer of "No" but for a wrong reason; e.g.:
- a reason like only one tuple could be seen (but we could use subquery)
- some rational about why subqueries could cause problems (they don't, because the only table it mentioned is its own table)
CREATE ASSERTION BeatCal CHECK (EXISTS (SELECT * FROM Rating r1, Rating r2 WHERE r1.univ = 'Stanford' AND r2.univ = 'Berkeley' AND r1.prog = r2.prog AND r1.score > r2.score));Grading by Mark:
5 points for a perfect solution. Around 4 points for forgetting Berkeley, forgetting to join on program, or having a minor error syntactic or semantic error, but showed understanding of how it worked. Somewhere in the vicinity of 1.5 or 2 points for a solution that did something to the effect that all Stanford programs are better than corresponding Berkeley ones, or something like there isn't a Berkeley program with a higher rating than all of Stanford's programs.
SELECT COUNT(*) + 1 FROM Rating WHERE prog = 'Math' AND score > (SELECT score FROM Rating WHERE prog = 'Math' AND univ = 'Stanford');Or,
SELECT COUNT(*) + 1 FROM Rating r1, Rating r2 WHERE r1.prog = 'Math' AND r2.prog = 'Math' AND r2.univ = 'Stanford' AND r1.score > r2.score;Grading by Mark:
6 points for a perfect solution. 5.5 for forgetting either the +1, mixing up the > for >=, or having an unnecessary subquery SELECT ... FROM (SELECT * FROM ...). 5 for combining these errors, forgetting to check for Stanford, or forgetting to do a prog = 'Math' somewhere. Somewhere in the vicinity of 4 points for getting the right idea, but being confused about GROUP BY and having to get a solution that does not work. Around 1.5 points for trying to use the Oracle special feature of ROWNUM. Most of these solution did not always work, anyway. Recursive solutions also received nearly this amount of points (recursion is in SQL3, not SQL2). Around 0.5 or 1 point for something in SQL that showed you were trying to think about the problem.
WITH RECURSIVE Closure(attr) AS (SELECT attr FROM AttrSet) UNION (SELECT rhs FROM Closure, FD WHERE attr = lhs) SELECT * FROM Closure;Grading by Mike:
-1 no "query" part in the WITH statement. -1.5 adding a "NOT IN Closure" clause condition (not monotonic). -.5 large amounts of query redundancy. -.5 some larger syntax errors. -1.5 computing the FD closure and incorrectly querying it. -1 not using AttrSet for the base case (e.g. using lhs of FD). -1.5 (or more) writing a non-recursive query (e.g. joining FD with AttrSet only, instead of FD and Closure).
(SELECT k1.person1, k2.person2 FROM Knows k1, Knows k2 WHERE k1.person2 = k2.person1 AND k1.person1 <> k2.person2) EXCEPT (SELECT * FROM Knows);Grading by Mike:
Almost all of the scores fell into these categories:
5 perfect / almost perfect. 4.5 forgetting k1.p1 <> k2.p2. 4 returning distance equal to 3. 4 unsuccessful attempt to check the short circuit case (distance equal to 1). 3 no check for short circuit. 2.5 returning distance less than or equal to 3, without checking for short circuit.
WITH RECURSIVE GatesDistance(person, distance) AS (SELECT person2, 1 FROM Knows WHERE person1 = 'Mr. Gates') UNION (SELECT * FROM GatesDistance) UNION (SELECT k.person2, gd.distance + 1 FROM GatesDistance gd, Knows k WHERE gd.person = k.person1 AND k.person2 NOT IN (SELECT person FROM GatesDistance)) SELECT 'Mr. Gates', person, distance FROM Gates WHERE person <> 'Mr. Gates';The solution above is guaranteed to converge to a fixed point for any finite instance of Knows. The n-th iteration find all persons within distance n from Mr. Gates; the next iteration will add persons at a distance of n+1 from Mr. Gates. The condition k.person2 NOT IN (SELECT person FROM GatesDistance) guarantees the convergence of the fixed-point iteration when Knows contains cycles. This condition also ensures that the output distances are indeed minimum.
Strictly speaking, however, this recursion is not linear. What's worse, it's not even stratified since GatesDistance is not monotone w.r.t. itself, according to our definition of monotonicity. For example, suppose that Knows currently contains ('Mr. Gates', 'X'), ('Mr. Gates', 'Y'), and ('Y', 'Z'); GatesDistance currently contains ('X', 1) and ('Y', 1); at this point, the result of the query contains ('X', 1), ('Y', 1), and ('Z', 2). Now, suppose we add ('Z', 1) to GatesDistance. Then, the result of the query would contain ('X', 1), ('Y', 1), and ('Z', 1). The old tuple ('Z', 2) has been invalidated. On the other hand, this nonmonotonicity problem will never come up in the fixed-point iteration, because new tuples are added to GatesDistance in the breadth-first order.
Actually there is a linear and stratified solution, although it is less efficient than the solution above:
WITH RECURSIVE GatesDistance(person, distance) AS (SELECT person2, 1 FROM Knows WHERE person1 = 'Mr. Gates') UNION (SELECT k.person2, gd.distance + 1 FROM GatesDistance gd, Knows k WHERE gd.person = k.person1 AND gd.distance < (SELECT COUNT(*) - 1 FROM ((SELECT person1 FROM Knows) UNION (SELECT person2 FROM Knows)))) SELECT person, MIN(distance) FROM GatesDistance GROUP BY person HAVING person <> 'Mr. Gates';Here, GatesDistance is monotone w.r.t. both itself and Knows. The condition gd.distance < (SELECT COUNT(*) ...) guarantees that the fixed-point iteration will terminate, after it has explored all paths of length m-1 from Mr. Gates, where m is the total number of persons in the database (any path of length m or longer will definitely contain a cycle).
Grading by Jun:
-1 if your solution is equivalent to the first (imperfect) solution. That is, the recursion is not stratified; however, in practice, every GatesDistance tuple computed by the current step of the fixed-point iteration will be returned again in the next step. -2.5 if your solution is not stratified; moreover, if you really carry out the fixed-point iteration, you will find some tuples missing in the final result. This is a rather common mistake. Many of you included the condition k.person2 NOT IN ..., but didn't include UNION (SELECT * FROM GatesDistance). Consequently, the n-th step only returns persons at distance 1 or n from Mr. Gates; it won't return those at distances 2 to n-1! -2.5 if the fixed-point iteration won't converge when Knows contains cycles, say, ('Mr. Gates', 'X'), ('X', 'Y'), and ('Y', 'X'). In order to guarantee convergence, you need a condition such as k.person2 NOT IN ... or gd.distance < ..., discussed above. -6 if your query correctly computes the set of persons Mr. Gates knows (directly or indirectly), but it doesn't compute their distances from Mr. Gates at all.
Jun Yang | CS145 Spring 1999 |