CS145 Midterm Sample Solution

Question 17 is graded by Charity, Question 18 by Mike, Question 19 by Mark, and Question 20 by Jun. If you want to discuss the grading of these questions, you should give your exam to the relevant person, together with a note explaining why you think the grading was questionable.

Question 1: (B)

If one X is related to a set of Y's and vice versa, then the relationship between X and Y is many-many. If one X is related to one Y's but one Y is related a set of X's, then the relationship between X and Y and many-one. If one X is related to one Y and vice versa, then the relationship between X and Y is one-one.

Question 2: (D)

In a multiway relationship set, an arrow pointing into entity set X means that if you pick one entity from each other entity set, together they determine one entity in X. So, for this question, the FD's in R(A, B, C, D) are ABC->D, ABD->C, and ACD->B. In other words, the keys are {A, B, C}, {A, B, D}, and {A, C, D}.

Question 3: (C)

When we translate a subclass in E/R to a relation, we include all attributes attached to the subclass and its inherited key. In the case of ConferenceRooms, we need to include capacity and the key of Rooms. Rooms is a weak entity set, so its key is {name, number}, where name comes from Buildings via the In relationship set.

Question 4: (C)

In the ODL-style translation of subclasses, each object is represented in only one relation and all information on the object goes into that relation. Therefore, ConferenceRoom_ODL contains the n conference rooms, while Room_ODL contains the rooms that are not conference rooms (m-n of them).

Question 5: (A)

I is true because every ConferenceRooms entity is also a Rooms entity. II is false because there might be Buildings entities that are not related to any Rooms entity (i.e., there might be buildings with no rooms).

Question 6: (D)

AB->C does not hold because of the first and the third rows.

Question 7: (C)

Since B and E do not appear on the right side of any FD, they must be a part of every key. Let's compute the closure of {B, E}. First, apply B->D to get D into the closure. Then, DE->A adds A into the closure. Finally, AB->C adds C into the closure. The closure of {B, E} covers all attributes of S, so it's a superkey of S. But we also know that B and E must be a part of every key, so {B, E} is the only key.

Question 8: (B)

I is incorrect because BE->AC also holds in S2 (which should be rather obvious after Question 7 because {B, E} is a key). II is true because AB does not contain {B, E}, the only key of S2. III is incorrect because S4 should be (A, B, E) instead of (C, E).

Question 9: (A)

I is true because DE->A is also a BCNF violation for S. II is false: if we decompose using DE->A we will get a relation with attributes (A, D, E), which we will not get by decomposing with B->D first.

Question 10: (A)

Since A does not appear on the right side of any FD, it must be in every key.

Question 11: (B)

Since every key contains A, it cannot contain C. If it also contains C, then it would not be minimal: we can safely take C out and still have a key because C is determined by A (according to A->BC).

Question 12: (A)

This one requires more than average thought. If some key contains E, then the third FD must be nontrivial and its left side must contain E. In this case, {A, E} is a key, and it easy to see that {A, D} is a key too. Thus, I is true. II is false because the third FD could be trivial. In this case, the only key is {A, D}.

Question 13: (C)

Yes, III is correct.

Question 14: (D)

{A} may no longer be a key. Consider the following example:

R:
A	B
apple	sweet
lemon	sour

S:
A	B
apple	sour

T:
A	B
apple	sweet
lemon	sour
apple	sour

Question 15: (D)

I is incorrect: if somebody works for Microsoft and owns both Microsoft and Yahoo! stocks, he will be returned by the query. II and III are correct.

Question 16: (A)

I is obviously correct. However, II is not, because the join could duplicate salaries and result in an incorrect average. Suppose that X's salary is $30000 and Y's salary is $60000. X owns 200 shares of Microsoft. Y owns 200 shares of Microsoft and 200 shares of Yahoo!. When FROM and WHERE clauses are done, we end up with one tuple about X but two tuples about Y. Then, when we take the average we get ($30000+$60000+$60000)/3 = $50000. But the correct answer should be ($30000+$60000)/2 = $45000.

Question 17:

PROJ_{SSN} ( SEL_{employerSymbol='IFMX'} (Person)
             NATURAL-JOIN
             SEL_{symbol='ORCL' AND numShares>50} (Holding) )

Grading by Charity:

-.5 for >= instead of >

-.5 for selecting SSN instead of Person.SSN or Holding.SSN when the two relation are crossed or theta-joined

-1 for writing SQL queries --- if the query works

-2 for neglecting a condition (for example, not selecting for ORCL stocks)

Question 18:

SELECT SUM(numShares)
FROM   Person, Holding
WHERE  employerSymbol = 'IFMX'
AND    Person.SSN = Holding.SSN
AND    symbol = 'ORCL';

Alternate solution:

SELECT SUM(numShares)
FROM   Holding
WHERE  symbol = 'ORCL' AND
       SSN IN (SELECT SSN
               FROM   Person
               WHERE  employerSymbol = 'IFMX');

Grading by Mike:

-2: Forgetting to join on SSN (Person.SSN = Holding.SSN)

-1: Forgetting a clause in the WHERE (e.g. IFMX or ORCL)

-1: Wrong aggregate (e.g. count)

0: Small / Inconsequential Syntax Error (e.g. quotes)

-.5: Syntax Error (e.g. "=" instead of "IN")

-.5: Overly complicated query (e.g. "GROUP BY" or multiple sub-selects)

Some of the submissions had more substantial logic or syntax errors and the grading scheme above just didn't apply. For those people, I assigned a grade, somewhat arbitrarily (probably between 0 and 3.5) based on how much of the "idea" of the question that they captured.

Question 19:

PROJ_{symbol} (StockPrice)
-
PROJ_{s1.symbol} (
    RENAME_{s1}(StockPrice)
    THETA-JOIN_{s1.symbol=s2.symbol AND
                s1.date<s2.date AND s1.price>=s2.price}
    RENAME_{s2}(StockPrice)
)

Grading by Mark: Generally, a common mistake people made was to find all stocks who price went up at some point:

    /* WRONG! */
    PROJ_{s1.symbol} (
        RENAME_{s1}(StockPrice)
        THETA-JOIN_{s1.symbol=s2.symbol AND
                    s1.date>s2.date AND s1.price>s2.price}
        RENAME_{s2}(StockPrice)
    )

A correctly written query doing this received 2 points. 1.5 points was given for this without the s1.symbol=s2.symbol condition. A 1 point answer was almost any relational algebra query. You got 2 points if you wrote a SQL query that is right, but is not easy to translate to relational algebra (e.g., use ANY, EXISTS, etc.). 1 point for a SQL query making the same mistake as above.

If you got near the correct answer (did not make the mistake above), you got 4.5 points if you reported all non-falling stocks rather than all rising stocks (e.g. < instead of <=). You got 4 points if you forgot the condition needed to join on the symbols. (Remember, this is necessary for theta-join; it is only unnecessary if you used a natural join and made sure the column names for symbol were the same.) A 3 or 3.5 point answer had the right idea, but made more major mistakes.

Question 20:

SELECT symbol, price
FROM   StockPrice s
WHERE  symbol IN
         (SELECT   symbol
          FROM     Holding
          GROUP BY symbol
          HAVING   COUNT(*) >
            (SELECT 0.4*COUNT(DISTINCT SSN) FROM Holding))
AND    date >= ALL
         (SELECT date
          FROM   StockPrice
          WHERE  symbol = s.symbol);

Grading by Jun:

+2 for making a commendable attempt at the problem and correctly showing the overall idea and structure of the query

+2 for correctly identifying all "widely-held" stocks

+2 for correctly identifying the latest price for each stock

Common mistakes include:

Applying an aggregate function to a subquery (e.g., COUNT(SELECT ...))
Aggregate functions in SQL should be applied to attributes!

Applying quantifiers to attributes (e.g., >= ANY date)
Quantifiers should be applied to subqueries!

HAVING clause does not have a condition (e.g., HAVING MAX(DATE))

Jun Yang

CS145 Spring 1999