CS145 - Spring 2004
Introduction to Databases

Challenge Problems #2
Due Tuesday, April 20

Preparation and Submission: All challenge problems must be submitted electronically. Please refer to Challenge Problems #1 for accepted formats and guidelines on typing solutions. Submit your solutions using the script "/afs/ir/class/cs145/bin/submit-challenge", as you did for Challenge Problems #1, except this time call your file challenge2.<ext>. (Details of electronic submission can be reviewed in Challenge Problems #1.) The on-time deadline is 11:59 PM on the due date. See the Assigned Work page for the policy on late submissions. Please remember that challenge problems must be submitted electronically, and plan accordingly.
Honor Code reminder: For more detailed discussion of the Stanford Honor Code as it pertains to CS145, please see the Assigned Work page under Honor Code. In summary: You must indicate on your written and programming assignments any assistance (human or otherwise) that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all material that you submit.

The Problems

1. Consider a SQL table T(K,V), where K is a key and NULL values are not permitted in either column, and the following three queries:
```
  Q1: select V from T
      where V >= all (select V from T)

  Q2: select V from T as T1
      where V > all (select V from T as T2 where T2.K <> T1.K)

  Q3: select max(V) from T
```
(a) Are Q1 and Q2 equivalent? That is, are they guaranteed to produce the same result on every possible instance of table T? If not, show the smallest instance of T you can find for which Q1 and Q2 produce different results. (Note: Assume that all three queries produce an empty result on an empty table, even though in standard SQL max(V) over an empty table returns a single row containing NULL.)
(b) Same as part (a), except consider equivalence of queries Q2 and Q3.
(c) Same as part (a), except consider equivalence of queries Q1 and Q3.
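When searching for a distinguishing instance, it can help to evaluate the three queries mechanically. The following is a minimal Python sketch, assuming T is modeled as a list of (K, V) pairs with distinct keys and no NULLs; the function names q1, q2 and q3 are hypothetical. Python's all() returns True on an empty sequence, which matches SQL's rule that a comparison with ALL over an empty subquery result is true. Because SQL results are bags, compare outputs with sorted() rather than by list order.

```python
from typing import List, Tuple

# Hypothetical in-memory instance of T(K, V): a list of (K, V) pairs.
# K is a key (no two pairs share a K) and neither column is NULL.
Table = List[Tuple[int, int]]


def q1(t: Table) -> List[int]:
    # select V from T where V >= all (select V from T)
    all_vs = [v for _, v in t]
    return [v for _, v in t if all(v >= w for w in all_vs)]


def q2(t: Table) -> List[int]:
    # select V from T as T1
    # where V > all (select V from T as T2 where T2.K <> T1.K)
    result = []
    for k1, v1 in t:
        others = [v2 for k2, v2 in t if k2 != k1]
        # "> all" over an empty subquery result is true, as in SQL.
        if all(v1 > v2 for v2 in others):
            result.append(v1)
    return result


def q3(t: Table) -> List[int]:
    # select max(V) from T
    # Follows the handout's convention that Q3 returns nothing on an empty
    # table (standard SQL would return one row containing NULL).
    if not t:
        return []
    return [max(v for _, v in t)]


if __name__ == "__main__":
    t = [(1, 5), (2, 3), (3, 4)]
    print(q1(t), q2(t), q3(t))
```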
2. Provide a general algorithm for eliminating HAVING clauses in SQL queries. That is, give an algorithm that takes as input a SQL query Q with a HAVING clause and produces as output a SQL query Q' without a HAVING clause such that Q and Q' are equivalent, i.e., Q and Q' produce exactly the same answer on all databases. For simplicity you may assume:
- Q is a single SELECT-FROM-WHERE-GROUPBY-HAVING statement; for example, it is not a UNION of multiple SELECT statements.
- Q has only one HAVING clause, at the outermost level.
Do not make any other assumptions about the input query Q. Your algorithm should be based only on the query itself, not on the schema, the state of the database, or any other external information. (Hint: The solution to this problem is not all that complex. If you are developing a very complicated algorithm, you're probably headed in the wrong direction.)
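To pin down exactly what Q computes (and therefore what Q' must preserve), here is a minimal Python sketch of the logical evaluation order of a SELECT-FROM-WHERE-GROUPBY-HAVING query. It assumes a hypothetical relation R(G, X) and a query of the form SELECT G, SUM(X) ... GROUP BY G; the function name group_by_having is also hypothetical. It models semantics only and is not a rewriting algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical relation R(G, X)


def group_by_having(
    rows: List[Row],
    where: Callable[[Row], bool],
    having: Callable[[List[Row]], bool],
) -> List[Tuple[str, int]]:
    """Evaluate
         SELECT G, SUM(X) FROM R WHERE <where> GROUP BY G HAVING <having>
    in SQL's logical order."""
    # 1. FROM + WHERE: filter individual rows.
    kept = [r for r in rows if where(r)]
    # 2. GROUP BY: partition the surviving rows by G.
    groups: Dict[str, List[Row]] = defaultdict(list)
    for r in kept:
        groups[r[0]].append(r)
    # 3. HAVING: keep or discard whole groups; the predicate receives the
    #    group's rows so it can compute aggregates over them.
    # 4. SELECT: emit one output row per surviving group.
    return [
        (g, sum(x for _, x in members))
        for g, members in groups.items()
        if having(members)
    ]


if __name__ == "__main__":
    rows = [("a", 1), ("a", 2), ("b", 5), ("c", -1)]
    # HAVING COUNT(*) >= 2
    print(group_by_having(rows, lambda r: r[1] > 0, lambda g: len(g) >= 2))
```

Note that a group whose rows are all removed by WHERE (group "c" above) never forms, so the HAVING predicate is never evaluated on it.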
3. Make sure you've read about outerjoins in the textbook (pages 228-230 and 272-274).
Most join operators, including the ordinary (inner) natural join, are associative, meaning that:
```
  (R1 JOIN R2) JOIN R3
  and
  R1 JOIN (R2 JOIN R3)
```
always produce the same final result.
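The claim can be checked on small instances for the ordinary natural join. Below is a minimal Python sketch, assuming relations are modeled as lists of dicts (attribute name to value) under bag semantics with no NULLs; the helper names natural_join and as_bag are hypothetical. Agreement on one instance is evidence rather than proof, although associativity of the inner natural join does hold in general.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # attribute name -> value (no NULLs assumed)


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Natural (inner) join of two bags of rows."""
    result = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            # With no shared attributes this is a Cartesian product.
            if all(x[a] == y[a] for a in common):
                result.append({**x, **y})
    return result


def as_bag(rel: List[Row]) -> List[tuple]:
    # Order-insensitive canonical form for comparing two bags of rows.
    return sorted(tuple(sorted(row.items())) for row in rel)


if __name__ == "__main__":
    R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
    R2 = [{"B": 1, "C": 1}, {"B": 2, "C": 3}]
    R3 = [{"C": 1, "D": 9}]
    left = natural_join(natural_join(R1, R2), R3)
    right = natural_join(R1, natural_join(R2, R3))
    print(as_bag(left) == as_bag(right), left)
```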
(a) Is the natural-left-outerjoin operator associative? If so, briefly argue why. If not, show the simplest example you can find where:
```
  (R1 NATURAL-LEFT-OUTERJOIN R2) NATURAL-LEFT-OUTERJOIN R3
  and
  R1 NATURAL-LEFT-OUTERJOIN (R2 NATURAL-LEFT-OUTERJOIN R3)
```
produce different final results.
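For experimenting with part (a), here is a minimal Python model of the natural left outerjoin. It assumes relations are lists of dicts, None stands for NULL, and the right operand's attribute list is passed explicitly so that dangling tuples can be padded even when the right operand is empty; the function names are hypothetical. Following SQL, a join attribute that is NULL matches nothing. Evaluating both groupings on a candidate instance can test a conjecture, but agreement on one instance does not prove associativity.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # attribute name -> value; None models SQL NULL


def _matches(x: Row, y: Row) -> bool:
    """True if x and y agree on every shared attribute.

    As in SQL, a NULL (None) never equals anything, not even another NULL.
    """
    common = x.keys() & y.keys()
    return all(
        x[a] is not None and y[a] is not None and x[a] == y[a] for a in common
    )


def natural_left_outer_join(
    r: List[Row], s: List[Row], s_attrs: List[str]
) -> List[Row]:
    """Natural left outerjoin of r with s; s_attrs is the schema of s."""
    result: List[Row] = []
    for x in r:
        matched = False
        for y in s:
            if _matches(x, y):
                result.append({**x, **y})  # shared attributes agree
                matched = True
        if not matched:
            # Dangling left tuple: keep it and pad the attributes that
            # appear only in s with NULL.
            padded = dict(x)
            for a in s_attrs:
                padded.setdefault(a, None)
            result.append(padded)
    return result


if __name__ == "__main__":
    R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
    R2 = [{"B": 1, "C": 10}]
    print(natural_left_outer_join(R1, R2, ["B", "C"]))
```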
(b) Is the natural-full-outerjoin operator associative? If so, briefly argue why. If not, show the simplest example you can find where:
```
  (R1 NATURAL-FULL-OUTERJOIN R2) NATURAL-FULL-OUTERJOIN R3
  and
  R1 NATURAL-FULL-OUTERJOIN (R2 NATURAL-FULL-OUTERJOIN R3)
```
produce different final results.
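Similarly, for part (b), here is a minimal Python model of the natural full outerjoin, under the same assumptions: relations are lists of dicts, None stands for NULL, a NULL join attribute matches nothing, and both operands' attribute lists are passed explicitly for padding. The function names are hypothetical, and running both groupings on an instance tests a conjecture rather than proving it.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # attribute name -> value; None models SQL NULL


def _matches(x: Row, y: Row) -> bool:
    # As in SQL, NULL never equals anything, not even another NULL.
    common = x.keys() & y.keys()
    return all(
        x[a] is not None and y[a] is not None and x[a] == y[a] for a in common
    )


def _pad(row: Row, attrs: List[str]) -> Row:
    # Fill in any attribute of attrs missing from row with NULL.
    padded = dict(row)
    for a in attrs:
        padded.setdefault(a, None)
    return padded


def natural_full_outer_join(
    r: List[Row], r_attrs: List[str], s: List[Row], s_attrs: List[str]
) -> List[Row]:
    """Natural full outerjoin of r and s, given both schemas."""
    result: List[Row] = []
    s_matched = [False] * len(s)
    for x in r:
        matched = False
        for j, y in enumerate(s):
            if _matches(x, y):
                result.append({**x, **y})
                matched = True
                s_matched[j] = True
        if not matched:
            result.append(_pad(x, s_attrs))  # dangling tuple from r
    for j, y in enumerate(s):
        if not s_matched[j]:
            result.append(_pad(y, r_attrs))  # dangling tuple from s
    return result


if __name__ == "__main__":
    R1 = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]
    R2 = [{"B": 1, "C": 10}, {"B": 3, "C": 30}]
    print(natural_full_outer_join(R1, ["A", "B"], R2, ["B", "C"]))
```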