CS145 - Introduction to Databases
Spring 2001, Prof. Widom

Assignment #4
Due Wednesday May 9

LOGISTICS

THE ASSIGNMENT

Problem 1

Review Section 5.9 in the textbook. In this problem you are to use SQL to implement two of the natural outerjoin operators, without using SQL's built-in outerjoin. Specifically, consider two relations R(A,B) and S(B,C). Write:

(a) A SQL query that computes the natural full outerjoin of R and S, without using outerjoin

(b) A SQL query that computes the natural left outerjoin of R and S, without using outerjoin

In your queries, you may select the constant value NULL if you find it useful. For example, the following query returns all tuples in T(A,B,C) with all B values replaced by NULL:

   select A,NULL,C from T

Problem 2

Consider a relational schema Employee(ID,salary), where ID is a key. Do you think the following two queries are always equivalent, i.e., do they yield the same result on all possible databases?
  SELECT AVG(salary) FROM Employee

  SELECT SUM(salary)/COUNT(*) FROM Employee
If you think they are always equivalent, justify your answer. If you think they are not always equivalent, give the simplest example data for Employee you can think of for which you expect the two queries to yield different results.

Problem 3

Recall from Problem 3 in Assignment #3 the relation Flight(city1,city2,cost,airline), containing nonstop flights from one city to another. We were interested in part (c) of that problem in writing a single SQL query that would return the cheapest cost of flying between each pair of cities, regardless of the number of stops. While you may have had difficulty writing such a query in SQL2, it is possible to compute the answer using SQL in conjunction with a programming language, and that's your task in this problem. You may use SQL embedded within a C-like language as presented in Section 7.1 of the textbook, you may use a PL/SQL-like syntax, or you may use CLI/JDBC-style dynamic SQL. Regardless, we are not particularly concerned with correct syntax, and you can certainly "wing it" on syntax for things like checking for end conditions. We're interested primarily in your overall algorithm, and the interaction between programming constructs and SQL.

Problem 4 (Personal Database Application, Part 4)

In this part of the project, you will issue SQL queries and updates against your PDA database, and you will experiment with the use of indexes. Since you will be modifying your data as part of this assignment, we strongly suggest that you adopt the routine for getting repeated "fresh" starts with Oracle described in the Maintaining your databases section of Assignment #3, if you haven't already.

(a) SQL

Develop and test:

Please note:

(b) Indexes

In part (a) you may have discovered that some queries run very slowly over your large database. As discussed in class, an important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A (which is useful when evaluating selection or join conditions involving attribute A). An index can be created on any attribute of any relation, or on several attributes combined. The syntax for creating indexes in Oracle is given in the document Oracle 8i SQL. Please pay careful attention to the information about Stanford/Oracle-specific syntax.

Create at least three useful indexes for your PDA. Run your queries from part (a) on your large database with the indexes and without the indexes. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes. Here too, please truncate any large query results.

Please note: