CS145 - Introduction to Databases
Spring 2001, Prof. Widom

Assignment #3
Due Wednesday May 2

LOGISTICS

THE ASSIGNMENT

Problem 1

Consider the following relational database schema:
  Student(ID, name, dept, status)  // status = "grad" or "undergrad"
                                   // ID is a key
  RA(ID, advisor, dept)            // (ID,advisor) together are a key
  TA(ID, course, dept)             // (ID,course) together are a key
and the following query: Find the names of all graduate students who are neither an RA nor a TA.

(a) Write this query in relational algebra.

(b) Write this query in SQL.

Now consider the following query: Find the names of all graduate students who are an RA or a TA in a department other than their own.

(c) Write this query in relational algebra.

(d) Write this query in SQL.

Problem 2

Consider the following relation:
  Emp(ID, salary)  // ID is a key, salary is a key
and the following two queries:
  1. Find the highest employee salary.
  2. Find the top third of salaries. You may assume that the number of employees (and therefore salaries) is divisible by three.

(a) Express Query 1 in relational algebra.
Hint: Join Emp with itself to find salaries that aren't the highest, then use the difference operator.

(b) Express Query 1 in SQL using the max aggregation operator.

(c) Express Query 1 in SQL without using the max aggregation operator.

(d) Express Query 2 in relational algebra. If needed, you may use the following programming constructs in addition to relational algebra:

(e) (optional) Express Query 2 in SQL. Do not use the programming constructs from part (d).

Problem 3

Consider a relation Flight(city1,city2,cost,airline) containing nonstop flights from one city to another. Note that the flights from city1 to city2 are independent of the flights from city2 to city1. For example:

city1city2costairline
SFDenver300Frontier
SFDenver350United
DenverSF250United
DenverSF250Frontier
DenverChicago250American
ChicagoNY250Delta
DenverNY500American
DenverNY400TWA
SFNY750United

(a) Give a single SQL query that returns the cost of the cheapest nonstop flight between each pair of cities. For example, the result over the above relation instance should be:

city1city2cost
SFDenver300
DenverSF250
DenverChicago250
ChicagoNY250
DenverNY400
SFNY750

(b) Give a single SQL query that returns the cheapest cost of flying between each pair of cities assuming we are willing to stop up to two times en-route. For example, by stopping once (in Denver), we can get from SF to NY for $700 instead of $750. With this example data, we could stop twice (in Denver and Chicago), but that would be more expensive ($300+$250+$250 = $800).

(c) Is it possible to write a single SQL query that returns the cheapest cost of flying between each pair of cities regardless of the number of stops? If so, give the query. If not, briefly explain why.

Problem 4

Consider the following relation:
  Advised(Advisor, Student, Year)
A tuple (A,S,Y) in Advised specifies that advisor A advised student S who graduated in year Y. Assume that Student is a key for this relation.

(a) Consider the following SQL query, which finds all advisors who advised a student who graduated in the same year that Hector Garcia-Molina (HGM) or Jennifer Widom (JW) graduated.

  SELECT Advisor
  FROM Advised
  WHERE Year IN
    (SELECT Year FROM Advised
     WHERE Student = "HGM" OR Student = "JW")
Try to write an equivalent SQL query that does not use any subqueries. If there are any circumstances in which your "equivalent" query can produce a different answer from the query above, please explain.

(b) (optional) Write a SQL query that finds the advisor(s) with the longest advising span, i.e., with the longest period from their earliest advisee to their latest advisee.

Problem 5

Consider a relation R(A,B) that may contain duplicate tuples, for example:

AB
12
12
34
56
56
56

(a) Suppose we are interested in selecting all but one copy of each tuple appearing in R. For example, on the above data the query result would be:

AB
12
56
56

Can you write this query in SQL? If so, give the query. If not, briefly explain why.

(b) Suppose instead we are interested in actually deleting one copy of each tuple in R. That is, we want to write a SQL DELETE statement such that after the statement executes, table R has the contents:

AB
12
56
56

Can you write such a DELETE statement in SQL? If so, give the statement. If not, briefly explain why.

Problem 6

(optional) Consider a relation R(A,B,C). Write a relational algebra expression that always returns empty if and only if the multivalued dependency A->>B holds on R.

Problem 7 (Personal Database Application, Part 3)

In this part of the project, you will create a relational schema for your PDA in the Oracle database system, and you will populate the tables in your database with initial data sets.

(a) Familiarize yourself with the Oracle relational DBMS by reading the document Getting Started With Oracle (make sure to read the most recent version), logging into Oracle, changing your password, trying some of the examples in the document, and experimenting with the help command. You don't need to turn anything in for this part.

(b) Create relations for your PDA based on your final relational schema from PDA Part 2. In addition to creating the appropriate attributes and types, please declare keys for your relations; see Creating a Table With a Primary Key from Getting Started With Oracle. Many of the attribute types supported by Oracle are listed on page 286 of the textbook. If you have an attribute that represents a date and/or time, you may want to look at our help page on Oracle Dates and Times.

Turn in a script showing an Oracle session in which your relations are created successfully. Also show, for each relation, the result of the sqlplus DESCRIBE command once the relation has been created: for a relation T, type "DESCRIBE T;". Please see Creating scripts and what to turn in below for details.

(c) For each relation in your PDA, create a file containing a few (approximately 5-10) records of "realistic" data. Then use the Oracle bulk-loading facilities to insert those records as tuples into your relations. Refer to the document Using the Oracle Bulk Loader for file format and how to load records into Oracle. (Here too make sure to look at the most recent version.)

Turn in a listing showing the contents of the files you created, the successful loading of the data into Oracle, and the execution of "SELECT *" commands to show the contents of each relation. (Again, information is available in Getting Started With Oracle.)

(d) Write a program in any programming language you like that creates large files of records for each of your PDA relations. If you have available real data for your PDA, then your program will need to transform the data into files of records conforming to your PDA schema and to Oracle's load format. The rest of you will need to write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values - strings especially - to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of:

If your application naturally includes relations that are expected to be relatively small (e.g., schools within a university), then it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

  1. Make sure not to generate duplicate values for key attributes.

  2. Your PDA almost certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute courseNum that's expected to join with attribute number in relation Course. When generating data, be sure to generate values that actually do join - otherwise all of your interesting queries will have empty results! There are a couple of ways to properly generate joining values. One way is to generate records for multiple relations (e.g., Course and Student) at the same time. Another way is to generate the records for one relation first, and then use the joining values for the other relation. For example, you could generate records for relation Course first, then use the Course.number values when creating values for Student.courseNum.

Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the successful loading of your data into Oracle.

Creating scripts and what to turn in

Please see Recording Your Session from the Getting Started With Oracle document for a guide to preparing output to be submitted for this and subsequent project parts.

Components (b), (c), and (d) of this project part each tell you what should be recorded in the script that you turn in. In this and all subsequent project parts, the material you turn in should be clearly formatted and delineated, and should include comments for any aspects that are not crystal clear. Poorly assembled or documented material will not receive full credit, even if it is correct. You also will not receive full credit if you turn in your entire large data files (or large query results in later assignments) when we ask for small samples. Other than comments, truncation, and simple formatting, it is an Honor Code violation to edit scripts before turning them in. Please see http://www.stanford.edu/class/cs145/submit-README.txt for details on electronic submission.

Maintaining your databases

You will be using both your small (part c) and large (part d) databases for the rest of the course. The idea is to use the small database to experiment on meaningful-looking data, and the large one to experiment on data of more realistic size. We suggest that you keep the two databases going in parallel by creating two sets of files and two sets of relations with slightly different names.

For the duration of the project, we suggest that you establish some kind of routine that includes reloading your database from the files created in this project part each time you want to get a "fresh" start with Oracle. Remember to delete the contents of each relation (or destroy and recreate the relations) before reloading. Otherwise, unless there is a declared key (or you take APPEND out of your control file), Oracle will happily append the new data to your old relation, causing your relation size to double, triple, quadruple, etc. To get rid of a table called T, issue the command:

  drop table T;
If you want to get rid of all tuples in T without deleting the table itself, issue the command:
  delete from T;