CS145 - Introduction to Databases
Spring 2001, Prof. Widom
Assignment #4
Due Wednesday May 9
![](http://www-cs.stanford.edu/gifs/line.red2.gif)
- Please read the document Oracle 8i
SQL (make sure to read the most recent version) for all kinds of
useful information related to the PDA part of this assignment.
- Turning in written exercises: The procedure for turning
in the written exercises on this assignment is the same as for the
previous assignments, with the same deadline and late policy.
- Turning in PDA: Your PDA work will again be submitted
electronically. The procedure is the same as for the previous
assignment (see http://www.stanford.edu/class/cs145/submit-README.txt),
and the same deadline and late policy applies.
![](http://www-cs.stanford.edu/gifs/line.red2.gif)
THE ASSIGNMENT
Problem 1
Review Section 5.9 in the textbook. In this problem you are to use
SQL to implement two of the natural outerjoin operators, without using
SQL's built-in outerjoin. Specifically, consider two
relations R(A,B) and S(B,C). Write:
(a) A SQL query that computes the natural full outerjoin of
R and S, without using outerjoin
(b) A SQL query that computes the natural left outerjoin of
R and S, without using outerjoin
In your queries, you may select the constant value NULL if
you find it useful. For example, the following query returns all
tuples in T(A,B,C) with all B values replaced by
NULL:
select A,NULL,C from T
Problem 2
Consider a relational schema Employee(ID,salary), where
ID is a key. Do you think the following two queries are
always equivalent, i.e., do they yield the same result on all possible
databases?
SELECT AVG(salary) FROM Employee
SELECT SUM(salary)/COUNT(*) FROM Employee
If you think they are always equivalent, justify your answer. If you
think they are not always equivalent, give the simplest example data
for Employee you can think of for which you expect the two
queries to yield different results.
Problem 3
Recall from Problem 3 in Assignment
#3 the relation Flight(city1,city2,cost,airline),
containing nonstop flights from one city to another. We were
interested in part (c) of that problem in writing a single SQL query
that would return the cheapest cost of flying between each pair of
cities, regardless of the number of stops. While you may have had
difficulty writing such a query in SQL2, it is possible to compute the
answer using SQL in conjunction with a programming language, and
that's your task in this problem. You may use SQL embedded within a
C-like language as presented in Section 7.1 of the textbook, you may
use a PL/SQL-like syntax, or you may use CLI/JDBC-style dynamic SQL.
Regardless, we are not particularly concerned with correct syntax, and
you can certainly "wing it" on syntax for things like checking for end
conditions. We're interested primarily in your overall algorithm, and
the interaction between programming constructs and SQL.
Problem 4 (Personal Database Application, Part 4)
In this part of the project, you will issue SQL queries and updates
against your PDA database, and you will experiment with the use of
indexes. Since you will be modifying your data as part of this
assignment, we strongly suggest that you adopt the routine for getting
repeated "fresh" starts with Oracle described in the Maintaining
your databases section of Assignment #3, if you haven't already.
(a) SQL
Develop and test:
- At least six SQL data retrieval (select) commands.
- At least two each of the four types of SQL data modification
commands: insert a single tuple, insert a subquery,
delete, update.
Please note:
- For this assignment you will be invoking your SQL commands
interactively through sqlplus, as described in the document Getting
Started With Oracle. Of course you should certainly build a
script file, rather than typing in the queries each time you run them.
- Please write "interesting" queries. You should try to use
most or all of the SQL constructs discussed in class and in the
textbook (subqueries, aggregates, set operators, etc.). You will not
receive full credit if your queries and modifications are all extremely
simple.
- We suggest that you experiment with your SQL commands on your
small hand-created database before running them on the large database
for which you generated data. Initial debugging is much easier when
you're operating on small amounts of data. Once you're confident that
your commands are working, try them on your large database. We do
expect that the commands you turn in work properly on both your small
and large databases.
- If you discover that most or all of your "interesting" queries
return an empty answer on your large database, check whether you
followed the instructions in PDA Part 3 for generating data values
that join properly. You may need to modify your data generator.
- Turn in a copy of all of your SQL commands, along with a
script illustrating their execution on either your small or large
database. Your script should be sufficient to convince us that your
commands run successfully, but you can and should truncate query
results after a few lines. Please do not turn in query results that
are hundreds of lines long. (Your submission may be rejected anyway
if you try, as explained in the electronic
submission instructions.)
(b) Indexes
In part (a) you may have discovered
that some queries run very slowly over your large database. As
discussed in class, an important technique for improving the
performance of queries is to create indexes. An index on an
attribute A of relation R allows the database to
quickly find all tuples in R with a given value for attribute
A (which is useful when evaluating selection or join
conditions involving attribute A). An index can be created
on any attribute of any relation, or on several attributes combined.
The syntax for creating indexes in Oracle is given in the document Oracle
8i SQL. Please pay careful attention to the information about
Stanford/Oracle-specific syntax.
Create at least three useful indexes for your PDA. Run your
queries from part (a) on your large database with the indexes and
without the indexes. Turn in a script showing your commands to create
indexes, and showing the relative times of query execution with and
without indexes. Here too, please truncate any large query
results.
Please note:
- As mentioned in the document Oracle
8i SQL, Oracle automatically creates indexes for attributes
declared as keys. Thus, when you created a relation in PDA Part 3 and
declared a PRIMARY KEY, an index was created, and
you need not attempt to create another index on your key attributes.
- As described in the document Oracle
8i SQL, in order to set the system to show query execution times
you must issue the command "set timing on;" at the
sqlplus prompt.
- Your timings will be affected by external factors such as
system load. However, for some of your queries, with appropriate
indexes you should see a consistent dramatic difference between the
execution times with indexes and the times without. If others of your
queries do not show performance improvement even when relevant indexes
are created, please include a short note suggesting why this might be
the case. (If you are completely unable to find queries whose running
times are improved by creating indexes, you could try temporarily
removing your PRIMARY KEY declarations, so Oracle will not
create any indexes automatically.)
![](http://www-cs.stanford.edu/gifs/line.red2.gif)