CS145 - Spring 2002
Introduction to Databases

Assignment #2 -- Due Wednesday April 24

The procedure for turning in this assignment and the late policy are exactly the same as for Assignment #1.
The grading (and non-grading) of exercises and challenge problems will also proceed as in Assignment #1.
Note about SQL on Oracle: If you decide to test your solutions to the SQL written exercises using Oracle, note that Oracle's implementation of SQL differs somewhat from the SQL standard as covered in the textbook and class. These differences are outlined in the document Oracle 9i SQL, available through the course Web site Project page.

Exercises

In the following textbook exercises we'll use computers this time instead of battleships. For a complete description of the schema please see Exercise 5.2.1 on page 207. Here are a few additional clarifications:
- The "manufacturer" is what's stored in the attribute called maker.
- In each of the four relations, attribute model is a key, and there are no other keys.
- You may assume that attribute rd in relation PC is a string containing both the speed (e.g., 12x) and type (CD or DVD) of the removable disk.
Note that altogether this problem asks for a gigantic number of SQL queries. Feel free to pick and choose - you will get full credit if you write a reasonable fraction and variety of the queries. If you do them all, you will be a true SQL expert.
(i) Do parts (c) and (d) of Exercise 6.1.3 in the textbook (pages 252-253). Write the SQL queries only - you do not need to show the results on the sample data although you are free to do so if you wish. Note for part (c) that "any PC" should be taken to mean "some PC."
(ii) Do parts (c) and (e) of Exercise 6.2.2 in the textbook (pages 262-263). Write the SQL queries only - you do not need to show the results on the sample data although you are free to do so if you wish.
(iii) Do parts (b), (c), and (d) of Exercise 6.3.1 in the textbook (page 274).
(iv) Do Exercise 6.4.3 in the textbook (page 284) for your six queries (three parts, two queries each) from part (iii). We suggest that you do the "if so" part of the problem (rewriting to eliminate duplicates) but not the "if not" part of the problem (rewriting to eliminate subqueries), unless you are so inclined. A further exercise is to rewrite the queries using aggregation.
(v) Do parts (b), (d), and (g) of Exercise 6.4.6 in the textbook (pages 284-285). Write the SQL queries only - you do not need to show the results on the sample data although you are free to do so if you wish.
(vi) Do parts (b), (d), and (g) of Exercise 6.5.1 in the textbook (pages 290-291). Write the SQL modifications only - you do not need to describe the effects on the sample data although you are free to do so if you wish.
Consider the XML data in Figure 4.23 on page 182 of the textbook. Your queries should work on any XML data conforming to the DTD given in Figure 4.22 on page 181.
(a) Write an XPath expression to find the names of all stars whose street or city is "Hollywood."
(b) Write an XPath expression to find the titles of all movies made after 1975.
(c) Write an XPath expression to find all stars whose second listed movie has the word 'Strikes' in its title.
(d) Write an XQuery expression to find all pairs of star names where the two stars acted in a movie together. Assume that a movie is uniquely identified by its title. The elements in your query result should be of the form:
```
  <Pair>
    <Name1>...</Name1>
    <Name2>...</Name2>
  </Pair>
```
Stars should not be paired with themselves (you may assume that star names are unique), and each pair should occur exactly once, not once in each order.
(e) Write an XQuery expression to find the names of all stars who made at least one movie in a year earlier than the average year in the database.

Challenge Problems

Specify a real-world relational schema (i.e., not just "R" and "S") and then write a SQL query Q over the relations such that:
- Q has one EXISTS or IN subquery in its WHERE clause (in its "positive" version, i.e., do not use NOT EXISTS or NOT IN).
- Q cannot be rewritten as an equivalent query Q' without a subquery such that Q' is guaranteed to always return the same number of duplicates in the result as Q, even when Q' includes the keyword DISTINCT.
Your query Q does not need to use aggregation or any other "fancy" SQL constructs.
Consider a relational schema Student(ID,GPA), where ID is a key. Are the following two queries always equivalent? Either way, justify your answer.
```
  select avg(GPA) from Student
  select sum(GPA)/count(*) FROM Student
```
We didn't cover these features in class, but both XPath and XQuery have a special syntax for "dereferencing" IDREF attributes. In other words, when specifying an XML traversal in XPath or XQuery, there is a way to traverse through an IDREF attribute @A to the element E that A points to, and continue traversing from E. Suppose neither XPath nor XQuery provided such a feature. Is it possible to "emulate" IDREF dereferencing behavior in XPath using other constructs in the language? How about in XQuery? If it helps, you may assume there is a known DTD for the XML data being queried.

Project Part 2

Partners

Please remember the ongoing rule about partnerships: If a student turns in any part of the project as part of a team, every later part of the project must be submitted individually or as part of the same team. Details of partnership rules can be found in the Project page.

Part A

Copy the Oracle load file /usr/class/cs145/sample_data/Grades.txt into your own filespace. Based on this load file, create a table called Grades in Oracle into which you will load the data. The schema you should use for the table is:

  Grades(name CHAR(15), course CHAR(9), year CHAR(2), quarter CHAR(6), grade CHAR(2))

Load the data file into your Oracle table. Also make sure your Courses table from Project Part 1 is created and loaded as part of the same database. Now you have two tables with an attribute course that joins them. Using sqlplus, execute a few SQL queries and modifications over these two tables. You will want to refer to the document Oracle 9i SQL (available through the course Web site Project page) for details of Oracle's SQL quirks.

Create a transcript that shows the successful creation and loading of the Grades table, as well as the execution of at least two SQL queries involving both tables, and one SQL modification (insert, delete, or update) on each table.

Part B

In this part of the project you will considerably extend your warm-up work from Part 1 - the simple Web interface and single simple SQL query issued from the interface to Oracle. While in the first part of the project you needed to know only a little about Pro*C (for C users) or JDBC (for Java users), now you will be making more extensive use of these languages. You will also continue to use CGI or Java Servlets as in the first part of the project.

In addition to the various help documents and Web sites mentioned in Assignment #1, you will find the following documents useful for this part of the project:

Introduction to Pro*C Embedded SQL for C users
Introduction to JDBC for Java users

Both of these documents are available through the course Web site Project page. They provide a number of important details, including information specific to our computing environment and pointers to suites of sample programs.

You may continue to use the small database we are providing, with the Grades table in addition to the Courses table enabling somewhat more complex queries. (If you prefer to create and use a different database you are welcome to do so.) Your extended Web front-end and connection to Oracle should include at least the following features:

One or more input boxes, where the value typed by the user becomes a parameter to a SQL query issued to Oracle (with results displayed in the Web page as usual).
One or more drop-down menus, where again the value selected by the user becomes a parameter to a SQL query issued to Oracle. If you wish you may use both input boxes and menus to gather parameters for a single query.
The ability to modify the database in some parameterized fashion through the Web interface (inserts, deletes, or updates with input from the user).
N-at-a-time browsing , where the first N tuples in a query result are displayed, then the user must request the next N to be displayed, etc. (This behavior is similar to search engines, and to many other search features on the Web that return more than a few results.) It's up to you whether to support previous-N as well as next-N; the former is not required. It's also up to you whether N is hard-coded or a parameter that can be input by the user.
Link-based browsing, where a result page presented by your program contains hyperlinks on certain attribute values (typically join attributes). Clicking on a hyperlink should issue another SQL query related to the item being clicked on, and generate another result page in HTML as usual. Here's an example: Suppose a query returns a subset of the rows in the Courses table, displayed in HTML. The values in column courses can be displayed as hyperlinks, where clicking on one of these links issues a query to find all entries in the Grades table for that specific course.

The following Web pages have been created with additional hints specific to this programming assignment:

Hints for Project Part 2, C users (http://www.stanford.edu/class/cs145/hw/hw2hint.html)
Hints for Project Part 2, Java users (http://www.stanford.edu/class/cs145/hw/hw2JavaNotes.html)

Note on maintaining your databases

You should consider two factors in the maintenance of your databases throughout the CS145 project:

Oracle is not backed up, so anything you need long-term should be saved in the leland file system.
As you start modifying data in a database, you will undoubtedly want the ability to get a "fresh start" easily from your original data.

We recommend that you establish a routine of saving all data in Oracle load files, and perhaps reloading the database each time you start working with it. Remember to delete the contents of each table (or destroy and recreate the tables) before reloading. Otherwise, unless you take APPEND out of your control file (or there is a declared key), Oracle will happily append the new data to your old table, causing your table size to double, triple, quadruple, etc. To get rid of a table called T, issue the following command to sqlplus:

  drop table T;

If you want to get rid of all tuples in T without deleting the table itself, issue the command:

  delete from T;

Browser compatibility

As in Project Part 1, please ensure that your Part B Web interface operates correctly using the Netscape browser on the Sweet Hall Solaris workstations. If there is a compelling reason you cannot make your Web interface work in the Solaris Netscape browser environment you must get "preapproval" from the course staff to use a different browser environment. Send an email message to cs145-staff@cs telling us precisely what browser environment you wish to use. The message must be sent by Sunday April 21 so that we have time to work things out if your browser environment poses a problem for us. You will receive a reply within 24 hours of your message, and you do need to receive a positive confirmation message before assuming that your alternate environment is okay.

When the preapproval process is not followed, projects that have problems on the Sweet Hall Solaris Netscape browsers may lose points, possibly all points if we cannot run your Web interface at all.

What to submit

Create a submission directory exactly as you did for Project Part 1. The directory should contain a text file called README with the Part A transcript, and for C users with a URL for the .html file where a grader can test the project. In addition, the submission directory should contain the following.

For C users, all source files that you created or modified, including the Makefile if you modified it and all relevant .html files.
For Java users, your entire servletdir directory, including all static .html documents in the webpages subdirectory and the Java Servlet code in the servlets subdirectory.

Once your submission directory is created, submission instructions are exactly the same as for Project Part 1. Remember that points may be deducted if you do not follow the submission procedures exactly as specified, and if you submit more than once, all submissions except your last will be ignored.