CS145 - Spring 2004
Introduction to Databases

Challenge Problems #1 (due Tuesday, April 13)

Submission Instructions

Solutions should be typed and can be submitted in Word, pdf, postscript, or plain-text formats. For students who absolutely prefer hand-writing their solutions, solutions may be scanned in using the scanner in Meyer library or a personal scanner, then submitted as jpeg or another universally-readable, relatively compact format. If you are unsure about your submission format, please contact the staff.
Solutions must be submitted as a single file named challenge1.<ext>, where <ext> matches the file type (e.g., doc, pdf, ps, txt, or jpg). Submit your solutions by executing the following script on one of the Leland machines:
```
  /afs/ir/class/cs145/bin/submit-challenge
```
Make sure you are in the directory on your Leland account that contains the solution file, then execute the script by typing "/afs/ir/class/cs145/bin/submit-challenge". After a few prompts, the script will copy your solution file (only if it is appropriately named) to our private submissions directory, along with a timestamp. You may resubmit your solutions as many times as you like; however, only the latest file and the latest timestamp are saved, and those are what we will use for grading your work and determining late penalties. Submissions via email will not be accepted.
The on-time deadline is 11:59 PM on the due date. Submissions after the deadline but less than 24 hours late will be accepted but penalized 10%, and submissions more than 24 hours but less than 48 hours late will be penalized 30%. No submissions will be accepted more than 48 hours late since solutions may be made available at that time. Since emergencies do arise, each student is allowed a total of four unpenalized late days (four periods up to 24 hours each) for challenge problems together with programming work, although no single assignment may be more than two days late. See the Assigned Work page for more information. Please remember that challenge problems must be submitted electronically, and plan accordingly.
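As a quick illustration of the penalty tiers described above, here is a minimal Python sketch. The function name `late_penalty` is hypothetical, the sketch ignores the four unpenalized late days, and the handling of submissions that are exactly 24 or exactly 48 hours late is our assumption, since the policy text does not spell out those boundary cases.

```python
from typing import Optional


def late_penalty(hours_late: float) -> Optional[int]:
    """Return the percentage penalty, or None if the submission is not accepted.

    Assumption: a submission exactly 24 hours late falls in the 30% tier and
    one exactly 48 hours late is rejected; the handout leaves these cases open.
    Unpenalized late days are not modeled here.
    """
    if hours_late <= 0:
        return 0      # on time
    if hours_late < 24:
        return 10     # less than 24 hours late
    if hours_late < 48:
        return 30     # more than 24 but less than 48 hours late
    return None       # too late: not accepted


print(late_penalty(5))   # 10
print(late_penalty(30))  # 30
```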

Guidelines for Typing Assignments

Please put your name, preferred email address, and student ID number clearly at the start of the assignment.

Use common sense to make your submission as readable as possible: Carefully number each problem, put enough space between problems, avoid lots of typos, and try to be clear and succinct. If you have a series of equations, put space between the text and the equations, and indent. We're not expecting you to spend a lot of time making your assignments utterly beautiful, but we do need to be able to read your solutions easily in order to grade them.
For relational algebra, if you don't have the appropriate symbols in your formatting program, you may use "SELECT", "PROJECT", "JOIN", "INTERSECT", and "RENAME". You may also use brackets ("[]") in place of subscripting if necessary. Here's an example of a flat text relational algebra expression:
```
RENAME[NewName(NewID)](PROJECT[ID]((SELECT[Age<18](Students))
                                    JOIN[Students.ID = Classes.ID]
                                   (SELECT[Time>1200](Classes))))
```
Needless to say, we prefer the correct symbols and subscripting if possible.
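To make the meaning of the flat-text example concrete, the following Python sketch evaluates it step by step. The schemas Students(ID, Age) and Classes(ID, Time), the sample tuples, and the helper names `select`, `theta_join`, `project`, and `rename` are all assumptions made for illustration; they are not part of the course materials. Because ID appears in both relations after the join, the sketch projects on Students.ID.

```python
# Hypothetical sample data; the handout gives no schema or tuples for these relations.
students = [{"ID": 1, "Age": 17}, {"ID": 2, "Age": 19}, {"ID": 3, "Age": 16}]
classes = [{"ID": 1, "Time": 1300}, {"ID": 2, "Time": 1400}, {"ID": 3, "Time": 900}]


def select(rel, pred):
    """SELECT: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def theta_join(left, right, lname, rname, pred):
    """JOIN: pair up tuples, qualifying attribute names with the relation name."""
    out = []
    for l in left:
        for r in right:
            t = {f"{lname}.{k}": v for k, v in l.items()}
            t.update({f"{rname}.{k}": v for k, v in r.items()})
            if pred(t):
                out.append(t)
    return out


def project(rel, attrs):
    """PROJECT: keep only the listed attributes and drop duplicate tuples."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out


def rename(rel, mapping):
    """RENAME: rename attributes according to the mapping."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in rel]


young = select(students, lambda t: t["Age"] < 18)
afternoon = select(classes, lambda t: t["Time"] > 1200)
joined = theta_join(young, afternoon, "Students", "Classes",
                    lambda t: t["Students.ID"] == t["Classes.ID"])
result = rename(project(joined, ["Students.ID"]), {"Students.ID": "NewID"})
print(result)  # [{'NewID': 1}]
```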
If you want to include figures (unlikely, but a possibility) there are two options: Create them in the text formatting program you are using, or hand-draw them and scan them in. If you do scan figures, make sure to be very clear about the correspondence between figures and the problem solutions they are part of.

Honor Code reminder

For more detailed discussion of the Stanford Honor Code as it pertains to CS145, please see the Assigned Work page under Honor Code. In summary: You must indicate on your written and programming assignments any assistance (human or otherwise) that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding and being able to explain on your own all material that you submit.

Partners are not permitted. Problems must be completed individually.

The Problems

Problem 1. This problem explores how to systematically store relational data in XML. We will use as examples two relations:
```
  Prof(name, phone, email, dCode)
  Dept(code, name, building, numFac)
```
We will use the following sample data for these relations.
In relation Prof:
```
  (Widom, 123-4567, widom@cs, CS)
  (Widom, 123-4567, widom@ee, EE)
  (Ullman, 987-6543, ullman@cs, CS)
```
In relation Dept:
```
  (CS, Computer Science, Gates, 40)
  (EE, Electrical Engineering, Packard, 55)
  (ME, Mechanical Engineering, MERL, 45)
```
Do not make any assumptions about the data except as stated in the questions.
(a) Specify a single DTD that allows data from any relational schema (one or more relations) to be stored as an XML document valid with respect to that DTD. (That is, the element structure of the XML will not be derived from the relational schema, since it needs to work for every schema.) Show how the sample data above would be encoded in XML corresponding to your DTD.
(b) Specify an algorithm for translating a relational schema (one or more relations) into a single XML DTD, where in this case the element structure (i.e., the tags) in the XML can reflect the relational schema. Show how the sample data above would be encoded in XML corresponding to your DTD. You may specify your algorithm using any kind of high-level pseudocode that you like.
(c) Suppose you have the additional information that every value appearing in attribute Prof.dCode appears exactly once in Dept.code. You don't have to modify your translation algorithm, but illustrate what might be considered a "better" XML structure for the sample data than the structure produced by your algorithm in (b).
Problem 2. Consider the same relational schema as in Problem 1:
```
  Prof(name, phone, email, dCode)
  Dept(code, name, building, numFac)
```
For this problem you may assume that (name,dCode) is a key for relation Prof, and code is a key for relation Dept.
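As a reminder of what a key means, the short Python sketch below checks that the sample data from Problem 1 is consistent with these key assumptions. The helper name `is_key` and the tuple encoding are our own choices for illustration. Note that a key is a constraint on every legal instance of a relation, so passing this check on one instance does not establish the key in general.

```python
# Sample tuples from Problem 1, encoded positionally.
# Prof(name, phone, email, dCode); Dept(code, name, building, numFac)
prof = [
    ("Widom", "123-4567", "widom@cs", "CS"),
    ("Widom", "123-4567", "widom@ee", "EE"),
    ("Ullman", "987-6543", "ullman@cs", "CS"),
]
dept = [
    ("CS", "Computer Science", "Gates", 40),
    ("EE", "Electrical Engineering", "Packard", 55),
    ("ME", "Mechanical Engineering", "MERL", 45),
]


def is_key(rows, positions):
    """True if no two rows agree on all of the given attribute positions."""
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(projected) == len(set(projected))


print(is_key(prof, [0, 3]))  # (name, dCode) on the sample data: True
print(is_key(dept, [0]))     # code on the sample data: True
print(is_key(prof, [0]))     # name alone: False, since Widom appears twice
```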
For each of the following queries, either write the query as a relational algebra expression or explain why it cannot be written in relational algebra.
(a) Find the names of all departments where the number of professors listed for that department does not match the value in attribute numFac for that department.
(b) Assuming the numFac attribute value is correct (i.e., ignoring the Prof relation altogether), find the name of the department with the smallest number of faculty. In the case of ties, return names for all of the smallest departments.