CS145 - Introduction to Databases
Spring 2001, Prof. Widom
Assignment #5
Due Wednesday May 16
- There are two important documents relevant to the PDA part of this
assignment:
For the JDBC document, an updated version from the one in the course
reader and on the book Web site is now available via the course home
page (or following the link above) - please make sure to use the newer
version. The Pro*C document is current in your course reader and from
the book Web site, linked from the course home page under Oracle
Information.
- Turning in written exercises: The procedure for turning
in the written exercises on this assignment is the same as for the
previous assignments, with the same deadline and late policy.
- Turning in PDA: Your PDA work will again be submitted
electronically. The procedure is the same as for the previous
electronic submissions (see http://www.stanford.edu/class/cs145/submit-README.txt),
and the same deadline and late policy applies. Reminder: please do
not submit any files containing large data sets or query
results. All results should be truncated to a few lines. Large
submissions will be rejected, either automatically or by a (not too happy)
grader.
THE ASSIGNMENT
Problem 1
Consider the following four relations:
EmpDept(empID,deptNo) // empID is key
EmpOffice(empID,office) // empID is key
EmpPhone(empID,phone) // empID is key
DeptMgr(deptNo,mgrID) // deptNo is key
and the following view that joins all four relations:
create view AllInfo as
(select ED.empID, ED.deptNo, EO.office, EP.phone, DM.mgrID
from EmpDept as ED, EmpOffice as EO, EmpPhone as EP, DeptMgr as DM
where ED.empID = EO.empID
and EO.empID = EP.empID
and ED.deptNo = DM.deptNo)
(a) Write a SQL query to find the offices of all employees
whose department manager has ID 123. Do not use the view, and
eliminate duplicate offices in your result.
(b) Now write the same query using the view and not the base
relations. Again, eliminate duplicate offices in your result.
(c) Are the queries in parts (a) and (b) equivalent? If so,
briefly explain why. If not, give a simple counterexample consisting
of four relation instances, the view instance, and the two different
query results.
Problem 2
Suppose you are developing a database management system. So far, the
system has standard SQL authorization for tables and queries (as
covered in class and in the textbook), and it supports SQL virtual
views (as covered in class and in the textbook). Your job is to
implement authorization checking for operations related to views,
while introducing as little additional machinery as possible.
Specifically:
(a) What privileges should be required for a user
U1 to create a view? That is, using existing
authorization features, what can the system check before executing
the following statement issued by user U1?
CREATE VIEW V(A1, A2, ..., An) AS [SQL query]
(b) What privileges should be required for a user
U2 to refer to a view in a query? That is, using
existing authorization features, what can the system check before
executing the following statement issued by user U2?
SELECT [Select List] FROM ..., V, ... WHERE [Condition]
Note: This problem is more open-ended than most of your written
exercises, but it still has concise, correct solutions.
Problem 3
Consider a set of users U, V, W,
X, and Y. Suppose that user U creates a
relation R(A) and is thus the owner of relation R.
Now suppose the following set of statements is executed in order:
stmt | user | operation |
---|
1 | U | grant select on R to V,W with grant option |
2 | V | grant select on R to W |
3 | W | grant select on R to X,Y |
4 | U | grant select on R to Y |
5 | U | revoke select on R from V restrict |
6 | U | revoke select on R from W cascade
|
Show the grant diagram after steps 4, 5, and 6. Since all of the
nodes in the diagrams will be for privilege "select on R,"
you may omit writing it in each node.
Problem 4
There's a type of database security that was not covered in class or
in the textbook called "statistical authorization." With statistical
authorization, some users may be permitted to retrieve only aggregate
information from the database, e.g., average salaries but not
individual salaries. Furthermore, to prevent users from applying
aggregates to very small numbers of tuples (such as the average of one
salary!), the system requires that a certain minimum number of tuples
contribute to each aggregate result. Finally, to prevent the user
from using intersection properties to deduce a single value (e.g., the
user could ask for X=sum(A1,A2,A3,A4,A5), then ask for
Y=sum(A2,A3,A4,A5), then compute X-Y to deduce the
value of A1), the system may require that the tuples
participating in multiple queries issued by the same user have a small
intersection. In this problem you will explore how, even with such
security measures, specific information can still be deduced from the
database.
Here's the problem. Consider the simple relation
Student(ID,GPA), where ID is a key. Suppose that,
for security reasons, the following restrictions are made on user
U's set of queries against this relation:
- The result of every query must be a single aggregate
value - a SQL aggregate function applied to one of the attributes of
the relation.
- At least 4 different tuples must be used in the aggregate to
produce each query's result.
- For any two queries issued by user U, the sets of tuples
used to produce the two query results must have an intersection no
larger than 2.
Assume there are 50 students with ID's in the range 1 to 50,
and that attribute GPA is of type float. Give a set of
queries that satisfies the above restrictions, and from whose answers
you can determine the GPA of the student with ID=1. Write
the queries in SQL, then show the computation that produces the GPA
for the student with ID=1 from the query results. Use as few
queries as you can.
Problem 5 (Personal Database Application, Part 5)
This week you will use either Pro*C embedded SQL or JDBC dynamic SQL
to interact with your PDA database from an external program. Your
task is to build a moderately user-friendly interactive application
program front end to your PDA using the C, C++, or Java programming
language. Your program should consist of a continuous loop in which:
- A list of at least five alternative options is offered to
the user. (An additional alternative should be quit.)
- The user selects an alternative.
- The system prompts the user for appropriate input values.
- The system accesses the database to perform the appropriate
queries and/or modifications.
- Data or an appropriate acknowledgment is returned to the
user.
You should include both queries and modifications among your
options. As in your previous assignment, please include some
"interesting" queries or modifications, i.e., operations that require
some of the more complex SQL constructs such as subqueries,
aggregates, set operators, etc.
As a general example, if your PDA is a UC campus applicant database
like the one discussed in class, then your interface might include in
its menu:
- A number of useful queries on the database, with both input
and output in a format more convenient and pleasing than raw
interactive SQL. Some queries perform statistical analysis requiring
multiple levels of grouping, other queries are simpler.
- Insert a new student record.
- Insert a new application record.
- Update a student's address, GPA, or SAT.
- Update an application's decision field based on user input.
- Update application decision fields automatically, admitting
students based on a combination of numeric values and quotas for each
campus and/or major.
- Quit.
Your application code should interact with the database using Pro*C
embedded SQL for C or C++ programs, or using the JDBC call-level
interface for Java programs. Please refer to the appropriate
document:Introduction
to Pro*C or Introduction
to JDBC. We are not expecting anything particularly fancy in
terms of the interface itself. (In fact, ultimately this interface
will be replaced by the Web interface you will create in the final
part of the project.) For example, in C a menu printed via
printf is fine. Also, handling of SQL errors can be quite
simple. You can write a routine that just prints the error message
from Oracle, or model your error handler after one of our sample
programs.
Submission
Please turn in your C, C++, or Java code along with a script showing
an interaction with your program. Each one of your options should be
exercised at least once in your script. The script may show your
program running over your small or your large database. However:
- If you use your small database, be sure to test your program on
your large database as well to make sure it is correct and reasonably
efficient.
- If you use your large database, please do not submit listings
showing hundreds of lines of query results.
As always, you should include comments for any program code, database
queries, or other operations that are not crystal clear, and it is an
Honor Code violation to edit scripts before turning them in (other
than simple formatting, comments, or truncation).