Assignment #4
Due Wednesday October 30

Important note: We plan to distribute a sample solution to this assignment as soon as possible in order to help students prepare for the midterm exam. Thus, we will not accept any late submissions for this assignment. On-campus assignments must be submitted by Wednesday at 5:00 PM. SITN assignments must be sent with the Thursday courier at the latest.
  1. Consider the following relational database schema, which you also used in Assignment 3:
       lives (name, street, city)
       works (name, company, salary)
       located (company, city)
       manages (name, manager-name)
    You may make the following assumptions for this problem:

    (A1)
    Attributes lives.name, works.name, and manages.name are keys for their relations. For relation located, the key is < company,city >, i.e., a company may be located in many cities.

    (A2)
    Every person in the works relation also appears in the lives relation, but not necessarily vice-versa (i.e., noone works without living, but people may live without working).

    Write SQL queries for each of the following:

    (a)
    Find the names of all people who do not work for IBM.

    (b)
    Find the names of all people who earn a salary that is higher than the salary earned by any person who works for IBM.

    (c)
    Find all companies that have locations in every city in which IBM has a location.

    (d)
    Find the highest salary of any person who works for a company that has a location in Palo Alto.

    (e)
    For each manager, find the lowest salary of any person who is managed by that manager.

    (f)
    For each city, find the number of people who live in that city and work for some company that has a location in that city.

    (g)
    For each city, find the number of people who live in that city but do not work for any company that has a location in that city.

    (h)
    Find the names of all people who earn more than the average salary of all other people working for the same company.

    (i)
    Find the company with the largest number of employees.

    (j)
    Find the company with the smallest total salary paid to its employees.

    (k)
    Find all companies that pay more, on average, than the average salary paid by IBM.

  2. Continuing with the schema from Problem #1, write data modification commands in SQL for the following. Note that some parts may require more than one command to perform the desired modification.

    (a)
    Insert a new company called ``MBI'' in Palo Alto and a new person ``Mary'' who works for MBI and earns $50,000. You may assume that Mary already exists in the lives relation.

    (b)
    Insert into a new rich_manager(name,salary) relation thename and salary of every person who is a manager and earns more than $100,000. You may assume that relation rich_manager has already been created.

    (c)
    Delete from the works relation every person who works for a company that has a location in Palo Alto.

    (d)
    Modify the database so person ``Mary'' now lives on Stanford Avenue in Palo Alto.

    (e)
    Give all people who work for IBM a 10% raise.

    (f)
    Give all managers a 20% raise.

    (g)
    Give all managers a 10% raise except when the manager's salary would be greater than $100,000 after the raise. In such cases, give only a 3% raise.

  3. Consider the following 29 SQL query forms. You may assume that Rlist1 and Rlist2 are disjoint lists of relations, and that all attribute names are unique across the entire database. Note that cond2 may include references to attributes from relations in Rlist1, i.e., ``correlated'' references to relations outside of the subquery.

    (1)plain select Alist from Rlist where cond
    (cond contains no subqueries)
    (2)in select Alist from Rlist1
    where cond1 and A1 in (select A2 from Rlist2 where cond2)
    (3)not in select Alist from Rlist1
    where cond1 and A1 not in (select A2 from Rlist2 where cond2)
    (4)exists select Alist from Rlist1
    where cond1 and exists (select * from Rlist2 where cond2)
    (5)not exists select Alist from Rlist1
    where cond1 and not exists (select * from Rlist2 where cond2)
    (6)= all select Alist from Rlist1
    where cond1 and A1 = all (select A2 from Rlist2 where cond2)
    (7)not = all select Alist from Rlist1
    where cond1 and not A1 = all (select A2 from Rlist2 where cond2)
    (8) <> all select Alist from Rlist1
    where cond1 and A1 <> all (select A2 from Rlist2 where cond2)
    (9)not <> all select Alist from Rlist1
    where cond1 and not A1 <> all (select A2 from Rlist2 where cond2)
    (10)< all select Alist from Rlist1
    where cond1 and A1 < all (select A2 from Rlist2 where cond2)
    (11)not < all select Alist from Rlist1
    where cond1 and not A1 < all (select A2 from Rlist2 where cond2)
    (12)<= all select Alist from Rlist1
    where cond1 and A1 <= all (select A2 from Rlist2 where cond2)
    (13)not <= all select Alist from Rlist1
    where cond1 and not A1 <= all (select A2 from Rlist2 where cond2)
    (14)> all select Alist from Rlist1
    where cond1 and A1 > all (select A2 from Rlist2 where cond2)
    (15)not > all select Alist from Rlist1
    where cond1 and not A1 > all (select A2 from Rlist2 where cond2)
    (16)>= all select Alist from Rlist1
    where cond1 and A1 >= all (select A2 from Rlist2 where cond2)
    (17)not >= all select Alist from Rlist1
    where cond1 and not A1 >= all (select A2 from Rlist2 where cond2)
    (18)= any select Alist from Rlist1
    where cond1 and A1 = any (select A2 from Rlist2 where cond2)
    (19)not = any select Alist from Rlist1
    where cond1 and not A1 = any (select A2 from Rlist2 where cond2)
    (20)<> any select Alist from Rlist1
    where cond1 and A1 <> any (select A2 from Rlist2 where cond2)
    (21)not <> any select Alist from Rlist1
    where cond1 and not A1 <> any (select A2 from Rlist2 where cond2)
    (22)< any select Alist from Rlist1
    where cond1 and A1 < any (select A2 from Rlist2 where cond2)
    (23)not < any select Alist from Rlist1
    where cond1 and not A1 < any (select A2 from Rlist2 where cond2)
    (24)<= any select Alist from Rlist1
    where cond1 and A1 <= any (select A2 from Rlist2 where cond2)
    (25)not <= any select Alist from Rlist1
    where cond1 and not A1 <= any (select A2 from Rlist2 where cond2)
    (26)> any select Alist from Rlist1
    where cond1 and A1 > any (select A2 from Rlist2 where cond2)
    (27)not > any select Alist from Rlist1
    where cond1 and not A1 > any (select A2 from Rlist2 where cond2)
    (28)>= any< select Alist from Rlist1
    where cond1 and A1 >= any (select A2 from Rlist2 where cond2)
    (29)not >= any select Alist from Rlist1
    where cond1 and not A1 >= any (select A2 from Rlist2 where cond2)

    Now here's the problem:

    (a)
    For each query type except (1), if you can write an equivalent query using one of the other query types, show the equivalent query. For this problem two queries are considered equivalent if they always produce the same of tuples, even if the number of appearances of each tuple may be different. That is, we ignore the issue of duplicates.

    (b)
    Give a set of query types that is sufficient to express queries equivalent to all 29 query forms. A set of query types is minimal if taking any type out of the set would produce a strict loss in expressive power, i.e., not all 29 query forms would be expressible. Include query type (1) in your minimal set. Again use the set-based equivalence defined in part (a), where issues of duplicates are not considered.

  4. Personal Database Application (PDA)

    This week you will have the pleasure of running queries over the data in your PDA.

    (a)
    Develop and test:

    1. At least five SQL data retrieval (select) commands.

    2. At least one of each of the four types of SQL data modification commands: insert a single tuple, insert a subquery, delete, update.

    There are a number of important things to keep in mind:

    • For this assignment you will be invoking SQL commands interactively through Getting Started with Sybase. Of course you should certainly use a script (file), rather than typing the queries in each time you run them. In future assignments you will be embedding SQL commands in a C/C++ application program.

    • Please write ``interesting'' queries. You should try to use most or all of the SQL constructs discussed in class and in the course notes (subqueries, aggregates, etc.). You will not receive full credit if your queries and modifications are extremely simple.

    • It is strongly suggested that you experiment with your SQL commands on a small database (e.g., your hand-created database), before running them on the large database for which you generated data. Initial debugging is much easier when you're operating on small amounts of data. Once you're confident that your commands are working, run them on your complete database.

    • If you discover that most or all of your ``interesting'' queries return an empty answer on your large database, then you probably didn't follow the instructions in Assignment #3 for generating data values that join properly. You will need to modify your data generator accordingly.

    • Turn in a copy of all of your SQL commands, along with a script illustrating their execution. Your script should be sufficient to convince us that your commands run successfully. Please do not, however, turn in query results that are thousands (or hundreds of thousands) of lines long!

    (b)
    In part (a) you probably discovered that some queries run very slowly over your large database. As discussed in class, one principal technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A. An index can be created on any attribute of any relation. The syntax for creating an index in Sybase is:

    create index index-name on relation-name (attribute-name)

    If you know that the attribute is a key, you can speed things up even more by declaring the index as follows:

    create unique index index-name on relation-name (attribute-name)

    You can get rid of an index using the command:

    drop index index-name K

    Indexes can be created on lists of attributes as well as on single attributes. Create at least two useful indexes for your PDA. Run your queries from part (a) on your large database with the indexes and without the indexes. Note that each time you run an SQL command in Sybase, afterwards you get statistics about how long it took to execute the command. Naturally these times may be affected by external factors such as system load, etc. Still, you should see a dramatic difference between the execution times with indexes and the times without. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes.


The TAs of CS145, cs145ta@cs.stanford.edu, last modified: 10/23/96