|
|
Advised(Advisor, Student, Year) // Student is a keyA tuple (A,S,Y) in Advised specifies that advisor A advised PhD student S who graduated in year Y (and then may have become a professor and advised his or her own students).
In one statement of SQL-99 using the recursive WITH construct, write a query that finds all "descendants" of Widom, i.e., all students whose advisor was Widom, or whose advisor's advisor was Widom, or whose advisor's advisor's advisor was Widom, and so on. For each descendant, return the Student attribute along with the graduation Year attribute.
Flight(orig, dest, airline, cost) // all four attributes are the only key; cost > 0containing nonstop flights from one city (orig) to another (dest). We were interested in finding the cheapest cost of flying from city A to city B, regardless of the number of stops. We observed that the query could not be written in SQL2.
(a) Write the query in one statement of SQL-99, using the recursive WITH construct. You may assume that A and B are constants that you may use anywhere in the query.
(b) Does your query in part (a) take full advantage in the WITH clause of knowing cities A and B, i.e., does it use one or both of them to "limit" the recursion to the relevant data? If not, modify your query to take advantage of A and/or B to the extent possible.
(c) Now write a single recursive SQL-99 statement that computes the cheapest cost of flying between all pairs of cities, regardless of the number of stops. The result of the query should have schema (orig,dest,cost) where [orig,dest] together form a key. (The exact attribute names are not important.)
(d) Can your query for part (c) return tuples where orig and dest are the same city? If so, what is the cheapest cost of traveling from a city to itself? Modify your solution to part (c) so that no such tuples appear in the query result.
CREATE TYPE AddressType AS (street char(50), city char(50), zip integer) CREATE TYPE ScoresType AS (GPA float, SAT integer) METHOD composite() RETURNS float CREATE TYPE StudentType AS (ID integer, name char(30), address AddressType, scores ScoresType) CREATE TABLE Student OF TYPE StudentType (PRIMARY KEY (ID))
(a) Write a query that returns the zipcodes (no duplicates) of all students with composite scores greater than 100.
(b) Write a query that returns all pairs of students by their ID's such that the students come from the same city and have identical scores. Do not return students with themselves, and do not return pairs and their inverses.
(c) Add to the schema above the statement:
CREATE ORDERING FOR ScoresType EQUALS ONLY BY STATEDoes this change your solution to part (b)?
CREATE TABLE Dating (boy STRING, girl STRING, time TIMESTAMP)
Recall that TIMESTAMP values are sets of disjoint intervals, and that NOW may be used as an interval endpoint. You may assume that names are unique (i.e., the same name indicates the same person), and that (boy,girl) is a key for this relation. Write the following queries using the temporal relational algebra introduced in class. You may call built-in functions and test conditions on time values as illustrated in class.
(a) Find all boy-girl pairs who are dating each other now.
(b) Find all boy-girl pairs who are dating each other now and have dated each other for a total of at least 6 months.
(c) Find all boys who have dated more than one girl at the same time.
(d) Find all girls who have been dating someone (not necessarily the same person) for a total of at least 2 years.
Scores(studentID, quarter, courseID, exam#, score)
Consider the following two queries over the Scores relation. (Average scores obviously would make more sense than sums of scores, but we're intentionally keeping the problem simple.)
// Total score grouped by quarter and course: SELECT quarter, courseID, SUM(score) FROM Scores GROUP BY quarter, courseID // Total score grouped by course and exam# SELECT courseID, exam#, SUM(score) FROM Scores GROUP BY courseID, exam#
Specify a view V over the Scores relation. You should choose V so that if V is stored as a materialized view, then V can be used to substantially speed up the execution of both of the above queries, assuming that the Scores relation is very large. In addition to specifying V, show how each of the two queries above can be rewritten into an equivalent query that uses V instead of Scores.
We want to learn about correlations between courses taken. Specifically, we want to mine from this relation all single-attribute association rules with certain support and confidence thresholds (as defined in class). Specifically, write a SQL query that produces all pairs of distinct courseID's [C1,C2] such that C1->C2 is an association rule with support > 0.05 and confidence > 0.6. If you find it useful to define a sequence of one or more views used by the final query you may do so.
|
(a) Specify a schema (including any keys) for a "normal" (nontemporal) relation R that can encode the same data as in relation T. Explain how a set of tuples in T would be translated to a set of tuples in R.
(b) Can you translate a temporal relational algebra projection over temporal relation T into a SQL query over nontemporal relation R? That is, suppose you wish to execute PROJECT_Ai(T) (with its temporal interpretation) by translating it into a SQL query over the corresponding data in relation R. Either show the translation or explain why it can't be done.
(c) Same as part (b), except consider the temporal join of two temporal relations T1(A,B,ts) and T2(B,C,ts). Assume that the temporal relations are represented as two nontemporal relations, R1 and R2, using the same encoding you gave in part (a). You wish to execute T1 NATURAL-JOIN T2 (with its temporal interpretation) by translating it into a SQL query over R1 and R2. Either show the translation or explain why it can't be done.
(a) Suppose F has five dimension attributes with two values each, and all combinations of dimension values appear in F. Let F have one dependent attribute that is summed in the CUBE tuples representing aggregation. How many tuples are there in F? How many in CUBE(F)?
(b) Now suppose F has N dimension attributes but the values are very sparse, i.e., not nearly all combinations of dimension values are represented in F (but assume F is not empty). Consider the ratio of the size of CUBE(F) to the size of F. How large can that ratio be?
(a) Write the new SQL query. If you find it useful to define a sequence of one or more views used by the final query you may do so.
(b) Are there any principles you can use to "prune" the search, i.e., so you don't need to consider all possible rules with one or two distinct attributes on the left-hand side and one on the right-hand side? Don't worry about incorporating the pruning into your SQL code, just describe it in English.