CS145 - Spring 2003
Introduction to Databases
Challenge Problems #5   --   Due 11:59pm, Tuesday June 3rd
Email to cs145-submit@cs.stanford.edu

Challenge Problems

  1. Statistical authorization

    There's a type of database security that was not covered in class or in the textbook called "statistical authorization". With statistical authorization, some users may be permitted to retrieve only aggregate information from the database, e.g., average salaries but not individual salaries. Furthermore, to prevent users from applying aggregates to very small numbers of tuples (such as the average of one salary!), the system requires that a certain minimum number of tuples contribute to each aggregate result. Finally, to prevent the user from using intersection properties to deduce a single value (e.g., the user could ask for X=sum(A1,A2,A3,A4,A5), then ask for Y=sum(A2,A3,A4,A5), then compute X-Y to deduce the value of A1), the system may require that the tuples participating in multiple queries issued by the same user have a small intersection. In this problem you will explore how, even with such security measures, specific information can still be deduced from the database.

    Here's the problem. Consider the simple relation Student(ID,GPA), where ID is a key. Suppose that, for security reasons, the following restrictions are made on user U's set of queries against this relation:

    Give a set of queries that satisfies the above restrictions, and from whose answers you can determine the GPA of the student with a given ID I. You may assume that I is in the range 1-10, there are 50 students with IDs in the range 1-50, and attribute GPA is of type float. Write the queries in SQL, then show the computation that produces the GPA for the student with ID=I from the query results. Use as few queries as you can.


  2. Object-relational to pure relational

    The goal of this problem is to specify a mapping from object-relational SQL to regular SQL.

    (a) Consider object-relational schemas that allow REFs, and consider the corresponding queries with dereferencing. Suggest a method for translating schemas with REFs in them into schemas without REFs. (Yes, you may make assumptions about keys, and yes, this part is very easy.) Then suggest a method for translating queries that use dereferencing over the original schema into queries without dereferencing over the new schema.

    (b) Consider object-relational schemas with nested structures, and consider the corresponding queries with extra dots. Suggest a method for translating schemas with nested structures in them into schemas without nested structures. (If it helps you may make assumptions about keys.) Then suggest a method for translating queries that use nested dots over the original schema into queries without nested dots over the new schema. Note that in order to keep the issues separate, you may assume for this part that schemas and queries do not include references/dereferencing.

    (c) Are there any subtleties involved in combining your solutions for (a) and (b) into a complete translation scheme?


  3. Temporal relational algebra

    Consider a relation Visit(candidate,state,dates), which records visits made by candidates to different states. Visit is a temporal relation, with timestamp attribute dates specifying all intervals of time during which the candidate was visiting the state. You may assume the granularity of time is one day. By definition of temporal relations, (candidate,state) together form a key for the relation. Write an expression in temporal relational algebra that finds all candidates who visited two or more different states on the same day within the last 30 days.


  4. Association rules in SQL

    Consider a relation Likes(student,song). This relation contains a variant of the "market basket" data discussed in class - you can think of a student as representing a "transaction" (in the shopping checkout sense, not the database sense), and think of the songs liked by that student as the "items purchased in the transaction." Attributes [student,song] together form a key for this relation.

    We want to learn about correlations between songs liked by students. Specifically, we want to mine from this relation all association rules with certain support and confidence thresholds (as defined in class). To keep things simple, you only need to find rules with exactly two items (songs) on the left-hand side.

    Specifically, write a SQL query that produces all triples of distinct songs [S1,S2,S3] such that {S1,S2} -> S3 is an association rule with support > 0.02 and confidence > 0.6. If you find it useful to define a sequence of one or more views used by the final query you may do so.