CS145 - Spring 2003
Introduction to Databases
Challenge Problems #1   --   Due 11:59pm, Tuesday April 15
Email to cs145-submit@cs.stanford.edu

Challenge Problems

  1. This problem explores how to systematically store relational data in XML. We will use as examples two relations:
      Student(ID, name, phone)  // ID is key
      Plays(ID, sport, season)  // (ID, sport) is key
    
    We will use the following sample data for these relations:

    In Student:

      (123, Susan, 123-4567)
      (456, Sam, 456-7890)
      (789, Samantha, 789-0123)
    

    In Plays:

      (123, soccer, fall)
      (123, basketball, winter)
      (789, soccer, fall)
    

    (a) Specify a single DTD that allows data from any relational schema (one or more relations) to be stored as an XML document valid with respect to that DTD. (That is, the element structure of the XML will not be derived from the relational schema, since it needs to work for every schema.) Show how the sample data above would be encoded in XML corresponding to your DTD.

    (b) Specify an algorithm for translating a relational schema (one or more relations) into a single XML DTD, where in this case the element structure (i.e., the tags) in the XML can reflect the relational schema. Show how the sample data above would be encoded in XML corresponding to your DTD. You may specify your algorithm using any kind of code or high-level pseudocode that you like.

    (c) Suppose you have the additional information that every value for attribute Plays.ID appears exactly once in Student.ID (but not necessarily vice-versa). You don't have to give a translation algorithm, but illustrate what might be considered a "better" XML structure for the sample data than the structure produced by your algorithm in (2).

  2. Consider the same schema used for the XML problem, with one additional attribute in relation Plays:
      Student(ID, name, phone)        // ID is key
      Plays(ID, sport, season, rank)  // (ID, sport) is key
    

    Write a relational algebra expression that returns the names of all students who play at least two sports in the same season and who have the highest rank in at least one of those sports. Do not assume there is a specific highest rank, such as 1. However, if it helps you may assume there are not ties in rank for a given sport. To keep your expression from getting too unwieldy you may use "S" for Student and "P" for Plays.