CS145 Lecture Notes (2) -- Data: Relations and XML



Terminology reminder

Relational Model versus XML: Fundamental Differences

  1. Relations: Schema must be fixed in advance
    XML: Does not require predefined, fixed schema

  2. Relations: Rigid flat table structure
    XML: Flexible hierarchical structure (also arbitrary graphs)

  3. Relations: Easy to understand, simple query language
    XML: Can be harder to understand, more complex query language (if any)












Example Applications

  1. University records: students, courses, grades, etc.
    Relations or XML? Why?
    
    
    
    
  2. University Web site: news, academics, admissions, events, research, etc.
    Relations or XML? Why?
    
    
    
    
  3. Family tree.
    Relations or XML? Why?
    
    
    
    
    
Application domain we use for example schemas and instances:

The Relational Model

Database = set of relations (or tables), each with a distinct name

(Example: Student, Campus)










Each relation has a set of attributes (or columns), with a distinct name within its relation.

(Example: label Student, Campus)








Each tuple (or row) in a relation has a value for each attribute.

(Example: some tuples for Student, Campus relations)








Each attribute has a type (or domain).
  ID:char(9), name:char(25), GPA:float, age:integer
We'll use atomic (indivisible) types only for now.


schema = complete description of structure of relations in database: relation names, attribute names, types, etc.
instance = actual contents (tuples) of relations

Specifying the schema (on paper):

Naming conflicts:
    Student(ID, name, GPA, age)
    Campus(name, enrollment)      <- duplicate use of "name" okay
    Apply(ID, Campus)             <- "ID" okay, "Campus" not okay
Null values: (Unknown age example)










Keys: A key for a relation is a set of attributes such that no two tuples can have the same values for all of their key attributes. (Example: keys for Student, Campus)





Relations in SQL



XML

If you're interested in reading about the XML standard in detail, visit:
   http://www.w3.org/TR/2006/REC-xml-20060816/
It will keep you busy for days. Less detailed references are linked from the course schedule and are required reading.


XML = Extensible Markup Language Like SGML and HTML, basic XML consists of three things:
  1. Tagged elements, which may be nested within one another
  2. Attributes on elements
  3. Text

In HTML, tags denote formatting: <Title>, <I>, <Table>, etc.

In XML, tags denote meaning of data: <Student>, <Book_Title>, etc.

XML data can be formatted using CSS (Cascading Style Sheets) or XSL (Extensible Stylesheet Language) to translate XML to HTML.








Well-formed XML

A well-formed XML document is any XML document that follows the basic rules: single root element, matched tags, unique attribute names, etc.









Example: bookstore data
   <?xml version="1.0" standalone="yes"?>
   <Bookstore>
      <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
         <Title>A First Course in Database Systems</Title>
         <Authors>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
      </Book>
      <Book ISBN="ISBN-0-13-031995-3" Price="$75">
         <Title>Database Systems: The Complete Book</Title>
         <Authors>
            <Author>
               <First_Name>Hector</First_Name>
               <Last_Name>Garcia-Molina</Last_Name>
            </Author>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
         <Remark>
         Amazon.com says: Buy this book bundled with "A First Course,"
         it's a great deal!
         </Remark>
      </Book>
   </Bookstore>
A well-formed XML document can contain regular data (as above) or very irregular data.

Valid XML

It's possible to define a kind of schema for XML data, called a Document Type Descriptor (DTD).

A DTD is a grammar that describes the legal attributes of tagged elements and the legal ordering and nesting of the elements.










Example DTD

   <!ELEMENT Bookstore (Book | Magazine)*>
   <!ELEMENT Book (Title, Authors, Remark?)>
   <!ATTLIST Book ISBN CDATA #REQUIRED
                Price CDATA #REQUIRED
                Edition CDATA #IMPLIED>
   <!ELEMENT Magazine (Title)>
   <!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED> 
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Authors (Author+)>
   <!ELEMENT Remark (#PCDATA)>
   <!ELEMENT Author (First_Name, Last_Name)>
   <!ELEMENT First_Name (#PCDATA)>
   <!ELEMENT Last_Name (#PCDATA)>
The DTD is specified at the top of the document or in a separate file referenced at the top of the document. In both cases use standalone="no".

Question: What are the benefits of using a DTD?















Question: What are the benefits of not using a DTD?










ID and IDREF(S) Attributes

Element pointers: assign a special ID attribute to an element, then point to that element with a special IDREF or IDREFS attribute in another element.

Example: reorganized bookstore

   <?xml version="1.0" standalone="no"?>
   <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
   <Bookstore>
      <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd" Authors="JU JW">
         <Title>A First Course in Database Systems</Title>
      </Book>
      <Book ISBN="ISBN-0-13-031995-3" Price="$75" Authors="HG JU JW">
         <Title>Database Systems: The Complete Book</Title>
         <Remark>
            Amazon.com says: Buy this book bundled with
            <BookRef book="ISBN-0-13-035300-0" />,
            It's a great deal!
         </Remark>
      </Book>
      <Author Ident="HG">
         <First_Name>Hector</First_Name>
         <Last_Name>Garcia-Molina</Last_Name>
      </Author>
      <Author Ident="JU">
         <First_Name>Jeffrey</First_Name>
         <Last_Name>Ullman</Last_Name>
      </Author>
      <Author Ident="JW">
         <First_Name>Jennifer</First_Name>
         <Last_Name>Widom</Last_Name>
      </Author>
   </Bookstore>
DTD for this data:
   <!ELEMENT Bookstore (Book*, Author*)>
   <!ELEMENT Book (Title, Remark?)>
   <!ATTLIST Book ISBN ID #REQUIRED
                Price CDATA #REQUIRED
                Edition CDATA #IMPLIED
                Authors IDREFS #REQUIRED>
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Remark (#PCDATA | BookRef)*>
   <!ELEMENT BookRef EMPTY>
   <!ATTLIST BookRef book IDREF #REQUIRED>
   <!ELEMENT Author (First_Name, Last_Name)>
   <!ATTLIST Author Ident ID #REQUIRED>
   <!ELEMENT First_Name (#PCDATA)>
   <!ELEMENT Last_Name (#PCDATA)>

Digression on (#P)CDATA

The details are messy (recall comments about clunky XML and official specs above), but overall:

XML Coverage


Relational Model versus XML

(First three are same as earlier)

  1. Relations: Schema must be fixed in advance
    XML: Does not require predefined, fixed schema

  2. Relations: Rigid flat table structure
    XML: Flexible hierarchical structure; graphs

  3. Relations: Easy to understand, simple query language
    XML: Can be harder to understand, more complex query language

  4. Relations: Ordering of data not relevant (tuple ordering or attribute ordering)
    XML: Ordering forced by document format, may or may not be relevant

  5. Relations: Transmission and sharing can be problematic
    XML: Designed for easy representation and exchange

  6. Relations: "Native" data model for all current serious commercial DBMSs
    XML: "Add-on," often implemented on top of relations