CS145 Lecture Notes (2) -- Data: Relations and XML



Terminology reminder

Relational Model versus XML: Fundamental Differences

  1. Relations: Schema must be fixed in advance
    XML: Does not require predefined, fixed schema

  2. Relations: Rigid flat table structure
    XML: Flexible hierarchical structure (also graphs)

  3. Relations: Easy to understand, simple query languages
    XML: Can be harder to understand, more complex query languages












Example Applications

  1. University records: students, courses, grades, etc.
    Relations or XML? Why?
    
    
    
    
  2. University Web site: news, academics, admissions, events, research, etc.
    Relations or XML? Why?
    
    
    
    
  3. Family tree.
    Relations or XML? Why?
    
    
    
    
    
Application domains we use for example schemas and instances:

The Relational Model

Database = set of relations (or tables), each with a distinct name

(Example: Student, Campus)








Each relation has a set of attributes (or columns), with a distinct name within its relation.

(Example: label Student, Campus)







Each tuple (or row) in a relation has a value for each attribute.

(Example: some tuples for Student, Campus relations)







Each attribute has a type (or domain).
  ID:char(9), name:char(25), GPA:float, age:integer
We'll use atomic (indivisible) types only for now.


Schema = complete description of structure of relations in database: relation names, attribute names, types, etc.
Instance = actual contents (tuples) of relations

Specifying the schema (on paper):

Naming conflicts:
    Student(ID, name, GPA, age)
    Campus(name, enrollment)      <- duplicate use of "name" okay
    Apply(ID, Campus)             <- "ID" okay, "Campus" not okay
Null values: (Unknown age example)









Keys: A key for a relation is a set of attributes such that no two tuples can have the same values for all of their key attributes. (Example: keys for Student, Campus)

Relations in SQL



XML

Like HTML, basic XML consists of three things:
  1. Tagged elements, which may be nested within one another
  2. Attributes on elements
  3. Text

In HTML, tags denote formatting: <Title>, <I>, <Table>, etc.

In XML, tags denote meaning of data: <Student>, <Book_Title>, etc.

XML data can be formatted using CSS (Cascading Style Sheets) or XSL (Extensible Stylesheet Language) to translate XML to HTML.









Well-formed XML

A well-formed XML document is any XML document that follows the basic rules: single root element, matched tags, unique attribute names, etc.







Valid XML

It's possible to constrain the structure of XML data, using either a Document Type Descriptor (DTD), or an XML Schema Definition (XSD).















[[On-Line XML Data and Validation]]



Question: What are the benefits of using a DTD/XSD?















Question: What are the benefits of not using a DTD/XSD?












Notes on (#P)CDATA in DTDs

The details are messy (recall comment about official spec), but overall:

XML Coverage



Relational Model versus XML

(First three are same as earlier)

  1. Relations: Schema must be fixed in advance
    XML: Does not require predefined, fixed schema

  2. Relations: Rigid flat table structure
    XML: Flexible hierarchical structure; graphs

  3. Relations: Easy to understand, simple query languages
    XML: Can be harder to understand, more complex query languages

  4. Relations: Ordering of data not relevant (tuple ordering or attribute ordering)
    XML: Ordering forced by document or stream format, may or may not be relevant

  5. Relations: Transmission and sharing can be problematic
    XML: Designed for easy representation and exchange

  6. Relations: "Native" data model for all current serious commercial DBMSs
    XML: "Add-on," often implemented on top of relations