CS145 Lecture Notes (5) -- XML Queries: XPath and XQuery

Query languages for XML are very new compared to SQL. XPath and XQuery were developed over the last few years and are still in flux to some extent.
XPath and XQuery are not based on a simple and clear underlying algebra, the way SQL is based on relational algebra.
Sequence of development (roughly):
1. XPath - path expressions and conditions
2. XSLT - XPath plus transformations and output formatting
3. XQuery - XPath + full query language
XPath is also used in XLink and XPointer.

=> In this class we will learn about XPath and XQuery, covering the most important features but ignoring some esoteric ones.

XML DTD and sample data for examples

   <!ELEMENT Bookstore (Book | Magazine)*>
   <!ELEMENT Book (Title, Authors, Remark?)>
   <!ATTLIST Book ISBN CDATA #REQUIRED
             Price CDATA #REQUIRED
             Edition CDATA #IMPLIED>
   <!ELEMENT Magazine (Title)>
   <!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED> 
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Authors (Author+)>
   <!ELEMENT Remark (#PCDATA)>
   <!ELEMENT Author (First_Name, Last_Name)>
   <!ELEMENT First_Name (#PCDATA)>
   <!ELEMENT Last_Name (#PCDATA)>

   <?xml version="1.0" standalone="no"?>
   <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
   <Bookstore>
      <Book ISBN="ISBN-0-13-035300-0" Price="65" Edition="2nd">
         <Title>A First Course in Database Systems</Title>
         <Authors>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
      </Book>
      <Book ISBN="ISBN-0-13-031995-3" Price="75">
         <Title>Database Systems: The Complete Book</Title>
         <Authors>
            <Author>
               <First_Name>Hector</First_Name>
               <Last_Name>Garcia-Molina</Last_Name>
            </Author>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
         <Remark>
         Amazon.com says: Buy this book bundled with "A First Course,"
         it's a great deal!
         </Remark>
      </Book>
   </Bookstore>

XPath

Think of XML as a tree (or directory) structure.

XPath specifies path expressions that match XML data by navigating down (and occasionally up or across) the tree and possibly evaluating conditions over data in the tree.

Some basic constructs (very incomplete list):

/ root element, or separator between steps in path
X matches element X
* matches any element
@X matches attribute X of the current element ("context node")
// matches all descendants of the current element, including self
[C] evaluates condition C on the current element
[N] picks the Nth matching element
Path1 | Path2 union of Path1 and Path2 results
contains(s1,s2) returns TRUE if string s1 contains string s2
name() returns tag of the current element
parent:: matches the parent of the current element, if there is one
following-sibling:: matches all later siblings of the current element
descendant:: matches all descendants of the current element
self:: matches the current element

Important:

We cover a subset of the language through a series of examples.
The covered subset is sufficient to write lots of queries and understand the flavor of the language.
See the required XPath tutorial reading (and optional readings) for many more details.
Note: The XPath tutorial comes with a nice "XLab" feature for trying out queries on simple data.

Result of XPath Queries

Technically, XPath queries return a set of zero or more "matched nodes" in the XML data being queried.
Often there is an obvious way to express the result in XML itself, but not always.

XPath Examples

(Example: all book titles)
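
A minimal XPath sketch for this example, written against the sample data above (the query shown in lecture may differ):

```xquery
(: every Title child of every Book child of the root Bookstore element :)
/Bookstore/Book/Title
```

On the sample data this matches the two Title elements.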

(Example: all book or magazine titles)
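
One possible XPath query, a sketch using the union operator | from the constructs list:

```xquery
(: union of book titles and magazine titles :)
/Bookstore/Book/Title | /Bookstore/Magazine/Title
```

Since the DTD allows only Book and Magazine children under Bookstore, /Bookstore/*/Title should return the same nodes here.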

(Example: all ISBN numbers)
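
A sketch using the @ attribute step. The result is a set of attribute nodes rather than elements, which is one case where there is no obvious way to present the result as XML:

```xquery
(: the ISBN attribute of every Book :)
/Bookstore/Book/@ISBN
```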

(Example: all books costing < 70)
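
A sketch with a condition in brackets. In XPath 1.0, comparing an attribute with the number 70 converts the attribute's string value to a number before comparing:

```xquery
(: Book elements whose Price attribute is less than 70 :)
/Bookstore/Book[@Price < 70]
```

On the sample data only "A First Course in Database Systems" (Price 65) qualifies.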

(Example: all ISBN numbers of books costing < 70)
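
A sketch: the same price condition, followed by one more step down to the attribute:

```xquery
(: filter books by price, then step to their ISBN attributes :)
/Bookstore/Book[@Price < 70]/@ISBN
```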

(Example: all books containing a remark)
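
A condition can simply name a child element; it is true when at least one such child exists. A sketch:

```xquery
(: books that have a Remark child :)
/Bookstore/Book[Remark]
```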

(Example: all titles of books costing < 70 where "Ullman" is an author)
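
A sketch with two stacked conditions. The comparison Last_Name = "Ullman" is existential: it is true if any of the book's Last_Name elements equals "Ullman", so no explicit iteration over authors is needed.

```xquery
(: price filter, then author filter, then step to the title :)
/Bookstore/Book[@Price < 70][Authors/Author/Last_Name = "Ullman"]/Title
```

On the sample data this returns the title "A First Course in Database Systems".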

(Example: same query using //)
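
A sketch of the // version. Note the leading . in .//Last_Name: it searches below the current Book, whereas //Last_Name inside the condition would search the whole document, making the author test true for every book as soon as any Ullman appears anywhere.

```xquery
(: // finds Book elements at any depth; .// keeps the Last_Name search inside the current Book :)
//Book[@Price < 70][.//Last_Name = "Ullman"]/Title
```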

(Example: all second authors anywhere)
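
A sketch using a positional condition. [2] applies to the Author step, so this picks the second Author child of each Authors element; by contrast, (//Author)[2] would pick only the second Author in the whole document.

```xquery
(: the second Author within each Authors list :)
//Authors/Author[2]
```

On the sample data this matches Jennifer Widom (first book) and Jeffrey Ullman (second book).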

(Example: all author last names anywhere)
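
A sketch; a leading // matches at any depth, so the path does not need to spell out Bookstore/Book/Authors/Author:

```xquery
(: every Last_Name element, at any depth :)
//Last_Name
```

The result contains five Last_Name elements, with Ullman and Widom each appearing twice, because XPath returns matched nodes rather than distinct values.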

(Example: all books whose title contains one of its author's last names)
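
A sketch that tests each author separately. The tempting //Book[contains(Title, Authors/Author/Last_Name)] is not right: in XPath 1.0, contains() converts a node-set argument to the string value of its first node only, so only the first author would be checked (XPath 2.0 and XQuery report a type error instead when a book has several authors). Putting the condition on Author avoids this:

```xquery
(: keep a Book if some Author's Last_Name occurs in that Book's Title;
   inside the inner condition the context is an Author, so ../.. is the Book :)
/Bookstore/Book[Authors/Author[contains(../../Title, Last_Name)]]
```

No book in the sample data qualifies, so the result there is empty.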

(Example: all magazines where there is a book of the same title)
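
A sketch relying on existential = between node-sets: the condition holds if the magazine's Title equals the Title of at least one Book.

```xquery
(: magazines whose title matches some book title :)
/Bookstore/Magazine[Title = /Bookstore/Book/Title]
```

The sample data contains no Magazine elements, so this returns nothing on it.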

(Example: all elements whose parent tag is not "Book")
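
A sketch using the parent:: axis and name(). name() also accepts a node argument and returns that node's tag; when the argument is empty it returns the empty string, so the root Bookstore element (which has no element parent) is included in the result.

```xquery
(: all elements whose parent element, if any, is not named Book :)
//*[name(parent::*) != "Book"]
```

//*[not(parent::Book)] expresses the same condition without a string comparison.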

(Example: all books where there is a different book of the same title)
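
A sketch using sibling axes. Comparing a book's Title against //Book/Title would not work, because every book trivially matches its own title; looking only at the other Book siblings avoids that. (preceding-sibling:: is the mirror image of the following-sibling:: axis listed above.)

```xquery
(: a Book qualifies if an earlier or a later sibling Book has the same Title :)
/Bookstore/Book[Title = following-sibling::Book/Title
                or Title = preceding-sibling::Book/Title]
```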

For the next example, modify the DTD so that Book contains Remark* instead of Remark?

(Example: all books where all Remarks include "great")
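
XPath 1.0 has no "for all" operator, so a common sketch uses double negation: keep a book if it has no Remark that fails to contain "great".

```xquery
(: no Remark child for which contains(., "great") is false :)
/Bookstore/Book[not(Remark[not(contains(., "great"))])]
```

A book with no Remarks satisfies the condition vacuously, so on the sample data both books are returned.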

XQuery

XQuery is an expression language, so its constructs can be freely mixed and matched.

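A primary construct is the FLWOR ("flower") expression. Clause keywords are capitalized in these notes, but actual XQuery syntax is case-sensitive and writes them in lowercase (for, let, where, order by, return):

  FOR       Specify documents, set up iterator variables
  LET       Set up other variables, usually for aggregation or common subqueries
  WHERE     Filtering condition
  ORDER BY  Sort results
  RETURN    What to return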

XPath expressions are another important construct, and they can be used within all clauses of FLWOR expressions.

Important:

As with XPath, we cover a subset of the language through a series of examples.
The covered subset is sufficient to write lots of queries, but it is a smaller subset than for XPath. Full XQuery is a large and complex language.
See the required XQuery tutorial reading (and optional readings) for many more details.
There is no lab feature in the tutorial for trying out queries, but you will use the Saxon XQuery interpreter in your assignment.

Queries and Results as XML

Suppose we want:

queries to be expressed as well-formed XML
queries to return results as well-formed XML

To do so, all queries are wrapped in:

<Result> { ... the query here ... } </Result>

XQuery Examples

(Example: all titles of books costing < 70 where "Ullman" is an author)
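
A sketch in FLWOR form, wrapped in <Result> as described above. The file name bookstore.xml is an assumption (the notes name only the DTD); doc() is the standard XQuery function for loading a document.

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ for $b in doc("bookstore.xml")/Bookstore/Book
  where $b/@Price < 70
    and $b/Authors/Author/Last_Name = "Ullman"   (: existential comparison :)
  return $b/Title }
</Result>
```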

(Example: all author Last_Name elements of books related to databases)
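
A sketch; "related to databases" is interpreted here, as an assumption, as "the Title contains the word Database", and the file name bookstore.xml is also assumed. Two FOR variables iterate over books and, within each book, over its authors' last names:

```xquery
(: assumptions: file "bookstore.xml"; database book = Title contains "Database" :)
<Result>
{ for $b in doc("bookstore.xml")/Bookstore/Book,
      $ln in $b/Authors/Author/Last_Name
  where contains($b/Title, "Database")
  return $ln }
</Result>
```

Because both sample books match, Ullman and Widom each appear twice; wrapping the path in distinct-values() would return each name once, as strings rather than elements.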

"FOR $v in Expr" says: Bind $v to the things returned by Expr one at a time, evaluating the rest of the query for each one.
"LET $v := Expr" says: Bind $v to the list of all things returned by Expr and keep evaluating the rest of the query.
Can have any number of FOR and LET clauses in any order.
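
A small sketch that makes the FOR/LET difference visible (the file name bookstore.xml is an assumption). With FOR, the inner return clause runs once per book; with LET, it runs once with $b bound to all the books:

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ (: FOR: one ForCount per Book, each counting a single node :)
  for $b in doc("bookstore.xml")/Bookstore/Book
  return <ForCount>{ count($b) }</ForCount> }
{ (: LET: a single LetCount, counting all Books at once :)
  let $b := doc("bookstore.xml")/Bookstore/Book
  return <LetCount>{ count($b) }</LetCount> }
</Result>
```

On the sample data this should yield two ForCount elements containing 1 and one LetCount element containing 2.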

(Example: average price of all database books)
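
A sketch using LET to collect the prices and the built-in avg() function. What counts as a database book ("Title contains Database") and the file name are assumptions:

```xquery
(: assumptions: file "bookstore.xml"; database book = Title contains "Database" :)
<Result>
{ let $prices := doc("bookstore.xml")/Bookstore/Book[contains(Title, "Database")]/@Price
  return avg($prices) }
</Result>
```

Both sample titles contain "Database", so this averages 65 and 75, giving 70.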

(Example: all database books priced above average over all books)
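
A sketch that binds the overall average once with LET and then iterates with FOR, mixing both clause types in one FLWOR (assumptions as stated in the comment):

```xquery
(: assumptions: file "bookstore.xml"; database book = Title contains "Database" :)
<Result>
{ let $avg := avg(doc("bookstore.xml")/Bookstore/Book/@Price)
  for $b in doc("bookstore.xml")/Bookstore/Book[contains(Title, "Database")]
  where $b/@Price > $avg
  return $b }
</Result>
```

On the sample data the average is 70, so only "Database Systems: The Complete Book" (Price 75) is returned.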

(Example: titles and prices of all books, sorted by price)
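
A sketch using order by (file name assumed). Without a schema, attribute values are untyped, and order by compares untyped values as strings (so "100" would sort before "65"); wrapping the key in number() forces a numeric sort. The constructed Book element copies the Price attribute before the Title child, since XQuery requires attributes to precede child content in an element constructor.

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ for $b in doc("bookstore.xml")/Bookstore/Book
  order by number($b/@Price)
  return <Book>{ $b/@Price, $b/Title }</Book> }
</Result>
```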

(Example: all book titles where all remarks include "great")
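
A sketch with the EVERY-IN-SATISFIES construct (lowercase every ... in ... satisfies in actual syntax), assuming the Remark* variant of the DTD and the file name bookstore.xml:

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ for $b in doc("bookstore.xml")/Bookstore/Book
  where every $r in $b/Remark satisfies contains($r, "great")
  return $b/Title }
</Result>
```

A book with no remarks satisfies every vacuously, so both sample titles are returned.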

The EVERY-IN-SATISFIES construct adds universal quantification.
There's also a SOME-IN-SATISFIES construct for existential quantification, but it's used less frequently since many constructs are implicitly existentially quantified.

(Example: all book pairs with at least one author last name in common)
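
A sketch (file name assumed): the existential = between the two books' Last_Name sequences already means "at least one name in common", so no SOME is needed. The << (precedes in document order) test keeps each pair once and never pairs a book with itself.

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ for $b1 in doc("bookstore.xml")/Bookstore/Book,
      $b2 in doc("bookstore.xml")/Bookstore/Book
  where $b1 << $b2
    and $b1/Authors/Author/Last_Name = $b2/Authors/Author/Last_Name
  return <BookPair>
           <Title1>{ data($b1/Title) }</Title1>
           <Title2>{ data($b2/Title) }</Title2>
         </BookPair> }
</Result>
```

The two sample books share Ullman and Widom, so exactly one pair is returned.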

(Example: all title-author pairs)
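
A sketch with two FOR variables (file name assumed), producing one element per book/author combination:

```xquery
(: assumption: the sample data is stored in "bookstore.xml" :)
<Result>
{ for $b in doc("bookstore.xml")/Bookstore/Book,
      $a in $b/Authors/Author
  return <TitleAuthor>{ $b/Title, $a }</TitleAuthor> }
</Result>
```

The sample data yields five pairs: two authors for the first book and three for the second.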

(Example: all book titles for each author)
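
XQuery 1.0 has no GROUP BY clause, so a common sketch iterates over the distinct author names and nests a second query for each one. As simplifying assumptions, the data is read from bookstore.xml and authors are identified by Last_Name alone, which would merge different authors who share a last name.

```xquery
(: assumptions: file "bookstore.xml"; an author is identified by Last_Name only :)
<Result>
{ for $ln in distinct-values(doc("bookstore.xml")//Author/Last_Name)
  return <Author>
           <Last_Name>{ $ln }</Last_Name>
           { for $b in doc("bookstore.xml")/Bookstore/Book[Authors/Author/Last_Name = $ln]
             return $b/Title }
         </Author> }
</Result>
```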
