CS145 Lecture Notes (5) -- XML Programming: XPath, SAX, DOM

XML DTD and sample data for examples

   <!ELEMENT Bookstore (Book | Magazine)*>
   <!ELEMENT Book (Title, Authors, Remark?)>
   <!ATTLIST Book ISBN CDATA #REQUIRED
             Price CDATA #REQUIRED
             Edition CDATA #IMPLIED>
   <!ELEMENT Magazine (Title)>
   <!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED> 
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Authors (Author+)>
   <!ELEMENT Remark (#PCDATA)>
   <!ELEMENT Author (First_Name, Last_Name)>
   <!ELEMENT First_Name (#PCDATA)>
   <!ELEMENT Last_Name (#PCDATA)>

   <?xml version="1.0" standalone="no"?>
   <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
   <Bookstore>
      <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
         <Title>A First Course in Database Systems</Title>
         <Authors>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
      </Book>
      <Book ISBN="ISBN-0-13-031995-3" Price="$75">
         <Title>Database Systems: The Complete Book</Title>
         <Authors>
            <Author>
               <First_Name>Hector</First_Name>
               <Last_Name>Garcia-Molina</Last_Name>
            </Author>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
         <Remark>
         Amazon.com says: Buy this book bundled with "A First Course,"
         it's a great deal!
         </Remark>
      </Book>
   </Bookstore>






XPath

Think of XML as a tree (or directory) structure.

XPath specifies path expressions that match XML data by navigating down (and occasionally up or across) the tree.

Basic constructs (very incomplete list):

/ the root (at the start of a path), or separator between steps in a path
* matches any one element name
@X matches attribute X of the current element
// matches the current element or any descendant of it (so //X finds X at any depth)
[C] evaluates condition C on the current element
[N] picks the Nth matching element (counting from 1)
contains(s1,s2) returns true if string s1 contains string s2
name() returns the tag of the current element
parent:: matches the parent of the current element
following-sibling:: matches all siblings after the current node
descendant:: matches any descendant of the current element
self:: matches the current element
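The constructs above can be tried directly from Java. The sketch below assumes the JDK's built-in XPath 1.0 engine (`javax.xml.xpath`). The class name `XPathAxes` and the compact `SAMPLE` string are illustrative. `SAMPLE` is the sample data above with the DTD and inter-tag whitespace dropped. Each query exercises one of the axes listed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAxes {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] queries = {
            // parent:: steps up from each Remark to the Book that contains it
            "//Remark/parent::Book/Title",
            // following-sibling:: finds the authors listed after Ullman
            "//Author[Last_Name = 'Ullman']/following-sibling::Author/Last_Name",
            // descendant:: is the long form of the step that // abbreviates
            "/Bookstore/descendant::First_Name[. = 'Hector']",
            // self:: tests the name of the current node itself
            "/Bookstore/*[self::Book][@Edition]/@ISBN"
        };
        for (String q : queries) {
            System.out.println(q + " -> " + select(SAMPLE, q));
        }
    }
}
```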


Important:


(Example: all book titles)
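One possible answer is `/Bookstore/Book/Title`, which walks down from the root one level at a time. The sketch below checks it with the JDK's XPath 1.0 engine (`javax.xml.xpath`). The class name and the compact `SAMPLE` copy of the data are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathBookTitles {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Book/Title";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```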




(Example: all book or magazine titles)
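A sketch using the wildcard: `/Bookstore/*/Title`. Under this DTD the only children of `Bookstore` are `Book` and `Magazine`, so `*` covers exactly those two. XPath 1.0 has no `(Book|Magazine)` step syntax. The explicit alternative is the union `/Bookstore/Book/Title | /Bookstore/Magazine/Title`. As before, the JDK's `javax.xml.xpath` engine is assumed, and the class name and compact `SAMPLE` data are illustrative. The sample has no magazines, so only book titles come back.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAnyTitles {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/*/Title";
    static final String UNION = "/Bookstore/Book/Title | /Bookstore/Magazine/Title";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
        System.out.println(select(SAMPLE, UNION));
    }
}
```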




(Example: all ISBN numbers)
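Attributes are reached with `@`, so one answer is `/Bookstore/Book/@ISBN`. The sketch assumes the JDK's `javax.xml.xpath` engine. The string value of an attribute node is its value. The class name and compact `SAMPLE` data are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathIsbns {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Book/@ISBN";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```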




(Example: all books costing < $70)
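Here is a sketch run with the JDK's XPath 1.0 engine (`javax.xml.xpath`). One wrinkle: the sample prices carry a `$` (e.g. `"$65"`). In XPath 1.0 such a string converts to the number NaN, so a direct `@Price < 70` selects nothing on this data. The query below therefore strips the `$` with `substring-after` before comparing. The class name and the compact `SAMPLE` copy of the data are illustrative assumptions. Matching books are reported by their ISBN.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheapBooks {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Strip the leading "$" so the price compares as a number.
    static final String QUERY = "/Bookstore/Book[number(substring-after(@Price, '$')) < 70]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```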




(Example: all ISBN numbers of books costing < $70)
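Appending `/@ISBN` to the filtered path does it. The sketch assumes the JDK's `javax.xml.xpath` engine. As with the price filter on its own, the `$` in the sample prices is stripped with `substring-after`, because in XPath 1.0 `"$65"` converts to NaN. The class name and compact `SAMPLE` data are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheapIsbns {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY =
        "/Bookstore/Book[number(substring-after(@Price, '$')) < 70]/@ISBN";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```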




(Example: all books containing a remark)
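A bare element name inside a predicate tests for existence, so `/Bookstore/Book[Remark]` keeps only books that have a `Remark` child. The sketch assumes the JDK's `javax.xml.xpath` engine and reports each book by ISBN. The class name and compact `SAMPLE` data are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathBooksWithRemark {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Book[Remark]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```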




(Example: all titles of books costing < $70 where "Ullman" is an author)
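A sketch that chains two predicates. The JDK's `javax.xml.xpath` engine is assumed, and the class name and compact `SAMPLE` data are illustrative. The author test uses the fact that in XPath 1.0, comparing a node-set with `=` is existential: `Authors/Author/Last_Name = 'Ullman'` is true if *any* last name equals "Ullman". The `$` is stripped from prices because `"$65"` would otherwise convert to NaN.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheapUllmanTitles {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Price filter, then an existential test over the author last names.
    static final String QUERY = "/Bookstore/Book[number(substring-after(@Price, '$')) < 70]"
        + "[Authors/Author/Last_Name = 'Ullman']/Title";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```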







(Example: same query using //)
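With `//`, the path no longer has to spell out `Bookstore` or `Authors/Author`. `//Book` finds books at any depth, and `.//Last_Name` finds last names anywhere below the current book. The sketch assumes the JDK's `javax.xml.xpath` engine, with an illustrative class name and compact `SAMPLE` data. On this data the `//` form returns the same title as the fully spelled-out path.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathCheapUllmanDescendant {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // .// means "at any depth below the current Book".
    static final String QUERY = "//Book[number(substring-after(@Price, '$')) < 70]"
        + "[.//Last_Name = 'Ullman']/Title";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```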




(Example: all second authors anywhere)
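One answer is `//Author[2]`. It abbreviates `/descendant-or-self::node()/child::Author[2]`, so it picks the second `Author` child of *each* parent. That differs from `(//Author)[2]`, which is the second author in the whole document. The sketch assumes the JDK's `javax.xml.xpath` engine. It prints each selected author as "First Last" by evaluating a relative `concat(...)` expression on each node. The class name and compact `SAMPLE` data are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSecondAuthors {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "//Author[2]";

    // Parses xml, selects Author elements with an XPath 1.0 expression, and
    // describes each one as "First Last" in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Relative expression, evaluated with the Author as context node.
                out.add(xp.evaluate("concat(First_Name, ' ', Last_Name)", nodes.item(i)));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));           // second author of each book
        System.out.println(select(SAMPLE, "(//Author)[2]")); // second author overall
    }
}
```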




(Example: all author last names anywhere)
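`//Last_Name` finds every `Last_Name` element at any depth. Each occurrence is a separate node, so names that appear in both books are returned twice. The sketch assumes the JDK's `javax.xml.xpath` engine. The class name and compact `SAMPLE` data are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLastNames {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "//Last_Name";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```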




(Example: all books whose title contains one of its authors' last names)
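A tempting answer is `/Bookstore/Book[contains(Title, Authors/Author/Last_Name)]`. In XPath 1.0, however, a node-set passed to `contains` is converted to the string value of its *first* node, so that query checks only the first author. The sketch below instead makes each `Last_Name` the context node and looks back up to the book's title with `../../../Title`. It assumes the JDK's `javax.xml.xpath` engine and reports books by ISBN. The class name is illustrative, and so are both data strings: `SAMPLE` is a compact copy of the sample data, and `DEMO` is a hypothetical extra book. No sample title contains an author's name. The hypothetical book's second author is named in its title.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathTitleHasAuthor {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Hypothetical data (not part of the sample): the second author's name is in the title.
    static final String DEMO = "<Bookstore><Book ISBN='d1' Price='$10'><Title>Jones on XML</Title><Authors>"
        + "<Author><First_Name>Ann</First_Name><Last_Name>Smith</Last_Name></Author>"
        + "<Author><First_Name>Bo</First_Name><Last_Name>Jones</Last_Name></Author>"
        + "</Authors></Book></Bookstore>";

    // From each Last_Name, ../../../Title is the enclosing Book's Title.
    static final String QUERY = "/Bookstore/Book[Authors/Author/Last_Name[contains(../../../Title, .)]]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY)); // empty on the sample data
        System.out.println(select(DEMO, QUERY));
    }
}
```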




(Example: all magazines where there is a book of the same title)
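Comparing two node-sets with `=` is true when some pair of nodes has equal string values. That makes `/Bookstore/Magazine[Title = /Bookstore/Book/Title]` a join on titles. The sketch assumes the JDK's `javax.xml.xpath` engine, and the class name is illustrative. `SAMPLE` is a compact copy of the sample data, which has no magazines, so the result there is empty. `DEMO` is hypothetical data added only to show a match.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathMagazineMatchesBook {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Hypothetical data (not part of the sample): one magazine shares a book's title.
    static final String DEMO = "<Bookstore>"
        + "<Book ISBN='b1' Price='$10'><Title>XML Today</Title><Authors>"
        + "<Author><First_Name>A</First_Name><Last_Name>B</Last_Name></Author></Authors></Book>"
        + "<Magazine Month='May' Year='2001'><Title>XML Today</Title></Magazine>"
        + "<Magazine Month='June' Year='2001'><Title>Other</Title></Magazine>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Magazine[Title = /Bookstore/Book/Title]";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the string value of each selected node (for a Magazine, its title).
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY)); // empty: the sample has no magazines
        System.out.println(select(DEMO, QUERY));
    }
}
```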




(Example: all books where there is a different book of the same title)
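Every `Book` is a child of `Bookstore`, so "a different book" means "a sibling `Book`". A sketch is to compare the title against those before and after it: `Title = following-sibling::Book/Title or Title = preceding-sibling::Book/Title`. The JDK's `javax.xml.xpath` engine is assumed and books are reported by ISBN. The class name is illustrative. `SAMPLE` is a compact copy of the sample data, where both titles are distinct. `DEMO` is hypothetical and abbreviated: `Authors` is omitted, and nothing is validated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathDuplicateTitles {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Hypothetical, abbreviated data (not part of the sample): two books share a title.
    static final String DEMO = "<Bookstore>"
        + "<Book ISBN='n1' Price='$10'><Title>Same</Title></Book>"
        + "<Book ISBN='n2' Price='$20'><Title>Same</Title></Book>"
        + "<Book ISBN='n3' Price='$30'><Title>Unique</Title></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Book[Title = following-sibling::Book/Title"
        + " or Title = preceding-sibling::Book/Title]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY)); // empty: sample titles are distinct
        System.out.println(select(DEMO, QUERY));
    }
}
```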




(Example: all elements whose parent tag is not "Book")
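A sketch using `name()` on the `parent::` axis: `//*[name(parent::*) != 'Book']`. The root `Bookstore` also qualifies. Its parent is the document node, not an element, so `parent::*` is empty and `name()` of an empty node-set is the empty string. The JDK's `javax.xml.xpath` engine is assumed, the class name and compact `SAMPLE` data are illustrative, and each match is printed by tag name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathParentNotBook {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "//*[name(parent::*) != 'Book']";

    // Parses xml, evaluates an XPath 1.0 node-set expression on it, and
    // returns the tag name of each selected element in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeName());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
    }
}
```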







For the next two examples, modify the DTD to use Remark* instead of Remark? (a Book may then have any number of remarks).

(Example: all books where a Remark includes "great")
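With several remarks per book, `contains(Remark, 'great')` is a trap. It looks only at the *first* remark, because `contains` converts a node-set to the string value of its first node. A sketch that tests every remark puts the condition on the `Remark` step itself: `/Bookstore/Book[Remark[contains(., 'great')]]`. The JDK's `javax.xml.xpath` engine is assumed and books are reported by ISBN. The class name is illustrative. `SAMPLE` is a compact copy of the sample data. `DEMO` is hypothetical, abbreviated Remark* data (`Authors` omitted, nothing validated).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSomeRemarkGreat {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Hypothetical, abbreviated Remark* data: in r1 only the second remark says "great".
    static final String DEMO = "<Bookstore>"
        + "<Book ISBN='r1' Price='$10'><Title>T1</Title>"
        + "<Remark>solid value</Remark><Remark>a great index</Remark></Book>"
        + "<Book ISBN='r2' Price='$20'><Title>T2</Title><Remark>dull</Remark></Book>"
        + "</Bookstore>";

    static final String QUERY = "/Bookstore/Book[Remark[contains(., 'great')]]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, QUERY));
        System.out.println(select(DEMO, QUERY));
    }
}
```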







(Example: all books where all Remarks include "great")
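XPath 1.0 has no "for all", so one sketch uses double negation: keep a book if it has no remark that fails to contain "great". That is `/Bookstore/Book[not(Remark[not(contains(., 'great'))])]`. Note that a book with no remarks passes vacuously. Prefixing the predicate with `Remark and` excludes such books, if that reading is intended. The JDK's `javax.xml.xpath` engine is assumed and books are reported by ISBN. The class name is illustrative. `SAMPLE` is a compact copy of the sample data. `DEMO` is hypothetical, abbreviated Remark* data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathAllRemarksGreat {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Hypothetical, abbreviated Remark* data (Authors omitted, nothing validated).
    static final String DEMO = "<Bookstore>"
        + "<Book ISBN='r1' Price='$10'><Title>T1</Title><Remark>great A</Remark><Remark>great B</Remark></Book>"
        + "<Book ISBN='r2' Price='$20'><Title>T2</Title><Remark>great C</Remark><Remark>meh</Remark></Book>"
        + "<Book ISBN='r3' Price='$30'><Title>T3</Title></Book>"
        + "</Bookstore>";

    // "No remark lacks 'great'" (books without remarks pass vacuously).
    static final String QUERY = "/Bookstore/Book[not(Remark[not(contains(., 'great'))])]";
    // Variant that also requires at least one remark.
    static final String STRICT = "/Bookstore/Book[Remark and not(Remark[not(contains(., 'great'))])]";

    // Parses xml, evaluates an XPath 1.0 expression that selects Book
    // elements, and returns the ISBN attribute of each one in document order.
    static List<String> select(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(((Element) nodes.item(i)).getAttribute("ISBN"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(DEMO, QUERY));
        System.out.println(select(DEMO, STRICT));
    }
}
```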







SAX and DOM

(Example: count all words in an XML document)

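import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

...
// `parser` is assumed to be a DOM parser (e.g., the Xerces DOMParser)
// that has already parsed the input document.
Document d = parser.getDocument();
int numWords = countWordsInNode(d);
...

  // Helper the notes rely on but do not show (sketch): counts the
  // whitespace-separated tokens in s.
  public static int countWordsInString(String s) {
    String trimmed = s.trim();
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
  }

  // Recursively counts the words in every text node at or below `node`.
  public static int countWordsInNode(Node node) {

    int numWords = 0;

    // Add the words found in each child's subtree.
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }

    // If this node is itself a text node, add its own words.
    if (node.getNodeType() == Node.TEXT_NODE) {
      numWords += countWordsInString(node.getNodeValue());
    }

    return numWords;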
  }
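For contrast with the tree-walking DOM version, here is a sketch of the same word count in SAX style. It uses the JDK's `javax.xml.parsers.SAXParser`, and the class name `SaxWordCount` is illustrative. Instead of building a tree, the handler reacts to parse events as they stream by. Text is buffered until the next tag, because a SAX parser may deliver one run of character data in several `characters()` calls and a word could otherwise be split in two. Words are counted as whitespace-separated tokens.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxWordCount extends DefaultHandler {
    private final StringBuilder pending = new StringBuilder();
    private int numWords = 0;

    // Character data may arrive in pieces; just accumulate it.
    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    // Any tag boundary ends the current text run, so count it then.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    @Override
    public void endDocument() {
        flush();
    }

    // Counts whitespace-separated words in the buffered text, then clears it.
    private void flush() {
        String text = pending.toString().trim();
        if (!text.isEmpty()) {
            numWords += text.split("\\s+").length;
        }
        pending.setLength(0);
    }

    static int countWords(String xml) {
        try {
            SaxWordCount handler = new SaxWordCount();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.numWords;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords(
            "<Book><Title>A First Course</Title><Remark>great deal</Remark></Book>"));
    }
}
```

The design trade-off is the usual one. The SAX handler keeps only the current text run in memory, while the DOM version needs the whole tree but can revisit any part of it.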

(Pseudocode Example: get all ISBNs)
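One way to flesh this out in SAX style is sketched below, assuming the JDK's `javax.xml.parsers.SAXParser`. The class name `SaxIsbns` and the compact `SAMPLE` copy of the sample data are illustrative. `ISBN` is an attribute of `Book`, so it arrives together with each `Book` start tag in `startElement`, and no text buffering is needed. The parser is not namespace-aware by default, so the tag is matched on `qName`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxIsbns {
    // Compact copy of the sample Bookstore data (DTD omitted, whitespace between tags dropped).
    static final String SAMPLE = "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title><Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title><Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors><Remark>Amazon.com says: Buy this book bundled with"
        + " \"A First Course,\" it's a great deal!</Remark></Book>"
        + "</Bookstore>";

    // Streams through xml and collects the ISBN attribute of every Book start tag.
    static List<String> isbns(String xml) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("Book")) {
                    String isbn = atts.getValue("ISBN");
                    if (isbn != null) {
                        found.add(isbn);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(isbns(SAMPLE));
    }
}
```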