=========================================================================
LECTURE NOTES - XML
=========================================================================

If you are interested in reading about the XML standard, please visit:

   http://www.w3.org/XML/

It will keep you busy for days. A very short web page linked from that
site ("XML in 10 points") provides a light summary.

XML = Extensible Markup Language

-> A (relatively) new standard for data representation and exchange on
   the internet
-> A document format: a superset of HTML, a subset of SGML (roughly)
-> XML is to data what Java is to programming

Like SGML and HTML, basic XML consists of three things:

(1) Tagged elements, which may be nested within one another
(2) Attributes on elements
(3) Text

In HTML, tags denote formatting: <I>, <TABLE>, etc.
In XML, tags denote meaning of data: <STUDENT>, <BOOKTITLE>, etc.

(To format XML data, use XSL - the Extensible Stylesheet Language - to
translate XML to HTML.)

Well-formed XML
===============

A well-formed XML document is any XML document that follows the basic
rules: matched and properly nested tags, a single root element, unique
attribute names within each element, etc.

Ex: bookstore data

   <?xml version="1.0" standalone="yes"?>
   <BOOKSTORE>
      <BOOK ISBN="0-13-861337-0" PRICE="$50">
         <TITLE>A First Course in Database Systems</TITLE>
         <AUTHORS>
            <AUTHOR>
               <FIRSTNAME>Jeffrey</FIRSTNAME>
               <LASTNAME>Ullman</LASTNAME>
            </AUTHOR>
            <AUTHOR>
               <FIRSTNAME>Jennifer</FIRSTNAME>
               <LASTNAME>Widom</LASTNAME>
            </AUTHOR>
         </AUTHORS>
      </BOOK>
      <BOOK ISBN="..." PRICE="...">
         <TITLE>Database System Implementation</TITLE>
         <AUTHORS>
            <AUTHOR>
               <FIRSTNAME>Hector</FIRSTNAME>
               <LASTNAME>Garcia-Molina</LASTNAME>
            </AUTHOR>
            <AUTHOR>
               <FIRSTNAME>Jeffrey</FIRSTNAME>
               <LASTNAME>Ullman</LASTNAME>
            </AUTHOR>
            <AUTHOR>
               <FIRSTNAME>Jennifer</FIRSTNAME>
               <LASTNAME>Widom</LASTNAME>
            </AUTHOR>
         </AUTHORS>
         <REMARK>
            Buy this book bundled with "A First Course", it's a great deal!
         </REMARK>
      </BOOK>
   </BOOKSTORE>

A well-formed XML document can contain regular data (as above) or very
irregular data.

Valid XML
=========

It is possible to define a "schema" for XML data, called a Document Type
Definition (DTD). A DTD is a grammar that describes the legal nesting of
tags and attributes. A valid XML document is a well-formed document that
also conforms to its DTD.

   <!DOCTYPE BOOKSTORE [
      ...
   ]>

The DTD is specified at the top of the document or in a separate file
referenced at the top of the document. In both cases use standalone="no".

Q: What are the benefits of using DTDs?
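
The difference between "well-formed" and "valid" can be made concrete with a small check. The sketch below uses Python's standard xml.etree.ElementTree parser, which only tests well-formedness (it does not validate against a DTD). The function name is_well_formed and the three sample fragments are made up for illustration and are not part of the bookstore data.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, i.e. it obeys the basic syntax rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, chosen to exercise the rules named in the notes.
GOOD = '<BOOK ISBN="0-13-861337-0"><TITLE>A First Course in Database Systems</TITLE></BOOK>'
UNMATCHED = '<BOOK><TITLE>A First Course in Database Systems</BOOK>'   # TITLE never closed
DUPLICATE_ATTR = '<BOOK ISBN="1" ISBN="2"><TITLE>T</TITLE></BOOK>'      # ISBN given twice

for label, doc in [("good", GOOD),
                   ("unmatched tag", UNMATCHED),
                   ("duplicate attribute", DUPLICATE_ATTR)]:
    print(label, "->", is_well_formed(doc))
```

Checking validity as well would require a validating parser that reads the DTD; the standard-library parser used here does not do that.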
ID and IDREF(S) Attributes
==========================

Element pointers: assign a special "ID" attribute to an element, then
point to that element with a special "IDREF(S)" attribute in another
element.

Ex: reorganized bookstore

   A First Course in Database Systems
   Database System Implementation
   Buy this book bundled with It's a great deal!
   Hector Garcia-Molina
   Jeffrey Ullman
   Jennifer Widom

DTD for this data:

   <!DOCTYPE BOOKSTORE [
      ...
   ]>

Querying XML
============

-> XML turns the Web into one big database

Several languages have been proposed for querying XML data.

- We developed one at Stanford called "Lorel", based on OQL.
- There is a recently proposed standard called XQuery.
- There is also a simpler standard (part of XQuery, XSL, and XPointer)
  called XPath.
- All languages are based on navigating through the structure of the
  XML document.

Ex: Find the titles of books costing < $60 where Ullman is an author
    (based on the first XML data)

In Lorel:

   SELECT b.TITLE
   FROM   BOOKSTORE.BOOK b
   WHERE  b.@PRICE < $60
   AND    b.AUTHORS.AUTHOR.LASTNAME = "Ullman"

In XPath:

   BOOKSTORE/BOOK[@PRICE < 60 and AUTHORS/AUTHOR/LASTNAME = "Ullman"]/TITLE

In XQuery:

   for $b in BOOKSTORE/BOOK
   where $b/@PRICE < 60
     and $b/AUTHORS/AUTHOR/LASTNAME = "Ullman"
   return $b/TITLE

XML query languages often include "wildcards" and regular expression
operators for cases when the exact structure of the data may be unknown.

Ex: Find all titles anywhere in the bookstore containing "XML".

In Lorel:

   SELECT t
   FROM   BOOKSTORE.#.TITLE t
   WHERE  t LIKE "%XML%"

Million (billion actually) dollar question
==========================================

* Will people store their data in XML, or only use XML as a transport
  format for data stored in conventional database systems?
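
As a rough illustration of what "navigating through the structure" means in the queries from the Querying XML section, here is a minimal Python sketch using the standard xml.etree.ElementTree module. ElementTree supports only a small subset of XPath (no "<" comparisons or "and" inside predicates), so the filtering is done in ordinary Python. The sample document is an assumed reconstruction: the FIRSTNAME element and the second book's ISBN and PRICE values are hypothetical, and the helper names cheap_titles_by and titles_containing are invented here.

```python
import xml.etree.ElementTree as ET

# Assumed sample data. FIRSTNAME and the second book's ISBN/PRICE are hypothetical.
DATA = """
<BOOKSTORE>
  <BOOK ISBN="0-13-861337-0" PRICE="$50">
    <TITLE>A First Course in Database Systems</TITLE>
    <AUTHORS>
      <AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME></AUTHOR>
      <AUTHOR><FIRSTNAME>Jennifer</FIRSTNAME><LASTNAME>Widom</LASTNAME></AUTHOR>
    </AUTHORS>
  </BOOK>
  <BOOK ISBN="hypothetical-isbn" PRICE="$70">
    <TITLE>Database System Implementation</TITLE>
    <AUTHORS>
      <AUTHOR><FIRSTNAME>Hector</FIRSTNAME><LASTNAME>Garcia-Molina</LASTNAME></AUTHOR>
      <AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME></AUTHOR>
    </AUTHORS>
  </BOOK>
</BOOKSTORE>
"""


def cheap_titles_by(root, last_name, max_price):
    """Titles of books priced below max_price that list last_name as an author."""
    titles = []
    for book in root.findall("BOOK"):                      # BOOKSTORE/BOOK
        price = float(book.get("PRICE", "inf").lstrip("$"))  # "$50" -> 50.0
        last_names = [e.text for e in book.findall("AUTHORS/AUTHOR/LASTNAME")]
        if price < max_price and last_name in last_names:
            titles.append(book.findtext("TITLE"))
    return titles


def titles_containing(root, needle):
    """Wildcard-style search: TITLE elements at any depth whose text contains needle."""
    return [t.text for t in root.iter("TITLE") if t.text and needle in t.text]


root = ET.fromstring(DATA)
print(cheap_titles_by(root, "Ullman", 60))
print(titles_containing(root, "Database"))
```

The first call mirrors the "books costing < $60 where Ullman is an author" query; the second plays the role of the "#" wildcard in the Lorel example by looking for TITLE elements anywhere under the root.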