We describe only a few basic commands of the Extensible Markup Language (XML). The current common version is XML 1.0. XML-formatted data are intended for processing, not just browsing. Browsing of XML is currently supported mainly by Internet Explorer 5.0.
XML is an application conforming to a standard of the World Wide Web Consortium (W3C). XML
uses embedded tags to indicate semantic structures, while leaving the
interpretation of the tags to the client's application programs. The
tag directives are bracketed by Less-Than (<) and Greater-Than
(>) symbols. To enable this note to be printed in an HTML format we
entered these symbols internally as (&lt;) and (&gt;).
The character set for XML is Unicode, a standard covering the characters used for HTML, but also other alphabets. In fundamental form, character references start with an ampersand and a number sign (&#), are denoted by an integer, and end with a semicolon (;). However, the ASCII character code we use is an accepted default subset.
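The character references described above can be tried out with a short sketch. It assumes Python and its standard xml.etree.ElementTree parser, neither of which is used in these notes; the element name note and its text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document. &#60; and &#62; are the numeric
# character references for '<' and '>', and &#233; is e with an acute accent.
doc = "<note>3 &#60; 5 and 5 &#62; 3; caf&#233;</note>"

# The parser replaces each reference with the Unicode character it denotes.
text = ET.fromstring(doc).text
print(text)
```

The integer between &# and ; is the Unicode code point in decimal, so plain ASCII text never needs such references except for the markup symbols themselves.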
Each document should start with a processing instruction
<?xml version="1.0"?>,
here indicating that the document conforms to XML version 1.0, followed by
the tagged root-name of the document.
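As a minimal sketch of this layout, assuming Python's standard xml.etree.ElementTree parser (not part of these notes), the following reads a document consisting of the version line followed by an empty root element and reports the root name.

```python
import xml.etree.ElementTree as ET

# Smallest document of the shape described above: the version
# declaration first, then the root element.
document = '<?xml version="1.0"?>\n<movies></movies>'

root = ET.fromstring(document)
print(root.tag)
```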
A simple XML document listing 4 early
Hitchcock movies
(requires Microsoft Internet Explorer 4.+) uses tags suitable
for movies; its body is hence called <movies>.
All commands have a corresponding closure; for instance, there should
be an end tag </movies> at the end of the document.
If the content is empty, an abbreviation is allowed combining
both tags: <no-content/>.
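The closure rule can be checked mechanically. The sketch below assumes Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the movie element are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as one complete document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Every start tag is closed, so the parser accepts the document.
print(is_well_formed("<movies><movie>Blackmail</movie></movies>"))

# The end tag </movies> is missing, so the parser rejects the document.
print(is_well_formed("<movies><movie>Blackmail</movie>"))

# The abbreviated form and the start/end pair describe the same empty element.
empty_short = ET.fromstring("<no-content/>")
empty_long = ET.fromstring("<no-content></no-content>")
print(ET.tostring(empty_short) == ET.tostring(empty_long))
```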
The only content permitted in XML is character strings (character data; a DTD declares it as #PCDATA). No quotes are needed for character data. An example of a large, publicly available document is Shakespeare's Taming of the Shrew. Internet Explorer will simply show it with all the markup, since we have not defined a conversion for presentation for it. But you can see how the meaningful tags now allow searching for items such as the characters (<PERSONA>) in the play. All the tags used are listed in the accompanying Data Description.
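To illustrate how meaningful tags support searching, here is a minimal sketch assuming Python's standard xml.etree.ElementTree parser. The fragment is hand-written for this example and is not the actual play file; only the <PERSONA> tag is taken from the text above, while the surrounding tag names and the function find_personae are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a small part of a play document.
PLAY_FRAGMENT = """<PLAY>
  <TITLE>The Taming of the Shrew</TITLE>
  <PERSONAE>
    <PERSONA>KATHARINA</PERSONA>
    <PERSONA>PETRUCHIO</PERSONA>
  </PERSONAE>
</PLAY>"""

def find_personae(xml_text):
    """Collect the text of every PERSONA element, at any depth."""
    root = ET.fromstring(xml_text)
    return [p.text.strip() for p in root.iter("PERSONA") if p.text]

print(find_personae(PLAY_FRAGMENT))
```

Because the search is by tag name rather than by position or typography, it keeps working however the document is later presented.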
It is important to document what elements can appear in an XML document, and how they are to be arranged. That notation is called a Document Type Definition (DTD). We show a sample DTD used for plays of Shakespeare. It uses some of the symbols (*, +, ?) encountered when presenting regular expressions in the notes. Notice that the tags are now related to the subject matter, and semantically meaningful for people who understand plays. However, <FM> and <P+> are mysteries until one looks at the contents; they denote 1 or more lines of comments entered by the people who created the XML version of the play.
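The link between the DTD symbols and regular expressions can be made concrete with a rough sketch. It assumes Python; it is not a DTD validator, and the table CONTENT_MODELS, the function children_match, and the DEMO element are hypothetical. The FM entry assumes that an <FM> element holds one or more <P> children, as described above.

```python
import re
import xml.etree.ElementTree as ET

# Each content model is written as a regular expression over child tag
# names, with every name followed by one space as a separator.
CONTENT_MODELS = {
    "FM": "(P )+",                        # one or more P
    "DEMO": "(TITLE )(SUBTITLE )?(P )*",  # hypothetical: ? optional, * zero or more
}

def children_match(element, models=CONTENT_MODELS):
    """Check an element's direct children against its content model."""
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(models[element.tag], sequence) is not None

print(children_match(ET.fromstring("<FM><P>a</P><P>b</P></FM>")))
print(children_match(ET.fromstring("<FM></FM>")))
```

The symbols carry the same meaning in both notations: + is one or more, * is zero or more, and ? is optional.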
Since XML does not specify how documents are to be presented, that task is left up to the applications that read XML files. Without any program, our browsers (currently Microsoft Internet Explorer 4.0 or, better, 5.0) simply show the XML source document as a simple list, with all the tags explicitly shown. (See the list of 4 Hitchcock movies in XML form, converted manually to an HTML representation.)
To obtain well-formatted visible output, XML data must be
converted to HTML, and this should be done automatically. Such a
conversion could be done for any specific XML file by any program that
understands the tags, and accordingly creates a suitable HTML file.
Such a program can be written in Java, which allows its execution
on the client computer (see Notes
for meeting 3). Such a Java applet is available in the Microsoft
Internet Explorer as XMLDSO (XML Data Source Objects). You can see
simple examples of its use on the
4 Hitchcock movies and on the
full list. Such a longish list gets to be awkward.
For more detail, see
Microsoft Data Binding Documentation (Jan. 2000).
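The kind of conversion program described above can be sketched in a few lines. This sketch uses Python rather than the Java applet mentioned in the text, and it is not how XMLDSO works; the tag names movie, title, and year, the sample data, and the function movies_to_html are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Sample data with assumed tag names; the actual course file may differ.
XML_SOURCE = """<?xml version="1.0"?>
<movies>
  <movie><title>The Lodger</title><year>1927</year></movie>
  <movie><title>Blackmail</title><year>1929</year></movie>
  <movie><title>The 39 Steps</title><year>1935</year></movie>
  <movie><title>The Lady Vanishes</title><year>1938</year></movie>
</movies>"""

def movies_to_html(xml_text):
    """Build an HTML table with one row per movie element."""
    root = ET.fromstring(xml_text)
    rows = []
    for movie in root.findall("movie"):
        # escape() keeps characters such as & and < legal in the HTML output.
        title = escape(movie.findtext("title", default=""))
        year = escape(movie.findtext("year", default=""))
        rows.append(f"<TR><TD>{title}</TD><TD>{year}</TD></TR>")
    return "<TABLE>\n" + "\n".join(rows) + "\n</TABLE>"

print(movies_to_html(XML_SOURCE))
```

The program only works because it understands these particular tags; a document with a different vocabulary needs a different converter.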
To create prettier output, we have to generate fancier HTML. For instance, to split the long table of all of Hitchcock's films by category, we defined a table in the program. We can also rearrange fields.
Long tables take a long time to load, and are hard to manipulate.
We can limit the size of the table to be shown with a
<TABLE DATAPAGESIZE=8 ID=table WIDTH=100% datasrc=#xmldso>
specification. To allow manipulation of that table we add a button
<INPUT TYPE="button" VALUE="Next" ONCLICK="table.nextPage();">
whose click handler refers to that table's ID. Now we can look at
Hitchcock's films page by page. This example shows all Hitchcock
movies, but not the director's heading (file Hitch0.xml).
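The paging behavior can be modeled with a small sketch, assuming Python. This is not the browser's implementation: the class PagedTable and its methods are hypothetical, and staying on the last page when there are no further rows is an assumption of this sketch.

```python
class PagedTable:
    """Hypothetical model of a table that shows a fixed number of rows at a time."""

    def __init__(self, rows, page_size=8):
        self.rows = list(rows)
        self.page_size = page_size  # plays the role of DATAPAGESIZE=8
        self.start = 0              # index of the first visible row

    def current_page(self):
        return self.rows[self.start:self.start + self.page_size]

    def next_page(self):
        # Advance only if another page exists (assumed behavior).
        if self.start + self.page_size < len(self.rows):
            self.start += self.page_size

table = PagedTable([f"row {n}" for n in range(1, 21)])
print(table.current_page())  # rows 1 through 8
table.next_page()            # corresponds to one click on the Next button
print(table.current_page())  # rows 9 through 16
```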
A test that includes the director's heading uses an
HTML 2-level table constructed by hand
(file Hitch.xml) to show Hitchcock's films page by page.
This formatting is created by the XML client application, here the HTML program we have written. An alternative is to have formatting provided by a server, for which a style sheet can be specified.
With a suitable style sheet, fancy formatting can be generated. Style sheets were common in SGML, so that a publisher could determine the printing layout of a book. For XML files the W3C has proposed a general meta language, the Extensible Stylesheet Language (XSL).
All tags that match the
XSL specification are then not shown.
Examples show a formatted table with four Hitchcock movies, as well as all Hitchcock movies formatted in multiple tables, using an XSL style sheet. The XSL source has been converted to HTML for readability under any browser.
An XSL interpreter is included in Microsoft IE version 5.0. If no XSL is indicated, it will use a default style sheet that is book-oriented. It recognizes the following tags:
In a style sheet, layout styles, relative sizes, and colors can be indicated.
The ability to go to other documents exists also in XML, as in HTML,
but differs in format.
Contents of: Charles F. Goldfarb and Paul Prescod: The XML Handbook,
3rd edition, Prentice Hall, 2001.
XSL information
[W3C98] W3C: Extensible markup language. 1998.
[McGrath98] Sean McGrath: XML by Example: Building E-Commerce Applications; Prentice-Hall, Charles F. Goldfarb Series on Open Information Management, 1998.
See also the CS99I references.
For limitations see Madnick paper.