CS99I Meeting 2 Notes: HTML, XML, XSL
By Gio Wiederhold,
12 Jan 2000.
Topics Covered briefly
Representations
Advantages and Limits
Readability
Processability
Granularity
-- (structure: word, line, paragraph, chapter, book )
-- (object: value, name-value pair, item, person, group, community ) with alternatives (family vs dorm)
Convenience versus precision
Words: unique in context, ambiguous out of context
Context: explicit versus implicit (by sets of words:
- "miter .. bishop";
- "miter .. wood";
- "knave -- bishop")
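One way to read "context by sets of words" is as a lookup: a word is ambiguous on its own, and a neighboring word selects the intended sense. The sketch below is only an illustration under assumptions; the context labels ("church", "carpentry"), the CONTEXTS table, and the function sense_of are hypothetical names, not part of the course material.

```python
from typing import Optional, Set

# Hypothetical context sets; the labels and word lists are illustrative only.
CONTEXTS = {
    "church": {"miter", "bishop"},
    "carpentry": {"miter", "wood"},
}

def sense_of(word: str, nearby: Set[str]) -> Optional[str]:
    """Return the one context that contains `word` and shares another word
    with `nearby`; return None when the word stays ambiguous or is unknown."""
    candidates = [
        label
        for label, words in CONTEXTS.items()
        if word in words and (words - {word}) & nearby
    ]
    return candidates[0] if len(candidates) == 1 else None

print(sense_of("miter", {"bishop"}))  # church
print(sense_of("miter", {"wood"}))    # carpentry
print(sense_of("miter", set()))       # None: ambiguous out of context
```

With no neighboring words the function returns None, which mirrors the point that a word is unique in context and ambiguous out of context.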
processability -- web usage - retrieval.
Formats
Paper: arbitrarily structured/unstructured; physical order.
Books: somewhat structured/unstructured; layout order; metadata: ToC, index.
Tables: very structured. Exceptions awkward -- footnotes
Databases: very structured. Machine processable, queryable. Exceptions awkward.
relational: table-based, links by references, join operator; unordered. Example: student |><| course-info
object-oriented: tree-based, structural (and optional reference) links; ordered (often)
SGML: for document printing, hierarchically structured; ordered
HTML: for document transmittal, varied presentation, hierarchically structured + links; ordered
- Hyperlinks: http://computer/directory/file+/entrypoint$ (see Regular expression syntax)
- metadata for presentation
( HTML intro).
XML: for document processing, hierarchically structured + links, more; ordered (except for attributes)
- tags with `semantic' names (<person> person stuff </person>)
- Hyperlinks: http://computer/directory/file+/entrypoint$ (see Regular expression syntax)
- optional metadata for description (DTD) and/or presentation (XSL)
( XML intro (coming)).
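To make the XML entry concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The document text, the tag names (group, person, name, course), the attribute names, and the function names_taking are all invented for illustration. It shows that tags with semantic names can be queried directly, that child elements keep their order, and that attribute order carries no meaning.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical document; every tag and attribute name here is an assumption.
DOC = """
<group name="dorm">
  <person id="p1"><name>Ada</name><course>CS99I</course></person>
  <person id="p2"><name>Ben</name><course>CS101</course></person>
</group>
""".strip()

def names_taking(xml_text: str, course: str) -> List[str]:
    """Return the names of all <person> elements enrolled in `course`."""
    root = ET.fromstring(xml_text)
    return [
        person.findtext("name")
        for person in root.iter("person")
        if person.findtext("course") == course
    ]

print(names_taking(DOC, "CS99I"))  # ['Ada']

# Child elements are ordered: iteration follows document order.
print([p.get("id") for p in ET.fromstring(DOC)])  # ['p1', 'p2']

# Attributes are unordered: the same attributes in a different order compare equal.
a = ET.fromstring('<person id="p1" dorm="Roble"/>')
b = ET.fromstring('<person dorm="Roble" id="p1"/>')
print(a.attrib == b.attrib)  # True
```

Because the tags name what the data is rather than how it looks, the same query would be much harder to write against presentation-oriented HTML.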
Regular expression syntax
Important for formulating
- Representation grammars
- queries (getting some subset of the representation)
sequence: (a,b,c)
alternatives: (x|y), in combination (x|y, b,c) {x,b,c or y, b, c}
optional: q$ {q | nothing}
any: r* {nothing | r | rr | rrr | rrrr ... }
repeats: s+ { s | ss | sss | ssss ... }
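The operators above can be tried out with Python's standard re module. This is a sketch under one stated assumption: the notes write the optional operator as q$, while Python (like most regular expression dialects) writes it as q? and reserves $ for end-of-text. The helper name matches is hypothetical.

```python
import re

# Notation in these notes -> Python `re` notation (mapping assumed by this sketch):
#   sequence      (a,b,c)  ->  abc
#   alternatives  (x|y)    ->  (x|y)
#   optional      q$       ->  q?
#   any           r*       ->  r*
#   repeats       s+       ->  s+

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None

# sequence: a, then b, then c
assert matches(r"abc", "abc")
# alternatives in combination: x,b,c or y,b,c
assert matches(r"(x|y)bc", "xbc") and matches(r"(x|y)bc", "ybc")
# optional: q or nothing
assert matches(r"q?", "") and matches(r"q?", "q")
# any: nothing, r, rr, rrr, ...
assert matches(r"r*", "") and matches(r"r*", "rrr")
# repeats: at least one s
assert matches(r"s+", "sss") and not matches(r"s+", "")
```

re.fullmatch is used instead of re.search so that each check tests membership of the whole string in the language, which is how the sets in braces above are defined.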
Example:
(((S|s)ection|paragraph(s$) )*.)
matches all citations looking like
Section xx., section xx., paragraph xx., paragraphs xx.
By setting a marker for xx, that text can be retrieved for display or processing.
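As a sketch of the marker idea, the citation pattern can be written in Python's re syntax with a named group standing in for the marker on xx. Several things here are assumptions: xx is taken to be a number such as 12 or 3.4, the optional s is written s? rather than s$, and the names CITATION and find_citations are hypothetical.

```python
import re
from typing import List

# The named group "ref" plays the role of the marker on xx.
# Treating xx as a dotted number is an assumption of this sketch.
CITATION = re.compile(r"\b(?:[Ss]ection|paragraphs?)\s+(?P<ref>\d+(?:\.\d+)*)")

def find_citations(text: str) -> List[str]:
    """Return the marked xx part of every citation found in `text`."""
    return [m.group("ref") for m in CITATION.finditer(text)]

sample = "See Section 12. Compare section 3.4 and paragraphs 7, but not chapter 9."
print(find_citations(sample))  # ['12', '3.4', '7']
```

The word "chapter" is not part of the pattern, so "chapter 9" is skipped; only the marked text is returned, ready for display or further processing.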
A regular language is capable, but not really user-friendly.
- Would such a query language help your browsing?
- Would such a language help in screen-scraping?
Programming
Base Programming: Machines as interpreters
Scripts: Software as interpreters
Combinations: microcode, Bytecode
CGI, Java, JavaBeans, etc.
- The program - either in machine language or in an intermediate language, such as Java bytecode.
- The data - in one of the representations discussed above (.., databases, XML, ...)
Role of Standards
Standards are a tool in competition.
They can be set
- By a governmental agency: prescriptive standards.
- By historical convenience: width of two Roman horses --> width of carts and wheels --> grooves in stones --> other carts --> mining carts --> rails for mining carts --> standard gauge railroads.
- By a company that dominates the industry -- IBM in the 1960's and 1970's (cards, tapes, disks, SQL), Microsoft in the 1990's (Windows, Word, ...).
- By an industry consortium that tries to counteract a dominant company -- POSIX for UNIX.
- By an industry alliance that tries to create a market -- OMG for object-oriented software.
- By a dominant customer -- DoD for Ada language.
- By a customer - supplier collaboration -- Wintel: Microsoft and Intel.
News about Java Standards
New York Times: January 26, 2000
Microsoft Is Told to Abide by Sun on Java
By THE ASSOCIATED PRESS
SEATTLE, Jan. 25 -- A federal judge has ruled that
Microsoft must conform to standards set by Sun Microsystems when it
sells products that use Sun's Java programming language.
The judge, Ronald Whyte of the United States District Court,
amended a preliminary injunction that said Microsoft would be in
violation of Sun's copyright on the Java language as well as in
violation of its contract with Sun if it shipped products that failed
to conform to Sun's standards.
An appeals court had overturned the earlier order because of
the copyright element. Judge Whyte dropped that part of the ruling in
his amended order.
The ruling came in a lawsuit that Sun filed in October 1997,
accusing Microsoft of trying to extend the Java language for special
use with its Windows operating system, which Sun contends is a
violation of both the contract and the Java copyright.
Michael Morris, general counsel for Sun, said the company
was happy with the decision by the judge.
Jim Cullinan of Microsoft said the amended ruling showed
that Microsoft did not harm competition through its actions but merely
that a contract dispute existed over the companies' licensing
agreement.
Microsoft is still barred from shipping its versions of
Java, and other issues in the suit remain unresolved.
Java, introduced by Sun in 1995, allows developers to write
a software application that can run on a variety of computers,
regardless of the underlying system. Sun has tried to promote Java as
a universal programming language.
Notes
See
- Brief intro to HTML.
- Brief intro to XML.
- Brief intro to RDS ADO [ASP, 25 Feb 2000].
- XSL information
See also the references.