CS99I Meeting 04 Notes: HTML and Representations

Started by Gio Wiederhold, 16 January 2000, updated Jan 23 2002.

Representation

NUMBERS

Why Binary numbers?

What is the benefit of binary number representation? Reliability through simplicity.

Counting with only two symbols {0, 1} .
What about 10 {0, .. , 9}?

What about 3 {-1,0,+1}?

Other base systems: 10, 3,

...
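
A minimal sketch of counting in different bases, in Python. The function names to_base and to_balanced_ternary are illustrative and not part of the course material. It writes the same numbers with two symbols {0, 1}, with ten symbols {0, .. , 9}, and with the three symbols {-1, 0, +1}.

```python
def to_base(n, base):
    """Digits of a non-negative integer n in the given base, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % base)   # lowest digit first
        n //= base
    return digits[::-1]


def to_balanced_ternary(n):
    """Digits of an integer n using the symbols {-1, 0, +1}, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3                 # 0, 1, or 2
        if r == 2:                # write 2 as 3 - 1: digit -1, carry into the next position
            r = -1
        digits.append(r)
        n = (n - r) // 3
    return digits[::-1]


for n in range(6):
    print(n, to_base(n, 2), to_base(n, 10), to_balanced_ternary(n))
```

Note that the three-symbol system with -1 needs no separate sign: negative numbers come out of the same digits.
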
Character sets: ASCII 7 bits = 128 characters, of which 33 are for control (codes 0-31, which include null, plus DEL). Derived from teletypes. Leaves 95 printable characters.

In practice 8 bits = 256 choices, ASCII plus whatever someone wants.

For more extensive languages there is Unicode: 16 bits = 64K choices (1K is 1024 -- why?)
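
The counts above can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration.

```python
# Number of choices offered by 7, 8, and 16 bits.
for bits in (7, 8, 16):
    print(bits, "bits ->", 2 ** bits, "choices")

# ASCII control codes are 0..31 plus 127 (DEL); the remaining codes are printable.
control = [c for c in range(128) if c < 32 or c == 127]
printable = [chr(c) for c in range(128) if 32 <= c < 127]
print(len(control), "control codes,", len(printable), "printable characters")

# A character is stored as a number: ord() gives the code, chr() goes back.
print(ord("A"), chr(97))

# 1K is 1024 because it is a power of two, 2 ** 10; so 2 ** 16 is 64K.
print(2 ** 10, 2 ** 16 // 2 ** 10)
```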

Text

Build characters to words or numbers; words or numbers to records or sentences; records or sentences to messages; messages to papers or books; papers or books to knowledge?

 

Figures: 

Bit Image: Encodes 2D graphics as height x width pixels -- each pixel has 3 intensity values, one per color (RGB), or (Luminosity, .. ). A variety of standards: GIF, BMP, JPEG (why?)
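
A rough sketch of how much storage an uncompressed bit image needs, which is likely one part of the answer to the "why": formats such as GIF and JPEG compress the pixels. The function name raw_image_bytes and the 8 bits per color are assumptions made for this example.

```python
def raw_image_bytes(height, width, channels=3, bits_per_channel=8):
    """Bytes needed to store an uncompressed bit image."""
    return height * width * channels * bits_per_channel // 8


# A tiny 2 x 2 RGB image: each pixel is (red, green, blue), each value 0..255.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

print(raw_image_bytes(len(image), len(image[0])))   # 12 bytes
print(raw_image_bytes(480, 640))                    # 921600 bytes
```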

Vector:  collection of lines defined by endpoints (x,y) for 2D, (x,y,z) for 3D.

For viewing, a 3D representation has to be converted to 2D, by software or hardware [SGI -- Jim Clark's first company]
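
One common way to do the 3D to 2D conversion in software is a perspective projection. The sketch below assumes a viewer at the origin looking along the z axis; the function name project and the sample endpoints are invented for illustration.

```python
def project(point, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) onto the 2D plane z = focal_length."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the viewer (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# A vector figure: line segments defined by their 3D endpoints.
lines_3d = [
    ((0.0, 0.0, 2.0), (1.0, 0.0, 2.0)),
    ((1.0, 0.0, 2.0), (1.0, 1.0, 4.0)),
]

# Convert every endpoint; the lines themselves stay lines in 2D.
lines_2d = [(project(a), project(b)) for a, b in lines_3d]
for segment in lines_2d:
    print(segment)
```

Points farther away (larger z) land closer to the center, which is what makes the 2D picture look three-dimensional.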

What about 3D bit images?

What about movies?

 

Markup  (Back to text)

Specify layout, type of print, bold, italic, size, headers, paragraph boundaries, tables, etc.

with otherwise invisible commands, such as <B>boldface stuff</B>.
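
To see that the commands really are separate from the text, here is a small sketch using Python's standard html.parser module. The class name MarkupSplitter and the sample sentence are made up for this example.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collect the visible text and the markup commands separately."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.commands = []

    def handle_starttag(self, tag, attrs):
        self.commands.append("<" + tag + ">")    # tag names arrive lowercased

    def handle_endtag(self, tag):
        self.commands.append("</" + tag + ">")

    def handle_data(self, data):
        self.text.append(data)


splitter = MarkupSplitter()
splitter.feed("Some <B>boldface stuff</B> and <I>italic</I> text.")
splitter.close()

visible = "".join(splitter.text)
print(visible)             # what the reader sees
print(splitter.commands)   # what the display device acts on
```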

 

HTML

Hyper (multi-linked) Text (documents) Markup (with format annotations) Language. Used to mark up documents so they can be easily shown on a variety of computer devices, and to reference (HREF) local and remote documents and images. Remote documents require a computer address (http://www.somewhere.xxx) so they can be found.
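
A sketch of how a program can find the HREF references in a document and turn local ones into full addresses, using Python's standard library. The class name LinkCollector, the file names, and the base address www.example.edu are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of all <A HREF=...> references in a document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # A local reference is resolved against the address of the page itself.
                self.links.append(urljoin(self.base_url, value))


page = '<A HREF="notes.html">local</A> <A HREF="http://www.somewhere.xxx/doc.html">remote</A>'
collector = LinkCollector("http://www.example.edu/cs99i/index.html")
collector.feed(page)
collector.close()
print(collector.links)
```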

Document Formats

Paper: arbitrarily structured/unstructured; physical order.
Books: somewhat structured/unstructured; layout order; metadata: ToC, index.
Tables: very structured. Exceptions awkward -- footnotes
Databases: very structured. Machine processable, queryable. Exceptions awkward.

relational: tabular based, links by references, join operator; unordered. Example join: student ⋈ course-info
object-oriented: tree-based, structural (and optional reference) links; ordered (often)
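
For the relational case, a join of student with course-info can be sketched in Python as below. The rows, the column names, and the function name join are invented sample material, not real course data.

```python
# Two tables as lists of rows; "course" is the reference that links them.
students = [
    {"student": "Ann", "course": "C1"},
    {"student": "Bob", "course": "C2"},
    {"student": "Cy", "course": "C1"},
]
course_info = [
    {"course": "C1", "room": "R100"},
    {"course": "C2", "room": "R200"},
]


def join(left, right, key):
    """Combine every row of left with the rows of right that share the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


result = join(students, course_info, "course")
for row in result:
    print(row)
```

The result does not depend on any meaningful order of the rows in either table, which is the "unordered" property noted above.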

SGML: for document printing, hierarchically structured; ordered
HTML: for document transmittal, varied presentation, hierarchically structured + links; ordered

Components

Three older inventions combined:

  1. Document markup for typesetting: SGML [IBM -- Air Force about 1975]. Markups are metadata for presentation (HTML intro).
  2. Hypertext linkages to create a hierarchical document [Nelson, about 1960]. Uses hyperlinks: http://computer/directory/file+/entrypoint$ (see Regular expression syntax)
  3. Simplified FTP, with embedded site address (http://cs.stanford.edu/account/...), avoiding having to log in [Berners-Lee @ CERN]; uses Internet-based addressing for remote documents
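
The parts of a hyperlink can be pulled apart with Python's standard urllib.parse module. The address below is hypothetical, and writing the entry point after a # is an assumption of this sketch about how the pattern above maps onto a concrete address.

```python
from urllib.parse import urlparse

# Hypothetical address following the pattern computer / directory / file / entry point.
url = "http://www.somewhere.xxx/directory/file.html#entrypoint"
parts = urlparse(url)

print(parts.scheme)     # http
print(parts.netloc)     # www.somewhere.xxx
print(parts.path)       # /directory/file.html
print(parts.fragment)   # entrypoint
```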

Two Technologies:

  1. Ability to access documents remotely. FTP extension: Hypertext transfer (HTTP) -- an FTP that responds to markup entries.
  2. A browser [Mosaic, by Andreessen and Bina at the Univ. of Illinois HPCC center (NCSA)]. A browser program interprets HTML, uses HTTP, and integrates text, images, and remote references (hyperlinks).

and a business requisite

A community of high-energy physicists who

  1. benefited from rapid access to complex documents [at CERN], and
  2. had the computers on which the (free) browsers could be installed. [Marc Andreessen & Eric Bina at the Univ. of Illinois, Urbana-Champaign center]

Browser competition [Clark-Netscape] [Gates-Microsoft]

Learn by reading and doing

Reading: Bring in a simple HTML web document (like this one), and see what it looks like

If you look at a `commercial' web page you will find many markups that we won't have to care about. Make notes about the ones that puzzle you and discuss them in class. The essential ones are listed in our CS99I HTML notes.
Doing, indirectly: Create a document with, say, Microsoft Word, save it as HTML, and look at it.
Doing, directly: Create a document with HTML markups yourself, as shown in the notes, and then save it as text. It may be easiest to use a dumb editor, such as WordPad or Notepad on PCs, or vi or Emacs on UNIX.

Change (rename) the postfix from .txt to .html, and then look at what you have created.
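
The same steps can also be done from a program. The sketch below, in Python, writes a small page as text and then renames it; the file name first_page and the page contents are invented for illustration, and the file is created in the current directory.

```python
from pathlib import Path

# A very small document with a few of the essential markups.
page = """<HTML>
<HEAD><TITLE>My first page</TITLE></HEAD>
<BODY>
<H1>Hello</H1>
<P>This is <B>bold</B> and this is <I>italic</I>.</P>
<P>A link: <A HREF="http://www.somewhere.xxx">somewhere</A></P>
</BODY>
</HTML>
"""

# Save it as plain text first ...
text_file = Path("first_page.txt")
text_file.write_text(page, encoding="utf-8")

# ... then change the postfix from .txt to .html (overwrites an older copy, if any).
html_file = text_file.replace(text_file.with_suffix(".html"))
print("Open", html_file.resolve(), "in a browser")
```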

Role of HTML in e-commerce?

Advantages and Limits

Reliability
Readability
Processability
Granularity
-- (structure: word, line, paragraph, chapter, book )
-- (object: value, name-value pair, item, person, group, community ) with alternatives (family vs dorm)


See also the references.