CS99I Freshman Seminar

Winter 1997/1998.

Traveling the Information Highways: World-Wide Browsing

Maps, Encounters, and Directions

Master copy on Earth.
Draft 14Dec1993, rev. 10Mar94, updates 25May94, 15Jan98.
This material is ©Gio Wiederhold and CS99I students, Stanford University, 1998.

Chapter: World-Wide Browsing

Cartoon: "The Guy Who Took a Wrong Turn Off the Electronic Superhighway and Wound up in A Microwave Oven in Davenport Iowa": Defrost Wingettes, 4:15. [New Yorker]


When we get to a town we haven't visited before, we might immediately get down to business and search out the factories, stores, libraries, or government offices we need. Alternatively, we can take some walks and get comfortable with the environment. When wandering through the neighborhoods we might do some window shopping, browse in the shopping malls, meet people in restaurants, and enjoy playgrounds and parks. This chapter focuses on this second alternative, while later chapters are more directed.

To get into remote towns we get on the highway with some idea of the destination, and turn onto an off-ramp when we get near to where we want to be. If we live in a large city, we may not need the highway, because many services will be local. We can meet friends, known ones or new ones by looking at bulletin boards (BBs). We can look at local guides and directories, advertisements, and newspapers to learn what is going on.

In the meeting places one can meet potential friends, but also crooks. While one will not be physically hurt when interacting solely on the network, one should be careful when giving out addresses, keys, credit card numbers, and the like. In Chap\F we will present the possibilities of making contacts for electronic commerce, and the need for security. Today's electronic highways are still in the Middle Ages, and one can encounter knights, bandits, Robin Hoods, Good Samaritans, peasants and artisans trudging warily to markets, and many individuals just seeking adventure in other countries.

Off-ramps from the highways also provide access to fantasy worlds, with castles, labyrinths, etc., populated by mythical beasts to slay or befriend. A visitor can assume a playful role in this fantasy world: a meek person can be a fearless hero, and anyone can issue sage advice. Playing games with others in such fantasy worlds is enabled by accessing Multi-User Dungeons (MUDs). Players can deal with mythical beasts or with other players, represented by avatars, whom they have never met as real persons. Players preserve anonymity by giving themselves imaginative names (handles) such as "Moonshadow" [ref Washington Post Legislate #1193184, 28Nov93].

At other off-ramps one finds opportunities to meet people and to shop, and a wide variety of information. The table in this chapter lists some of the resources, but the scene varies so rapidly that it is best explored on-line, although many guides are published [Braun:94, Dern:94, Krol:92]. Methods for effective browsing on the Internet are presented in [Grønbaek:94]. Even with guidance one can easily get lost, since the signage on the `Internet look[s] like the New Jersey Turnpike outside Newark' [Acronym:94].




Using the computer for browsing is a relatively new activity. Until the Internet was well established there were few places one could get to, and even fewer that could be accessed freely. There were libraries, but access to them was often limited to qualified experts. There were some directories, but those were intended for scientists, for instance to locate data gathered by NASA explorations [ref ].

But as travel became affordable, people started hawking their wares along the roads. And since goods are not consumed, only copied, along the highways, many people set up stalls showing and sharing their wares without expecting any reimbursement other than some `thank you's, some recognition, and the hope of being able to change the world a bit. Many government institutions started making their data available, often with similar motivation.



Some institutions have delivered information to remote users for a long time, for instance the National Library of Medicine (NLM) with its Medline service. The papers that are made available are carefully selected and indexed. Such library operations will be presented in Chap\L. The number of such value-added services is increasing, but the services discussed in this chapter focus on broad and free access, with little guarantee that the content is accurate, complete, and unbiased. The reader must judge the value of what has been stored and retrieved. Knowing the source can help; for instance, one would not expect !example of an obviously biased BB.



There are many documents that people want to make available at a much more informal level. By providing anonymous ftp access to colleagues on the Internet, the formal library system can be bypassed, avoiding both delays and scientific scrutiny. Since such ftp-sites are widely dispersed, and may use somewhat different access conventions, a tool to broaden access is helpful. The most popular tool today is `Archie'. An ftp-site can become an Archie-server, which makes an index of ftp-accessible documents and programs available to Archie clients. A searcher for a document now has a much wider choice, and when likely documents are identified, can execute the proper ftp protocol to obtain them.
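The idea of a shared index over many ftp-sites can be illustrated with a small sketch. This is not Archie's actual data format or protocol; the class name `ArchieIndex`, its methods, and the host names and paths are all hypothetical, chosen only to show how one index lets a searcher find which site holds a likely document.

```python
from collections import defaultdict


class ArchieIndex:
    """Toy index of files offered by several anonymous-ftp sites (illustrative only)."""

    def __init__(self):
        # lower-cased file name -> list of (host, full path) pairs
        self._by_name = defaultdict(list)

    def add_listing(self, host, paths):
        """Register the file paths reported by one ftp site."""
        for path in paths:
            name = path.rsplit("/", 1)[-1].lower()
            self._by_name[name].append((host, path))

    def search(self, fragment):
        """Return sorted (host, path) pairs whose file name contains the fragment."""
        fragment = fragment.lower()
        hits = []
        for name, locations in self._by_name.items():
            if fragment in name:
                hits.extend(locations)
        return sorted(hits)


index = ArchieIndex()
# Hypothetical sites and files, for illustration only.
index.add_listing("ftp.site-a.example", ["/pub/papers/browsing.ps", "/pub/tools/gopher.tar"])
index.add_listing("ftp.site-b.example", ["/docs/Browsing-notes.txt"])

for host, path in index.search("browsing"):
    print(host, path)
```

Once a likely entry is found, the searcher would still have to open an ftp session to the listed host to fetch the file; the index only says where to look.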



\VERONICA Veronica [Un. Nevada] is a Gopher index server, updated monthly and replicated.



\PROSPERO Prospero


\WAIS In 1989 Thinking Machines Corporation, and in particular , were investigating broader uses of their Connection Machine (CM), a powerful parallel computer suitable for rapid scanning of large bodies of text. The CM computers had been effective in intelligence agencies, but that market is limited. Thinking Machines made their software, WAIS (Wide Area Information Server), and CMs at their home site freely available, and so enabled many groups with data resources to experiment with providing free data access over networks. Data can be accessed by anyone with a minimal terminal or PC; all the search effort is done in the CM. The search conventions and data formats established for WAIS led to a standard: Z39.50. Today some of those experimenters have installed their own equipment and can make data available themselves, following the same standards. Today a separate company, WAIS, Inc., provides WAIS .

but inadequate means of making them available,

WAIS needs: 1. incremental delivery, 2. measures of relevance (now 1/0), 3. then ranking.
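The three needs in this note can be made concrete with a small sketch. It is not how WAIS works; it assumes a deliberately simple relevance measure (the fraction of a document's words that are query terms), and the function names and sample documents are hypothetical. It contrasts a 1/0 match with a graded score, ranks by that score, and uses a generator so results can be delivered one at a time.

```python
def binary_match(query, document):
    """1/0 relevance: 1 if any query word occurs in the document, else 0."""
    words = set(document.lower().split())
    return 1 if any(term in words for term in query.lower().split()) else 0


def relevance(query, document):
    """Graded relevance (an assumed measure): share of words that are query terms."""
    words = document.lower().split()
    if not words:
        return 0.0
    terms = set(query.lower().split())
    return sum(1 for word in words if word in terms) / len(words)


def ranked(query, documents):
    """Yield (score, title) pairs one at a time, best first, skipping non-matches."""
    scored = [(relevance(query, text), title) for title, text in documents.items()]
    for score, title in sorted(scored, key=lambda pair: (-pair[0], pair[1])):
        if score > 0:
            yield score, title


# Hypothetical documents, for illustration only.
docs = {
    "a": "gopher menus and gopher servers",
    "b": "a long note that mentions gopher once among many other words",
    "c": "weather maps from satellites",
}

results = ranked("gopher", docs)
print(next(results))  # the searcher can stop after the first, best hit
```

With the 1/0 measure, documents "a" and "b" are indistinguishable; the graded score puts "a" first because the query term makes up a larger share of it.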


\WEB World-Wide Web. Tim [Berners.Lee:] at CERN, 1989, for high-energy physicists: hypertext, client-server model. Objective was simply to reduce lead time for physics preprints.

Defined HTML, concepts based on TeX, in use by the physics community.

Thinking machines

Figure \arpanet. The nodes and connections of the ARPA-net in 19.

Mosaic is a browsing tool provided by the Supercomputer Center in Champaign, Illinois, supported through NASA. Browses through the World-Wide Web [Berners.Lee:] at CERN.

Mouse driven. Image access. Sound. 100s of DBs configured with Mosaic.

NASA weather, Clinton's speeches @ Un. of Missouri, music videos @ MTV, Library of Congress catalogue, UC Berkeley paleontology

Novell for its documents. Next on-line magazines, supported by advertising [O'Reilly and Associates, Sebastopol CA]

need direct Internet hookup

Mitch Kapor, ex Lotus, head Electronic Frontier Foundation.

Today an enormous volume of information is available.




Browsing is an informal, unaided search through information sources.

It distinguishes itself from formal querying (Chap.\L\F\Query?) by serving casual visitors wandering along the information highway.



In browsing the searcher has no specific idea of what exists in the information bases, and little idea of where relevant information might be. Initial steps try to identify candidate resources, using the equivalent of the yellow pages to find candidate suppliers. In subsequent steps the material on the shelves of the candidate suppliers is scanned to look for interesting stuff. If there are many shelves, you may try to find the most likely shelves by their labels, or you may consult local inventory lists.

Since the casual browser will not know the precise designation of what is wanted, assistance is needed. A number of methods can be employed by a helpful assistant.

\item{1} A menu can be provided. Since there is too much stuff to fit on one menu page, the menu will be hierarchically organized. Figure\friendly showed the top entry of such a menu. At each level of a hierarchy a choice among $7\pm2$ categories seems optimal for human perception [TMN]. Creating natural icons for all entries can be difficult. When there is no natural hierarchy which can serve as a layer, initial letters or digits can be used, but user-friendliness is soon lost.

\item{2} Multiple menus are often needed. A single hierarchy imposes one organization principle. Even if it can be shown that one taxonomy is best, say arranging automobiles by brand, type, and serial number, some searchers will prefer to search for the same cars by color, size, and age, and yet others by state, town, and license number. Zooming in on a map may be the best way to pinpoint a location. Sometimes one may want to use characteristics from multiple hierarchies. There might have to be a menu of menus.

\item{3} Generalizing from examples. A browser may want to bring an example, perhaps by reference, and look for similar items [CBR]. One may want a car that is similar to one owned earlier, or find suspects that match a sketch or a video clip. To locate a piece of music one may hum a few bars, and to locate a house one may sketch its outline.

\enditem In all cases browsing is characterized by successive refinement and interaction. In Chap.\U\H we reviewed which types of computers are effective in supporting such interaction.
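The `menu of menus' idea from the list above can be sketched as follows. The function `build_menu`, the field names, and the sample car records are assumptions made for illustration; the point is only that the same records can be arranged under several hierarchies, each defined by a different ordering of attributes.

```python
from collections import defaultdict


def build_menu(records, keys):
    """Nest records into a hierarchy of dicts, one menu level per key in `keys`."""
    if not keys:
        return list(records)
    groups = defaultdict(list)
    for record in records:
        groups[record[keys[0]]].append(record)
    return {value: build_menu(group, keys[1:]) for value, group in sorted(groups.items())}


# Hypothetical inventory, for illustration only.
cars = [
    {"brand": "Ford", "type": "sedan", "color": "red", "state": "CA"},
    {"brand": "Ford", "type": "truck", "color": "blue", "state": "NV"},
    {"brand": "Volvo", "type": "sedan", "color": "red", "state": "CA"},
]

# The menu of menus: each entry is a different hierarchy over the same cars.
menus = {
    "by brand": build_menu(cars, ["brand", "type"]),
    "by color": build_menu(cars, ["color", "state"]),
}

print(list(menus["by brand"]))               # top-level choices in one hierarchy
print(list(menus["by color"]["red"]["CA"]))  # the cars reached by refining twice
```

Each step down a menu is one round of the successive refinement described above; a real system would also have to keep the number of choices per level small.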









Problem: WWWeb requires modification of source documents; unacceptable, confusion. [Engelbart]





The systems that are becoming available for browsing use an increasing variety of media. While simple text still dominates, there are drawings, pictures, video clips, film, sounds, and voice. The only sensory outputs missing along the digital highways are smells and bumps. With the variety of information media comes a variety of presentation and input modes. For graphics we need to display or enter lines and shadings for areas. For pictures we need TV-like displays and digital input of photographs, x-rays, etc. For video and film we need sequences of images, presented with precision, so that motion remains smooth. Sounds are represented by digitized waveforms, and spoken words must be played back precisely to be clear to the listener.

Technology is making rapid progress in managing data in all these media. The cost of transmission is high for some of them, as discussed in Chap\U\T\? . However, not everyone along the digital highways has the same capability to receive or enter information in all those media, and there are situations where some media are not appropriate. For example, receiving driving directions in image form while driving is dangerous. In a noisy environment speech may not be audible, and receiving data in voice form will create a distraction in a library or classroom.
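A minimal sketch of such a choice of medium follows. The situations and the media judged unsuitable are taken from the examples in the paragraph above; the function name `choose_media`, the medium labels, and the table layout are assumptions for illustration, not a description of any existing system.

```python
# Media judged unsuitable in each situation, following the examples in the text.
UNSUITABLE = {
    "driving": {"image"},     # looking at pictures while driving is dangerous
    "noisy": {"speech"},      # speech may not be audible
    "library": {"speech"},    # voice output disturbs others
    "classroom": {"speech"},
}


def choose_media(available, situation):
    """Return the media the receiver can handle that also suit the situation."""
    blocked = UNSUITABLE.get(situation, set())
    return [medium for medium in available if medium not in blocked]


print(choose_media(["text", "image", "speech"], "driving"))
print(choose_media(["text", "image", "speech"], "library"))
```

When the preferred medium is blocked, the information would have to be converted, for instance from image to speech for the driver.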

An important category of requirements for media conversion is to provide fair access to disabled persons. Participation in activities along the digital highways can make a crucial difference here. Access to the information highways should empower, rather than hinder, disabled people. Today the U.S. alone is spending \dol200 billion per year on services to disabled and elderly persons. Bringing the digital highways into their homes is the first step. Assuring access for visually, speech, or motion impaired individuals is the next step.

Applies also to noisy environments, or to situations such as within a library, where bleeps disturb the desired silence, or to people working underwater.













\HYPERTEXT A hypertext is an active text, where the reader can touch any term in a document and move to a section in the document where more information on that topic is provided. In practice only certain terms are touchable in hypertext systems, typically indicated by being displayed in bold-face. Invisible to the user are embedded cross-references, which indicate the position in the document of the referenced term.

Initially hypertext linkages may be created by the author. Authors who structure their writing in a top-down fashion, from layout to chapters to sections etc. create a natural linkage hierarchy, which is easily captured by these hyperlinks. Such an author may also be aware when the hierarchy breaks down and references across the hierarchical tree are needed.

Tools to convert existing texts to hypertext are available. They will seek out all terms and cross-link them, omitting words that are common (stopwords, as considered when indexing in Chapter\L\T\INDEXING) or appear in every section. Terms that appear only once cannot be linked, of course. Some human assistance is typically required. It makes no sense to cross-link all citations; only links that provide useful explanatory material need to become hyperlinks. Just as in indexing, a problem for automatic creation of hypertext is that concepts are expressed by multiple terms, and the terms themselves are spelled inconsistently, so that it is easy to miss useful linkages. Having a good thesaurus will generate many more links, but the result will require yet more editing to remove irrelevant linkages.
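The automatic conversion described above can be sketched in a few lines. This is a simplified illustration, not any particular tool: the stopword list is a tiny assumed subset, the function name `candidate_links` and the sample sections are hypothetical, and `appears only once' is approximated as `appears in only one section'.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list; a real tool would use a much longer one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def candidate_links(sections):
    """Map each linkable term to the sorted titles of the sections that use it."""
    where = defaultdict(set)
    for title, text in sections.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                where[word].add(title)
    total = len(sections)
    return {
        term: sorted(titles)
        for term, titles in where.items()
        # Skip terms found in only one section (nothing to link to)
        # and terms found in every section (too common to be useful).
        if 1 < len(titles) < total
    }


# Hypothetical sections, for illustration only.
sections = {
    "Archie": "Archie indexes ftp sites for the network",
    "Gopher": "Gopher menus point to ftp sites on the network",
    "WAIS": "WAIS searches text over the network",
}

print(candidate_links(sections))
```

The sketch matches exact spellings only, so it shows the weakness noted above: synonyms and inconsistent spellings are missed unless a thesaurus is added, and the output is only a list of candidates for a human editor to prune.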

Ongoing interaction with users is one way to maintain hypertext documents. A responsive maintainer is then needed throughout a document's useful lifetime. Such a maintainer should be reimbursed, and that requires charging mechanisms, which have been anathema to the community developing browsing.

Linkages among documents add further value, but should probably be limited to major topics, so that the browsing user is not induced to open an excessive number of marginal documents. The technology for inter-document and inter-node browsing is also more complex, since the references will be much more indirect. A remote document is also subject to editing, requiring updating of cross-references to it. Since up-to-date documents are more valuable than static ones, their maintenance is of great value.

Currently, remote hyperlinks are rarely available. No standards exist for links and link interpretation. If standards were available, remote access would be readily enabled, since most suppliers of hypertext documents are committed to open systems. Having an open system does not imply that a hypertext service needs to be free, so that the ability to handle e-money remains an issue.



HTML MIME standards (no synchronized video)

Once the material has been received from a Mosaic server, display software is needed to present it on your computer. Display software is available for most computers (Windows, Macintosh, Unix computers), but you have to bring that software to your computer.

Work going on to make it extensible, work with SGML suppliers

Shared Mosaic (NCSA Collage - whiteboard)

Script language to create a hypermedia tour.

Secure Mosaic.

Authoring tool for Mosaic

Storyboard (EITsech, available on PCs also, without standards) emailable animation

Success of Mosaic is due to the good viewers on X-Windows, Mac, MS Windows




[H.Maurer, IICM, Graz, Austria] compatible still with Gopher

adds 3-D (using Silicon Graphics 3D icons), also real-world 3D models (digitizing the baroque library building in Vienna)

Anchors in arbitrary datatypes

Computer navigable links

Annotations of different types

collections and guided tours overlaid over the WWWeb net, defined by public or private supplier

avoids physical copies

Attributes to constrain search, with intersection capability

Spin-offs: HyperM presentation system

HM-card personal Hypermedia system

PClibrary electronic library = collection of books with Langenscheidt and Brockhaus, Springer. Initially mainly dictionaries, encyclopedias, Duden; now handbook of Maschinenbau, ENT, ... medical texts

select books for one's desktop, then allows searching for hyperterms, then personal linkages can be made

Journal of Universal Computer Science (JUCS), annually in paper by Springer. [C.Calude, H.Maurer, A.Salomaa] Submission by email, refereeing via Hyper-G.

Publication: Hyper-G at multiple server sites (at many Univ. for local fast access, 50 committees), free 1995-1996, after 1997 $100/year per line per University

net access needed for detail in figures (local default `postage-stamp figures'; for printing acquire better quality PS copies).

CDROM, paper

Quality citable

150 leading scientist editors. { .. boman SRI,Stanford, ... , schllgeter Nievergelt (Zurich)


Protection: trust universities, large companies; individual control has been lost. [Maurer]

Keep things affordable to discourage copying.

cross-check with manual reference: to see if you have the manual handy, ask for a color code



Security: firewalls, Kerberos

for Galaxy directory services see http://galaxy.einet.net

Active learning [H.Maurer] Record voice, image of prof, presentation, etc. digitally with xx Dartmouth, Peter Klauer [was Un. Zürich, now Swiss bank]

Remote access, and green/red light to indicate speed up and slow down. Authoring on the fly.

AEIOU project for Austria's 1000th anniversary, to become publicly available, with history, pictures, culture, film archive, Musikgeschichte with sound, Austriaca, demo films `how one lives, eats, dies' in Austria, 15000 `world images' collected by Maurer.

Österreich lexicon, [ontological unification with Germany]


\active objects, tell users, NII channel with 100 most active Universal Resource Locators (URLs)



[hpc meet] for design UCberkeley








In the hypertext model described above we have assumed that the author or a subsequent maintainer creates hyperlinks as an added value to the users. But some users are likely to require private links.

3D input data glove, data helmet -what are you looking at

Such linkages may be created implicitly, by tracing the user's path while navigating through a document. Creating such a path also has the immediate byproduct of allowing backup, by having an undo option which reverses the travel, although retaining a record of the path taken.
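A minimal sketch of such path tracing follows. The class name `Trail`, its methods, and the section names are hypothetical; the sketch only illustrates the two properties named above: undo reverses the travel, and the complete record of the path is kept.

```python
class Trail:
    """Records a reader's path through a document and supports backing up."""

    def __init__(self, start):
        self.position = start
        self._back = []      # places we can return to, most recent last
        self.log = [start]   # complete record of every move, never erased

    def follow(self, target):
        """Follow a link to `target`, remembering where we came from."""
        self._back.append(self.position)
        self.position = target
        self.log.append(target)

    def undo(self):
        """Reverse the last step; the return trip is itself added to the log."""
        if not self._back:
            return self.position  # already at the start, nothing to reverse
        self.position = self._back.pop()
        self.log.append(self.position)
        return self.position


# Hypothetical section names, for illustration only.
trail = Trail("contents")
trail.follow("hypertext")
trail.follow("Mosaic")
trail.undo()

print(trail.position)
print(trail.log)
```

The retained log is what could later be turned into private links, since it shows which sections this particular reader visited together.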



\W\Bio Brewster Kahle

\W\Z Remote browsing is here today, and has opened the eyes and minds of many people to the benefits that can be gained by traveling the information highways. A secondary reaction is that too much is available along those roads, and the number of hawkers is increasing steadily. There will be a market for guides and advisors to help the traveler. If the traveler is in search of specific items, rather than idly perusing the wares, then there is also a role for brokers or mediators, as introduced in Chapter\M.

As in any new enterprise, the market is quite inconsistent in form and content. As remote access becomes broadly available, the problem of inconsistent terminology will become more troublesome. Those troubles will motivate efforts to become consistent; today the establishment of common ontologies is largely carried out in isolation.


Deal with thousands of servers. [MIT, Gifford: content labels on servers. Mediator agents.]

http://www-psrg.lcs.mit.edu/ for 500 WAIS servers, with query completion, probability based on headlines in content.






... ...

name / type | sponsor / location | topic | access path | charging | [ref]
AlterNex / BBoard | Brazil | Ecology | | |
ARPA / Doc.svce | | HPCC documents | http://ftp.arpa.mil | free |
Aurora? / finger file | S.T.D., Un. ... / Canada | status of the Aurora Borealis | aurora@xi.uleth.ca | |
Chatback / email group | IBM Great Britain / Warwick, 01 223 0017 | contacts for speech-handicapped children | Telecom Gold 01:CLK001, t.holloway@warwick.ac.uk | free |
Comlink / BBoard | Germany | Ecology | | |
ConflictNet / BBoard | Inst. for Global Comm. / San Francisco | | | |
EcoNet / BBoard | Inst. for Global Comm. / San Francisco | Ecology | Sprint | |
EcuNex / BBoard | Ecuador | Ecology | | |
Fedworld / file service | | federal documents | @ | |
GILS / Locator | Office of Management\Budget (OMB) | US Government Information | multiple / proposed | | IITF / echriti@usgs.gov
GlasNet / BBoard | Russia | Ecology | | |
GreenNet / BBoard | Great Britain | Ecology | | |
LaborNet / BBoard | Inst. for Global Comm. / San Francisco | | | |
PeaceNet / BBoard | Inst. for Global Comm. / San Francisco | | | |
Pegasus / BBoard | Australia | Ecology | | |
Web / BBoard | Canada | Ecology | | |

Wellington NZ museum digitizes 1.5M objects, 100K in 3D.

< (inside body?) spece and body seems real, the familiar end

Free Internet communication makes satellite communication unacceptable but needed to motivate introduction of new technologies.

