CS99I Freshman Seminar

Winter 1997/1998.

Traveling the Information Highways: Mediators

Maps, Encounters, and Directions

Master copy on Birch.
Draft 11 Jan 1994, updated 30 Jan 1994, 10 Feb 1997.
This material is ©Gio Wiederhold and CS99I students, Stanford University, 1998.


Chapter: Mediators

Between the minds that plan and the hands that build there must be a mediator [Brigitte Helm in Metropolis, a silent movie by Fritz Lang, 1926]


Several centuries ago, few people traveled. Most worked on a farm or pursued a trade at home, and occasionally walked to the local market. Soldiers marched long distances, mariners sailed, knights rode horses, and only some adventurous merchants and the crusaders used multiple means of transport, such as camels, horses, and ships. Today, traveling long distances routinely requires a variety of conveyances, each suited to its particular domain: buses, trains, planes, ships, barges, bicycles, cars, four-wheelers, pickups, trucks, and the like. Vehicles that have to operate in multiple domains, amphibious carriers for example, tend to be less efficient in each than domain-specific vehicles. Even similar vehicles become attached to specific domains: once a tanker truck has carried gasoline, it is no longer a desirable means for transporting milk. Specialization may also be needed to deal with quantitative differences. An old car may well be adequate to go shopping, but using it may be too risky for delivering supplies to customers. Driving a personal car cross-country is possible, but is rarely feasible in business; for long distances air travel dominates. To switch among transport modes, interchange hubs are established, where people can wait and obtain tickets, and where goods are repackaged to suit both the means of transport and the consumer.

On the information highways, we are encountering diversity of roads and vehicles as well. The diversity becomes a greater concern as we reach out beyond our homes and workplaces. While once computers operated in stand-alone mode, or used simple connections to each other, the information highways we are contemplating will have many thousands of participating computers, and even more interchange points. While the notion of having similar computers, and similar data and information structures everywhere would make life simple, such a coherence is clearly not feasible, just as we could not have a single mode of transport nor a single package for all the goods to be shipped. Progress also requires change, and incremental changes also create inconsistencies. While Henry Ford might have been content if we all stayed with the Model T, over time most Model T's were replaced by faster, bigger, and more colorful vehicles.

The software equivalents of interchange hubs are called mediators. Mediation is an integrating concept, combining a number of current technologies to find and transform data, and making them available in hubs along the information highways. Mediation recognizes the autonomy and diversity of the data systems, information services, and user applications. Their autonomy enables the overall system to grow, since new sources, new means of transport, and novel information processes can be inserted. Incremental growth only requires that a few mediating hubs be adapted to link the new facilities into the traffic network. As the new facilities become more popular, further mediators will adapt to take advantage of them and the business they represent. Those users that need the new sources will use the adapted mediators; users that don't care remain unaffected.

Just as hubs for transport have become specialized to deal with people, letters, goods, foodstuffs such as fruit, fish, and beef, and the like, mediators will also be specialized. Specialization makes maintenance feasible: an expert can focus on his or her own domain, without having to consider the different constraints imposed by handling fish versus bicycles.

At a hub one often finds hotels for people and warehouses for goods. These are necessary because the variety of transport mechanisms cannot be perfectly synchronized. The equivalent function in digital systems is intermediate storage or caching of data. The technology for storage of data was discussed earlier, but the function differs here. In a database a long-term, coherent collection of information must be stored and maintained. In a mediator only copies of information to be forwarded need to be stored. Such data must be identified well enough, with name, type, source, and date-of-validity, that it can be merged with related data. If data has to be kept a long time before a match can occur, it may be wise to keep just a reference in the mediator, and acquire the data from the source database when required. Sometimes much processing is needed to merge data, so that specialized processes may be associated with a mediator, just as transportation hubs rapidly become surrounded by specialized factories.
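The identification needed for cached data can be illustrated with a small record type. This is only a sketch in Python, not part of any actual mediator: the names `CachedItem`, `kind`, `fetch`, and `can_merge` are hypothetical, and the rule that items merge when name, type, and date-of-validity agree is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class CachedItem:
    """A forwarded copy held in a mediator, identified well enough to be merged later."""
    name: str                 # what the value describes, e.g. "price"
    kind: str                 # type or unit of the value, e.g. "USD"
    source: str               # where the copy came from
    valid_on: date            # date-of-validity of the value
    value: Optional[float] = None                # the cached copy, or None if only a reference is kept
    fetch: Optional[Callable[[], float]] = None  # hypothetical accessor to the source database

    def get(self) -> float:
        """Return the cached copy, or acquire it from the source when only a reference is kept."""
        if self.value is not None:
            return self.value
        if self.fetch is None:
            raise LookupError(f"no copy and no reference for {self.name}")
        return self.fetch()


def can_merge(a: CachedItem, b: CachedItem) -> bool:
    """Assumed rule: items merge when name, type, and date-of-validity agree."""
    return a.name == b.name and a.kind == b.kind and a.valid_on == b.valid_on


store_price = CachedItem("price", "USD", "store", date(1998, 1, 15), value=49.0)
# Long-lived item: keep only a reference and ask the source when needed.
factory_price = CachedItem("price", "USD", "factory", date(1998, 1, 15), fetch=lambda: 30.0)
stale_price = CachedItem("price", "USD", "factory", date(1997, 6, 1), value=28.0)
```

The `fetch` field stands in for the reference that is kept instead of a copy when a match may be a long time coming.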

Mediation is achieved by software. A mediator transforms data available on the network, to make it more suitable and relevant to the consumer. This software function can be carried out on dedicated computers on the network, or can be assigned to computers performing other tasks as well. Since software is easy to copy over the same highways used to transport the goods, many copies can be rapidly installed. The ease with which software-based factories can move implies a much more flexible configuration of the network than seen in traditional manufacturing. A traditional supplier of added value can be rapidly replaced.

Mediation adds value to the data. Mediators are created by experts and need to be maintained, so that they can remain effective in a constantly changing world. The user of a mediator should pay for such services, and the creator and maintainer should receive the payments. Growth of the information highways is enabled by mediation among autonomous modules, and growth of mediation requires the availability of payment mechanisms.


The concept of mediation encompasses a large number of techniques that dealt with segments of the problems encountered when developing or composing large information systems. We will describe their individual histories here. Many of these techniques remain valid independently, and readers may have encountered them in specific contexts.


Designers of relational databases

Figure: Clinic Example: Two domain-specific mediators supporting one customer.



A mediating module carries out tasks to serve a user, and a desirable aspect of such a computer-based servant is that it understands what is needed, can be directed by the user, and, as a prerequisite for direction, can explain to the user its current understanding of the task to be performed. These tasks sound like performing ever so much magic, and the technology which supports such tasks, Artificial Intelligence (AI), is often treated as magic, or at least imbued with human-like qualities. However, when well organized and suitably limited, these tasks become quite manageable. For mediation we focus on a subset of AI, referred to as Knowledge-base Management Systems (KBMS).



Mediators perform functions to overcome the problems of disintermediation, the loss of critical analysis and summarization of available resources. Mediators perform a service, namely transforming data into information. In electronic commerce, having information makes the market more efficient: bad companies will be identified and shunned, and good companies will get more business. Trying to get information from the web is like drinking from a firehose. Information should be relevant, correct, and dense.
Figure: Overload: An overload of data leads to information starvation.

Hence mediators must first of all reduce the volume being presented, by selecting relevant data from a variety of sources and summarizing it to bring it to the level needed for the users' application. When multiple data sources contribute data, the data must be combined or fused. Reliable fusion requires that data match in terms of level and scope. To reduce the cost of repetitive access a mediator may store its internal results for some time in a cache. The sources may be updated at differing times, and having a cache also provides a means to synchronize data, so that they will be temporally consistent. However, dealing with consistent but obsolete data will not satisfy users involved in planning tasks, which often require simulation. Simulation can extrapolate information so it becomes current, or extends into the future, generating new, but less definite information. These five functions will now be described in detail.

Figure: Mediator Tasks: generating information.


Triage of information [Rick Hayes-Roth]
Quality assessment of sources.


Even after selection there will be an excess of detail in some of the data. An important task of a mediator is to summarize the detail to a level that is meaningful for the user, or perhaps for further processing. For example, when trying to economize on the production of, say, an automobile, it is necessary to determine what the costs of the various subassemblies are. For each subassembly, labor and parts costs have to be aggregated. The results of summarization can then be integrated.
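The automobile example can be sketched as a simple aggregation. The records and amounts below are invented for illustration, and `summarize` is a hypothetical helper, not a function of any particular system.

```python
from collections import defaultdict

# Hypothetical detail records: (subassembly, cost category, amount)
detail = [
    ("engine", "labor", 400.0),
    ("engine", "parts", 1200.0),
    ("chassis", "labor", 250.0),
    ("chassis", "parts", 800.0),
    ("engine", "parts", 300.0),
]


def summarize(records):
    """Aggregate labor and parts costs into one total per subassembly."""
    totals = defaultdict(float)
    for subassembly, _category, amount in records:
        totals[subassembly] += amount
    return dict(totals)


summary = summarize(detail)
for subassembly, total in summary.items():
    print(subassembly, total)
```

Five detail records are reduced to two totals, which is the level at which a cost comparison among subassemblies becomes meaningful.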



Since the selected information may be in multiple sources, there is now a need to integrate the results obtained by selection and summarization.


Many sources only provide current information. Distinct sources will differ in what they consider current. To reliably integrate information that differs in currency one must often cache past information, so that it can be correctly matched.
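One way to picture matching on currency is to keep a short history per source in the cache and read every source as of a common date. The sketch below is an assumption-laden illustration: the cache layout, the sample values, and the choice of the common date (the earliest of the sources' latest updates) are all hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical cached history per source: list of (valid_from, value), sorted by date
cache = {
    "store": [(date(1998, 1, 1), 100), (date(1998, 2, 1), 120), (date(1998, 3, 1), 90)],
    "factory": [(date(1998, 1, 15), 500), (date(1998, 2, 15), 450)],
}


def value_as_of(history, when):
    """Return the most recent value valid on or before `when`, or None if there is none."""
    dates = [valid_from for valid_from, _ in history]
    i = bisect_right(dates, when)
    return history[i - 1][1] if i > 0 else None


def consistent_snapshot(cached):
    """Read all sources as of the latest date for which every source has reported."""
    common = min(history[-1][0] for history in cached.values())
    return common, {src: value_as_of(history, common) for src, history in cached.items()}


common_date, snapshot = consistent_snapshot(cache)
```

The snapshot is consistent but may be older than the newest data in any one source, which is why extrapolation by simulation may still be needed.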
!May need simulation !


To make the information easy to process by the customer application it is best if it is transformed into object-oriented structures.


Mediation services must be maintained, and maintenance can only be sustained if experts are available. There hence has to be a mechanism for supporting the mediators and the staff. In some cases there may be a professional organization, let's say for our example the Society of Shoe-purchasing Agents. In general, there has to be support for such value-added services. Mechanisms were presented as part of electronic commerce.
Figure: Service: information and reimbursement.


Designing a mediator has two phases:
  1. Finding sources, understanding them, and providing access paths to them
  2. Bringing the sources into a common framework, so that the result is manageable by a customer or an application program

Figure: Design phases: selection and description, followed by integration into a domain-model.




!check duplication of function!
In order to process data and reduce it to information, there must be an understanding of the needs of the users, the resources provided by the underlying information systems, and the means to put them together. Such an understanding must be formalized into a model. Such a model consists of a hierarchical listing of concepts, and relationships among them, shown as a triangle for !some application. The relationships can be seen as linkages among the nodes, while the nodes are typically represented by information that can be obtained from the web.

A mediator node will manage access and processing of a number of concepts within one domain. !Extract a hierarchy!.

Figure: Extract a tree: Finding and structuring data from the web.

!incorporate in Ontologies?!


To permit information from distinct sources to be shared, there must be agreement on the terminology in the shared area. It is not sufficient to match information on the basis of words, because words differ in meaning in different contexts. These problems were already encountered in earlier chapters. In this section we try to !lay out! the approach to !solve! the issue.

For each domain we define an ontology: a vocabulary, and a classification scheme which links the terms used, as shown in Fig.\vocabulary !somewhere!. When a term is used outside of its context, it is labeled: Carpentry.miter versus Religion.miter. Within a domain the terms should be wholly unambiguous, both in definition and scope. Considering the differences between employees in Personnel and on the Payroll from the example in Sect.!\?\?\?!, we need to distinguish those two domains. We show a simple example of two domains, and note that outside of the domain we should label the terms Shoesales.model, Shoesales.supplier, etc., and Shoemaker.company, Shoemaker.model, etc.

Figure: Domains: terminologies in a shoe store, a shoe factory, and in purchasing shoes.

Keeping all terms disjoint disables the domain interoperation we seek. We have to develop a set of operations that permits us to match and merge ontologies: an algebra over ontologies. Only a few operations are needed, namely Intersection (∩), Union (∪), and Difference (-). We also need operations to help us within the domains, so that local domain terms can be mapped to meanings that are globally acceptable; we call this operation Map (M).

Intersection (∩) creates a list of those terms that match in meaning. For instance, given the two domains, we can define the intersection to match Shoesales.model with Shoemaker.model, and Shoesales.supplier with Shoemaker.company. Note that these are knowledge-based operations: they require that somewhere the knowledge exists to permit the computation of the intersection of the source ontologies. This need for knowledge establishes the reason for having mediators as distinct modules, since this knowledge has to be captured and maintained somewhere. It cannot be part of Shoesales, nor of Shoemaker alone. Within the mediator we create a new ontology for this intersection; let's call it Shoes. Then Shoes.model = ∩(Shoesales.model, Shoemaker.model) and, creating a new term in the Shoes ontology, Shoes.maker = ∩(Shoesales.supplier, Shoemaker.company). Terms such as color and tint, and orders and backlog, can also be matched. Not all terms will be defined in the intersection. Local terms, such as Shoemaker.man-hours, will not be matched and will not appear in the intersection shown.
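The knowledge-based character of Intersection can be made concrete with a small sketch. The encoding below is hypothetical: the matching knowledge is held in a table of rules (`RULES`) that belongs to the mediator, not to either source, and `intersect` is an illustrative helper rather than a defined operator of any existing system.

```python
# Matching knowledge held by the mediator (hypothetical encoding):
# new term -> (term in the left ontology, term in the right ontology)
RULES = {
    "model": ("Shoesales.model", "Shoemaker.model"),
    "maker": ("Shoesales.supplier", "Shoemaker.company"),
}

shoesales = {"Shoesales.model", "Shoesales.supplier", "Shoesales.price"}
shoemaker = {"Shoemaker.model", "Shoemaker.company", "Shoemaker.man-hours"}


def intersect(left, right, rules, name):
    """Build a new ontology containing only terms the rules declare to match in meaning."""
    result = {}
    for new_term, (left_term, right_term) in rules.items():
        if left_term in left and right_term in right:
            result[f"{name}.{new_term}"] = (left_term, right_term)
    return result


shoes = intersect(shoesales, shoemaker, RULES, "Shoes")
for term, matched in sorted(shoes.items()):
    print(term, "=", matched)
```

Without the rules the two term sets share no strings at all, so a purely textual set intersection would be empty; the match exists only because someone recorded it.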

Union (∪) creates a complete list. For those terms that have a defined intersection the result is given; the remainder are copied as they are. This operation does not create a consistent new ontology, since now similar prime terms continue to have their old domain association. For example:
AllShoes.model = ∪(Shoesales.model, Shoemaker.model)
AllShoes.Shoesales.price = ∪(Shoesales.price)
AllShoes.laborhours = ∪(Shoemaker.laborhours)
In practice the ontological algebra operates on sets, with results being new sets.
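A sketch of Union over sets of terms follows. It is a simplification under stated assumptions: the `matches` table is hypothetical, and every unmatched term is copied with its old domain prefix, which is only one possible reading of the naming in the example above.

```python
shoesales = {"Shoesales.model", "Shoesales.price"}
shoemaker = {"Shoemaker.model", "Shoemaker.laborhours"}

# Hypothetical record of which terms have a defined intersection
matches = {"model": ("Shoesales.model", "Shoemaker.model")}


def union(left, right, matched_terms, name):
    """Merged terms get one new name; all other terms are copied with their domain prefix."""
    result = {}
    covered = set()
    for new_term, (left_term, right_term) in matched_terms.items():
        result[f"{name}.{new_term}"] = {left_term, right_term}
        covered.update((left_term, right_term))
    for term in sorted((left | right) - covered):
        result[f"{name}.{term}"] = {term}  # old domain association is retained
    return result


all_shoes = union(shoesales, shoemaker, matches, "AllShoes")
for term in sorted(all_shoes):
    print(term, "<-", sorted(all_shoes[term]))
```

The copied terms still carry their source domain in the name, which shows why the result is complete but not a consistent new ontology.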

Difference (-) permits removal of terms. Its main use is to let us determine which terms in a set are unmatched, i.e., local, as in OnlyShoesales.
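With terms held as sets, Difference reduces to ordinary set subtraction. In this minimal sketch the set of matched terms is written out by hand as an assumption; in a mediator it would come from the recorded intersection.

```python
shoesales = {"Shoesales.model", "Shoesales.supplier", "Shoesales.price"}

# Terms that take part in some match (assumed here, normally taken from the intersection)
matched = {
    "Shoesales.model", "Shoesales.supplier",
    "Shoemaker.model", "Shoemaker.company",
}


def difference(terms, matched_terms):
    """Return the local terms: those in `terms` that have no match."""
    return terms - matched_terms


only_shoesales = difference(shoesales, matched)
print(sorted(only_shoesales))
```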

Map (M) permits computation within an ontology, so that more terms become candidates for matching and for the creation of useful intersections. For instance, there is a relationship between the Shoesales.price and the Shoesales.profit which requires the Shoemaker.price. We can define Shoesales.cost = M(Shoesales.price, Shoesales.profit). We also need mappings to determine the size and width of the shoes from the Shoemaker.last, as Shoemaker.size = M(Shoemaker.last); Shoemaker.width = M(Shoemaker.last). These mappings are part of the knowledge incorporated in the mediator.
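The two mappings can be sketched as ordinary functions. Both formulas are assumptions made for illustration: the text does not say how cost follows from price and profit, nor how a last is coded, so the sketch assumes profit is an absolute amount per pair and that a last is written as a size followed by width letters, such as "9D".

```python
def map_cost(price, profit):
    """Shoesales.cost = M(Shoesales.price, Shoesales.profit).

    Assumption: profit is an absolute amount, so cost = price - profit.
    """
    return price - profit


def map_last(last):
    """Shoemaker.size and Shoemaker.width = M(Shoemaker.last).

    Assumption: a last is coded as a numeric size followed by width letters A-E.
    """
    digits = last.rstrip("ABCDE")
    return float(digits), last[len(digits):]


cost = map_cost(60.0, 25.0)
size, width = map_last("9D")
```

Once Shoesales.cost is computed it becomes a candidate for matching with Shoemaker.price, which is the purpose of mapping within a domain.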

Figure: Domains: Operations in an ontology algebra.

The crucial ontology in a mediator is created through intersection of source ontologies. Over the intersection we can compute new aggregations and enhance the information content of the data produced through mediation. Having an algebra permits composition over ontologies. We can expect that useful new domains can be created by taking the union of two or more intersections.

By having an algebra on domains that permits creating explicit linkages of joint terminology we also suppress the temptation of creating larger, but less precise ontologies. A base ontology should not be so large as to preclude agreement among the people using it within a modest time. If it takes too long, some terms will already have changed in meaning. Support of change is actually a major motivation for formal management of ontologies. Unless the current state and extent of mismatch can be well defined, we will not be able to note the improvements being made as people, by interacting via these information structures, achieve more consensus.

The description in this section ignores a major aspect of operating on ontologies. Early work in that direction is represented by algebras over objects [Barsalou::xx]. A major effort to define an ontology is

Alternatives

The functions carried out in mediation are not new; they have been needed for a long time, but were not recognized as being distinct. Alternatives included designing large, integrated systems, often with the help of consultants, since any single organization rarely had the breadth to deal with multiple domains. When the sources and the applications remain distinct, but connect directly, we speak of client-server systems. When the roles are less well-defined we have open systems.


Mediator modules encode the knowledge that is commonly provided by a consultant, but there are two crucial differences. The representation of the knowledge in executable form is not provided by the consultant. Often the application programmer integrates what was learned into the application programs. When these programs actually enter use the consultant may be long gone, or collects another fee for fixing any misunderstandings. In a mediator the provider of the knowledge also provides its representation. That is in concert with
Figure: Consultants: taking an active role.

The second, related difference is that the consultant remains responsible for the mediator. If the former consultant, now the mediator expert, has not formulated the mediating program right, then it is the expert's responsibility to fix it. And, since mediation must deal with changing environments, any further changes are also the expert's responsibility. The payment mechanism is also different. In mediation, we expect that small fees will be extracted with every successful use. That changes the motivation for the consultant. A quality product will generate ongoing income. In today's mode the consultant gets a new fee when things go wrong.





Gio Wiederhold: It is immodest to put one's own biography into a book. Normally the author provides to the publisher a biography for the dustcover, so that its role is to keep the book clean. But the topics covered in this book will undergo rapid change, so that keeping it clean should not be a worry.

Gio Wiederhold was born in Italy, during the initial actions of World War II, and spent the remainder of the war in European countries, while his family was trying to stay out of trouble and danger, and managed to avoid going to school during that time. After the situation settled down, he was shipped to Holland and attended the Grotius Lyceum in The Hague, the Technicum in Rotterdam and the Technical University in Delft, studying aeronautical engineering. !Lab.Elec.Music! During summers he worked selling ice-cream, repairing refrigeration machinery, and shipping out on a Dutch merchant vessel. A summer job in 1957, at the NATO Air Defense Technical Center in Wassenaar, Holland, led to him being introduced to computing, first on calculators and then on very primitive computers.

In 1958 he emigrated to the United States, first working for IBM and later as Chief Programmer for the University of California. After teaching one year at the Indian Institute of Technology, he became Director of the Advanced Computer for Medical Experiments (ACME) at Stanford and a lecturer in its recently established Computer Science Department. For the Medical School he developed a time-shared real-time data-acquisition system (ACME). It included a large on-line filing system, and with it he established the Time-Oriented Database system (TOD) for the masses of data being collected. TOD eventually provided nation-wide services for immunology patients.

In 1969 he set up Index corporation, developing information retrieval systems for real estate and performing artists.

In 1974 he enrolled in the new PhD program in Medical Information Science at the University of California in San Francisco. His thesis, completed in 1976, was on [structured Design of Medical Databases]. During his studies he wrote an early textbook "Database Design" [Wiederhold:77,83] which provided a quantitative approach to the topic.

After joining Stanford's Computer Science faculty in 1976, he proposed research combining AI and databases to ARPA, coining the term Knowledge-bases for this combination. Research in that direction led to a number of topics presented in this work, especially the concept of mediation discussed in this chapter. He has remained at Stanford except for a sabbatical with IBM Germany in 1987, consulting on knowledge-base support for LInguistic LOGic systems (LILOG), and an assignment to ARPA from 1991 to 1994. At ARPA he was Program Manager for Knowledge-based Systems, and had the opportunity to interact with colleagues involved in the HPCC program and the NII.

Gio Wiederhold has been recognized by a number of communities. He is a fellow of the ACMI, the IEEE, and the ACM, and has received the ACM SIGMOD Contributions award. He was a member of the board of the NSF-established National Center for Geographic Information and Analysis (NCGIA), and much of the material presented in Chapter \P was obtained in their meetings. He has consulted for local companies, for multinational enterprises, for government, and for international organizations, such as the United Nations on information systems for India and China.


Income for mediating services !...!.per-use fee, lease, purchase. maintenance and update costs.

A mediator often creates a derived work from material protected by copyright. This means that the supplier of source data, if such source data was copyrighted, must be reimbursed for every use. Such a reimbursement must come from the owner of the mediator. If use of the mediator is charged, then a portion of the charges can be allocated to the provider of the source material; otherwise such charges must come from the owner's budget.
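The two cases, charged and uncharged use, can be worked through with a small calculation. All amounts and names are invented for illustration; the sketch assumes each source provider is owed a fixed royalty per use, expressed in whole cents.

```python
def settle_use(fee_cents, royalties_cents):
    """Settle one use of a mediator.

    fee_cents: what the user pays for this use (0 if the mediator is free to use).
    royalties_cents: hypothetical fixed royalty owed to each source provider per use.
    Returns the payments to the sources and the net result for the mediator owner.
    """
    payments = dict(royalties_cents)
    owner_net = fee_cents - sum(payments.values())
    return payments, owner_net


royalties = {"catalog": 25, "price list": 10}

# Charged use: a portion of the fee is allocated to the source providers.
paid, owner_net_charged = settle_use(100, royalties)

# Uncharged use: the same royalties come out of the owner's budget.
_, owner_net_free = settle_use(0, royalties)
```

A negative net result in the uncharged case is the amount the owner has to cover from other funds.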


The concept of mediation combines elements from at least three disciplines: Databases, System analysis, and Artificial Intelligence. There is not yet a specific discipline which deals with the delivery of information at a high and integrated level. Concepts such as mediation are intended to form a basis for a new discipline: Integration Science.
Figure: Integration Science: sources of the discipline.


Companies supplying Mediators

This information is based on a survey for Data Base Processing and Design, 1998.
name | location | product | services | [ref]
Epistemics | Palo Alto CA | Infomaster | scheduling, resource management | [www.epistemics.com]
Global Infotek | Vienna VA | systems engineering | integration | [www.globalinfotek.com]
iBrain Inc | Palo Alto CA | decision support software | financial services | [www.ibrain.com]
I-Kinetics inc | Cambridge MA | Databroker | infrastructure software | [www.i-kinetics.com]
ISI | Marina del Rey CA | SIMS | mediator research and development | [www.isi.com]
ISX | Westlake Village CA | design and implementation | intelligence systems, planning and logistics | [www.isx.com]
Junglee | Sunnyvale CA | Junglee | Internet job placement, shopping | [www.junglee.com]
K2 Informatics | Bryn Mawr, PA | Kleisli | genomic information | [71072.234@compuserve.com]
Lockheed-Martin Idaho Technologies | Idaho Falls | ERIS | environmental, chemical data | [http://www.ineel.gov]
Lockheed Martin Management&Data Systems | King of Prussia, PA | system design | Government, education | [www.lmco.com]
MCC consortium | Austin TX | Infosleuth | Technology development for consortium members | [www.mcc.com]
GeneLogic Bioinformatics | Berkeley CA | OPM | Object-based integration of genomic data | [www.genelogic.com]
Netbot | | | Internet shopping | [www.excite.com?]
Persistence Corporation | San Mateo CA | Persistence | object creation | [www.persistence.com]
Socratix | Palo Alto CA | | genomics, drug development | [ ]
SST | Woodside CA | Passgate | privacy protection in collaboration | [www.2ST.com]
Tessarae | San Jose CA | design and implementation | integrating heterogeneous information systems | [tesserae.com]




!Godown == gudang in malay!