Scalable Knowledge Composition (SKC)

This page has been superseded by SKC Project Web Pages

Gio Wiederhold

Computer Science Department, Stanford University

The Scalable Knowledge Composition (SKC) project is being initiated to develop a specific approach to resolving semantic heterogeneity in information systems. The SKC approach requires developing an algebra over ontologies that represent the terminologies of distinct, typically autonomous domains. Intersection will be the most crucial operation, since it identifies the articulation (Guha and Lenat's term), namely the terms where linkage occurs among the domains. The intersection, and all other SKC algebra operations, will themselves be knowledge driven, using articulation rules. Source ontologies can then be largely maintained autonomously, while the articulation rules are maintained by the groups benefitting from sharing and interoperation among the domains. The SKC project will develop methods that use the resulting articulation to let ontologies interoperate. The problem of managing large knowledge bases is thereby reduced to one of composition: no global agreement is needed among maintainers of disjoint ontologies. We believe that this distributed approach to knowledge maintenance is the best (only?) approach to make semantic interoperation scalable. We are convinced that methods to enforce consistency, even if supported by edicts from `higher authorities' forcing distinct, autonomous groups to use language coherently, will not work. (The French Academy can continue to tell us otherwise.) The project is hence conceptually quite innovative. SKC also requires building some solid demonstrations to show the world the feasibility of this approach. There are many open research questions that we expect to uncover in this process.

This research effort is to be funded by AFOSR, with the cooperation of the DARPA High-Performance Knowledge Base (HPKB) program.

We have received some complementary funding from the Hughes Research Institute, Malibu, CA.

Earlier exploratory work was supported by the CommerceNet Consortium.

An introductory SKC presentation was made at the HPKB West Coast Introductory Meeting, held at Stanford University, March 26-27, 1997.

More SKC Information

References:

  1. Gio Wiederhold and Michael Genesereth: "The Conceptual Basis for Mediation Services"; to appear in IEEE Expert, 1997; presented earlier as "Basis for Mediation" in Proc. COOPIS'95 Conference, Vienna, Austria, May 1995.
  2. David Maluf and Gio Wiederhold: "Abstraction of Representation for Interoperation"; submitted for publication, March 1997.
  3. Wiederhold, Gio: "Objects and Domains for Managing Medical Knowledge"; Methods of Information in Medicine, Schattauer Verlag, Vol.34, No.1, pages 1-7, March 1995; presented earlier at the IMIA WG6 meeting, 1994.
  4. Wiederhold, Gio: "Interoperation, Mediation, and Ontologies"; Proceedings International Symposium on Fifth Generation Computer Systems (FGCS94), Workshop on Heterogeneous Cooperative Knowledge-Bases, Vol.W3, pages 33-48, ICOT, Tokyo, Japan, Dec. 1994; to be published in a Springer Verlag volume.
  5. Wiederhold, Gio: "An Algebra for Ontology Composition"; Proceedings of the 1994 Monterey Workshop on Formal Methods, Sept. 1994, U.S. Naval Postgraduate School, Monterey CA, pages 56-61.
  6. Wiederhold, Gio: "The Roles of Artificial Intelligence in Information Systems"; Journal of Intelligent Information Systems, Vol.1, No.1, 1992, pages 35-56.

ABSTRACT

We propose research and tool development to aid in `Building of Foundation Knowledge'. Our Scalable Knowledge Composition (SKC) proposal addresses specifically the management of domain- and application-specific composition and specialization of knowledge. The SKC approach, based on an Ontology Algebra, will also deal with semantic inconsistencies, since even when knowledge ontologies are intended to be reusable, they cannot be expected to be globally consistent. Furthermore, the operations provided by the algebra address the crucial problem of scalability in large-scale knowledge bases.

The ability of SKC to compose knowledge from independent sources also supports the acquisition objective of DARPA's HPKB program. It is easy to motivate small groups of experts to develop ontologies in their specific domains of expertise. It is costly, especially in time, to convene large groups to establish and update broadly based ontologies. Furthermore, larger ontologies often require compromises, reducing their precision. Composition empowers small groups to contribute to large tasks.

Increasingly powerful computers and better processing algorithms will help to establish and maintain large knowledge bases, but equally crucial is improving the management of knowledge and its components. Composability of independently developed chunks of knowledge provides the basis for such management. The SKC project will provide the operations to support composition, as well as intersection and selection of chunked knowledge. An intersection operation permits focusing on critical linkages. Logical partitioning of knowledge into chunks reduces computational complexity by exponential factors, while enabling computations to be distributed in parallel over many processors.

The Ontology Algebra will itself be knowledge-driven, a necessary feature to deal with the complexities and inconsistencies that arise when distinct knowledge resources are merged. By formulating the needed operations as an algebra, SKC provides a sound basis for extensive and incremental knowledge manipulation. The knowledge that drives the Ontology Algebra is limited to rules that enable articulation, the linking of disjoint knowledge resources, and interoperation, the processing of information based on the articulated knowledge. We believe strongly that a disciplined manipulation of knowledge resources will be essential to achieve the needed

  1. Correctness,
  2. Depth,
  3. Maintainability,
  4. Effective use and reuse, and
  5. Scalability of the knowledge resources.
By being able to assign development and maintenance responsibility for manageable chunks to responsible experts, the first three elements can be achieved. Tools based on the Ontology Algebra will support composition and articulation among such distinctly maintained knowledge bases to achieve the remaining two elements: effective use and reuse, and scalability. SKC uses articulation knowledge to face the realistic issues that arise when joining knowledge bases, such as differences in representation, structure, and semantics.

The Ontology Algebra supports the objective of the HPKB program: making much relevant knowledge available to applications, without incurring the problems of building, managing, and maintaining huge, integrated knowledge bases.

Scalable Knowledge Composition

INNOVATIVE CLAIMS

The statement that `Knowledge is power' is commonly accepted, as is its corollary, that more encoded knowledge should make computing systems more powerful and useful [FeigenbaumMN:88]. However, it has been difficult to establish and maintain truly large knowledge bases. More powerful computers and better processing algorithms will help in that direction, but we believe strongly that a complementary approach, supporting disciplined manipulation of knowledge resources, will be essential to achieve the following objectives, needed for effective use by real-world customers and their applications:

  1. Correctness, i.e., consistency within a domain, needed to engender trust by the owners of the applications

  2. Depth, i.e., linkage to atomic instances for realism in modeling and processing, so that information for decisions is grounded on a factual basis

  3. Maintainability, i.e., clear identification and enabling of responsible maintainers of the domain knowledge

  4. Effective use and reuse, i.e., the ability to create subsets, intersections, and compositions that do not overwhelm the application

  5. Scalability of the knowledge resources, i.e., the ability by applications to compose chunks and their intersections without limit.

The ontology algebra to be demonstrated in SKC allows knowledge bases to be developed within manageable, limited-size domains. Within a modest domain the high-cost operations to validate local consistency, adequacy of depth, and timeliness can be bounded. By being able to assign development and maintenance responsibility for manageable chunks to responsible domain experts, the first three objectives listed will be achieved. The remaining two objectives, effective use and reuse, and scalability, are enabled by having tools, based on the Ontology Algebra, to build the articulation knowledge needed for the joint use of the basic chunks. These subsets can then be further composed, as shown in Figure 1: Knowledge Composition.

Figure 1: Knowledge Composition.

The articulation knowledge drives the ontology algebra and is distinctly maintained in chunks relevant to interacting domains to deal with the complexities and inconsistencies that arise when distinct knowledge resources are merged [Wiederhold:91]. We observe that, in the past, the database field was able to make progress once an algebra over data had been defined [Codd:1970]. We also observe that partitioning was necessary to make progress with a truly large knowledge base, CYC at MCC [LenatG:90]. However, CYC is still physically integrated and does not possess a formal basis for operations that deal with interactions among its partitions. The application knowledge bases can then be arbitrarily composed, using rules associated with the ontology algebra. Composition and articulation among distinctly maintained knowledge-based resources achieves effective use, reuse, and scalability.

Two further points show where the SKC approach is truly innovative. Section G will provide the full rationale and show examples to clarify the distinctions.

  1. The distinct knowledge sources do not have to collaborate directly; their collaboration is managed by specialists in joining distinct knowledge bases. We will illustrate below that such functions are already being performed in current enterprises.

  2. The base knowledge resources remain available to the applications for in-depth processing, using the linkages established during the process of determining the articulations. Their use is then equivalent to the delegation of detailed domain tasks to specialists.

The result is an improvement in scalability, especially for inferencing rules, which increases nearly with the square of the number of knowledge partitions (10 in Figure 1).

Furthermore, the resulting computation can be naturally distributed over all active partitions (8 in Figure 1), giving a factor of over 500 for this simple case. This assessment is refined as point 5 of the next section.

Scalability interacts with maintenance, another critical factor, since the maintenance costs of large knowledge bases are likely to be large. We do not have published figures, but observe that general software, encoding knowledge procedurally, experiences life-time maintenance costs equal to or exceeding its acquisition cost. Much knowledge base maintenance leads to growth, because we keep on learning about the world and how to deal with it.

TECHNICAL RATIONALE, APPROACH, and PLAN.

1. Problems to be Addressed

The applied programs which motivate the High-Performance Knowledge Base (HPKB) program, such as Dynamic Multi-user Information Fusion (DMIF), Joint Task Force (JTF) C4I crisis management, planning and execution support, Joint Force Air Component Commander (JFACC) air operations planning, Advanced Logistics Program (ALP) support, Battlefield Awareness Data Dissemination (BADD), and Information Gathering, Processing and Analyses to support Crisis Management (Project Genoa), all share a similar base requirement, namely the integration of information from diverse sources. We expect that the necessary communications infrastructure will be developed soon, so that the information can flow over reliable and secure networks to the applications. The SKC proposal addresses the issues that arise when information and knowledge arrive at end-user or intermediate processing nodes, but are semantically heterogeneous.

The SKC proposal recognizes and addresses several such problems.

Solutions to all of these problems can be devised, and our proposal does not propose radically new solutions for them. However, SKC presents an approach which provides a scalable and maintainable architecture to manage these problems, and to manage the knowledge needed. In all these cases scalability is a secondary, but crucial, issue. Simple solutions to the problems above will fail if they are not placed into a structure which minimizes their interaction. We assess the scalability problems and benefits further in Section G.5.

2. Approaches

Most knowledge bases in use have been constructed with a specific objective, and have not been easy to reuse. We experienced that problem early on, when a knowledge base constructed for data-mining in medical records [Blum:80] had to be restructured for visualization of data from the same medical records [deZegherFWBW:88]. For the tasks to be supported by HPKB, the construction of specialized knowledge bases would be an enormous task, both because of their breadth and because of rapidly changing requirements and settings. Crisis response, as dealt with in JTF and Project Genoa, is a prime example where we cannot expect to have the leisure to handcraft a suitable knowledge base.

It is possible to write programs and employ thesauri to aid in matching knowledge entries which may refer to identical items that are named differently. The thesauri needed may be generated automatically, by processing documents and noting overlaps. Such systems will create many interesting matches, increasing coverage, but will also report many false and even ludicrous matches, since they have low precision. Some web tools, such as AltaVista, use such technologies. However, systems which broaden searches with little restraint cause information overload for the customer, and are not suitable in situations with responsible practitioners.

Tools have been developed to merge diverse knowledge bases, with the expectation that more is better [Humphries:95]. However, knowledge from independent sources displays significant differences, as discussed above, so that the aggregation will be inconsistent and error-prone. No central organization can resolve all the differences that will show up, since each resolution requires knowledge about the source domains and their intersection. The SKC proposal introduces a knowledge-driven Ontology Algebra, which is intended to bring formality and generality to the manipulation of ontologies. The Ontology Algebra will allow combining distinct knowledge resource bases into application-specific knowledge bases. We use the term algebra to indicate that the operations themselves will be composable, so that a variety of outputs can be generated and optimal sequences constructed. The operations of an algebra will be more disciplined than simply merging the resources, and involve selecting, matching, transforming, and intersecting the base resources to achieve the desired application-specific results. The application knowledge bases will hence typically be smaller than the union of their resources, but will be more effective and more economical to process. The Ontology Algebra will itself be knowledge-driven, to deal explicitly with the complexities of composing knowledge.

Most work on merging ontologies assumes that identically spelled words denote identical concepts, but that approach is unproven and overreaching. The assumption, basic to mathematics, where X=X is a fundamental axiom, is invalid in the real world of natural language, where individuals may express themselves in the manner most effective for them [Garfield:87]. When ontologies are created by a merger based on matching spellings, many errors and problems will be found, and the result has to be patched until it appears to operate satisfactorily. The errors can occur on several levels.

  1. Fully Different: for instance, NAIL, SINKER, BOLT, MITER, PLANE, TABLE in carpentry versus NAIL in anatomy, SINKER in fishing, lightning BOLT in weather, MITER in religion, air PLANE in logistics, and TABLE in databases.
  2. Different in Coverage: for instance, NAIL in popular household usage versus BRAD, SINKER, BOXNAIL, etc. as used by carpentry experts. Domains need different granularities of abstraction in language to be economical.
  3. Fully Alike, i.e., Synonyms: for instance, TABLE and RELATION in databases. Spelling-based matching will fail to merge these. Pure synonyms are actually quite rare, again because language is economical: for database specialists, a TABLE signifies a mathematical bag, whereas a RELATION is a mathematical set.

As Michael Lesk [Lesk:90] has pointed out, confusion over the meaning of words is actually quite rare when the context is known. Again, economy of language prevails, so that terms used by experts within a domain are unambiguous. When domains get broader -- say, when moving from carpentry to general household repairs -- the vocabulary broadens in coverage but lessens in precision (as in the NAIL example), at the same time keeping its size manageable by moving up to a higher level of abstraction [Garfield:87]. We hence assume that modest ontologies contributed by domain specialists will not exhibit internal inconsistencies, and that the types of problems we show only occur when ontologies are combined to serve broader application goals. However, we cannot restrict ourselves in the HPKB environment to small or domain-limited ontologies. Military planning tasks require the processing of information from many sources within and outside of the military establishment, so that inconsistencies such as those presented above will be common.

As long as the processing of military information, and especially its fusion, was the task of human experts, context was implicitly recognized and most errors were easily avoided. However, performance demands that more and more military information be automatically processed. Then these problems will become explicit and obvious. Providing the needed operations for automatic processing of knowledge that is inconsistent will be the primary objective of SKC.

3. The SKC Approach

In our Scalable Knowledge Composition (SKC) proposal we make two assumptions, both of them quite conservative (and hence lowering our research risk).
  1. Within the context of a specific domain base ontology resource, all terms are consistent.
  2. No term from one domain ontology matches a term from another ontology, unless a matching rule has been provided.
We will illustrate some of the matching rules we expect to be able to handle, and then describe how they will be used in our ontology algebra. Many matching rules will have to state the obvious, since we assume that words from different domain contexts do not refer to the same class unless a matching rule exists:
	HOUSE (carpentry) = HOUSE (householder) 
	TABLE (carpentry) = TABLE (householder)
Such rules are needed if the application deals with houses and their furniture in both the owner's and the maintenance context. That application will probably not need a rule
	SALARY (carpentry) = SALARY (householder) .
The house maintenance application may need the rule
	SINKER(carpentry) in NAIL (householder)
denoting that the householder will refer to large nails (SINKERs) as NAILs. However, BRAD (carpentry), not used by the householder, need not be so defined. The matching rules themselves may be bounded within an articulation context.
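As a minimal illustration of how such rules might be represented (a hypothetical Python sketch; the encoding of rules as sets of domain-qualified term pairs is our assumption, not the SKC implementation), every term carries its domain, and nothing matches unless a rule says so:

    # Hypothetical sketch: every term is qualified by its domain, and two
    # terms match only if an explicit articulation rule links them.
    EQUAL = {                                   # "=" rules from the text
        (("HOUSE", "carpentry"), ("HOUSE", "householder")),
        (("TABLE", "carpentry"), ("TABLE", "householder")),
    }
    SUBSUMED = {                                # "in" rules, e.g. SINKER in NAIL
        (("SINKER", "carpentry"), ("NAIL", "householder")),
    }

    def matches(a, b):
        """True only when identity or an articulation rule links a and b."""
        return a == b or (a, b) in EQUAL or (b, a) in EQUAL

    # SALARY is deliberately left unarticulated: irrelevant to this application.
    assert matches(("HOUSE", "carpentry"), ("HOUSE", "householder"))
    assert not matches(("SALARY", "carpentry"), ("SALARY", "householder"))
    assert (("SINKER", "carpentry"), ("NAIL", "householder")) in SUBSUMED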

We will use another example, that of purchasing goods for a department store, where the goods are in the context of wholesale purchasing. Here the domain experts, defining the articulation knowledge, are the purchasing agents, who all should be members of the American Society of Purchasing Agents (ASPA). For SHOE purchases there will be the matching rule:

	SHOE (store) = SHOE (factory) 
The SHOE terms are not terminals in the store and factory ontologies, but rather the roots of subhierarchies, such as PUMPS, WEDGIES, LOAFERS, SANDALS, etc.
	{PUMPS, WEDGIES, LOAFERS, SANDALS}(store) ISA  SHOE(factory)
Not all terms in the subhierarchies will match, so that the ASPA will have further rules, such as,
	PUMPS (store) = SHOE(factory)  if  HEEL(factory) > 5cm. 
Parts of shoes, needed to specify purchases, such as HEEL, will also appear in the algebra's rule base; but by the time NAIL (factory) is reached, no matching rule is needed. Confusion with NAIL (anatomy), which may be an entry in the shoe store's ontology, is hence avoided. We already have an ontology for anatomy, provided by the College of Pathologists [ACP:92], if it is needed in the store for articulation with health problems due to high heels.

The fact that a term does not appear in the articulation intersection does not mean that it is not accessible for domain-specific computation, using methods which are available to the application (that is why there are double-headed arrows in Figure 1). For instance, NAIL (factory) may be a term used in a computational method DURABILITY, whose execution remains local to the factory system, but which can be invoked from the articulated result if a matching rule is provided. Another concept entry that needs to be matched between factory and store is the SIZE of a SHOE. Here a table matching alternate size standards, as well as a conditional to select a standard, may be needed.

    if (LOCATION (factory) = `EUROPE')
        then SIZE (factory) = SIZETABLE (SIZE (store));
        else SIZE (factory) = SIZE (store);
Color specifications will certainly need a table, since the factory is likely to use a code
	COLORCODE (factory)= colortable (COLOR (store)), 
allowing the store to refer to COLORCODE `XY14WZ' as `Spring Pink'. These examples should illustrate both the need for an Ontology Algebra and the approach needed to achieve its implementation. The number of rules needed to purchase shoes will be modest. Otherwise purchasing agents today would already have an impossible task.
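A hedged sketch of these table-driven and conditional rules in executable form (a hypothetical Python rendering; the table contents below are invented for illustration, and only the rule structure follows the text):

    # Hypothetical sketch of the SIZE and COLOR rules above; table values invented.
    SIZETABLE = {6: 39, 7: 40, 8: 41, 9: 42}     # store sizes -> European sizes
    COLORTABLE = {"Spring Pink": "XY14WZ"}       # store color name -> factory code

    def factory_size(store_size, factory_location):
        # if LOCATION(factory) = `EUROPE' then map through SIZETABLE, else copy
        return SIZETABLE[store_size] if factory_location == "EUROPE" else store_size

    def factory_colorcode(store_color):
        # COLORCODE(factory) = colortable(COLOR(store))
        return COLORTABLE[store_color]

    print(factory_size(8, "EUROPE"))            # -> 41
    print(factory_colorcode("Spring Pink"))     # -> XY14WZ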

Maintenance of the rules, to deal, say, with changes in shoe fashions, is now assigned to the ASPA rather than to the factory or the store. Stores could, of course, add their own mappings, a local issue, but one that can still be aided by having the ontology algebra. In general, subsidiary entries of shared entries, such as SIZE and COLOR for SHOEs, are candidates for sharing as well. This observation provides the basis for tools that aid in the creation and maintenance of the articulation algebra.

4. An Ontology Algebra

We now have introduced the general SKC approach, the problems to be addressed, and the knowledge needed to achieve the goal of dealing with information from distinct domains. The crucial features needed for the Ontology Algebra should now have become obvious: operations that manipulate knowledge, using articulation knowledge which specifies relationships among knowledge collections. We expect that articulation knowledge can be sparse, since only relationships relevant to an application domain must be described.

The competency of an ontology depends on its depth and relevance [GruningerF:95]. An algebra can combine these competencies for distributed execution, but the articulation points have to be located, joined, minimized, and made accessible. The ontologies can be completely distinct, overlapping, or subsets of each other. Each relevant set has to be computable.

The ontology algebra will need to include the common set operations among knowledge bases, or rather their ontologies:

  1. Union(A,B) or ( A U B ): the collection of all unique entries in A and B
  2. Intersection(A,B) or ( A ^ B ): the collection of all shared entries in A and B
  3. Difference(A,B) or ( A - B ): the entries in A that are not shared with B.
We plan to implement difference rather than negation to be assured that no sets of near-infinite size can be produced.

The essential innovation in SKC is that the definition of shared is determined by rules in the articulation knowledge bases, and not by syntactic match. Examples of such rules were presented above. Whenever entries match, their dependent entries will also be investigated.
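A minimal sketch of the three operations under this rule-driven notion of sharing (our illustration, not the project's code; the rules argument, a predicate over entry pairs, stands in for lookup in an articulation knowledge base):

    # Hypothetical sketch: set operations over ontologies A and B, where
    # "shared" is decided by articulation rules rather than by spelling.
    def shared(entry, other, rules):
        return any(rules(entry, e) for e in other)

    def union(A, B, rules):          # all unique entries of A and B
        return A | {b for b in B if not shared(b, A, rules)}

    def intersection(A, B, rules):   # entries of A linked by rules to B
        return {a for a in A if shared(a, B, rules)}

    def difference(A, B, rules):     # entries of A not shared with B
        return {a for a in A if not shared(a, B, rules)}

    A = {("SHOE", "store"), ("NAIL", "anatomy")}
    B = {("SHOE", "factory"), ("NAIL", "factory")}
    rule = lambda a, b: a[0] == b[0] == "SHOE"   # the single SHOE matching rule
    print(intersection(A, B, rule))   # {('SHOE', 'store')}; the NAILs stay apart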

To support the base operations of the ontology algebra we also need some infrastructure operations, which will access different knowledge representations. These mapping operations will have to be specific to the underlying representation, so that as little functionality of the resources as possible is lost.

We do not expect to have to map executable codes, only their names. Our algebra is hence an ontology algebra, rather than a general knowledge algebra. We are not hopeful that we can describe arbitrary methods in knowledge-based information systems in a way which permits formal, scalable, and reliable manipulation. Any execution of available methods must be carried out in the local domain resource, since semantics embedded in executable code cannot be expected to survive a transfer into another context. Methods that perform remote update of resource knowledge bases must always be executed in a local context, so that side-effects are consistently managed. This approach enables safe update, within the limits of authorization. Articulation knowledge can be similarly updated. We do expect, however, that most updates will be made locally, by the owners of the knowledge bases. Remote execution is well supported by modern computing technology [Burback:96], although the problems of remote service execution in differing contexts are just now being addressed [HowieKL:96].
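To make the point concrete, a hypothetical sketch (the name DURABILITY follows the earlier shoe example; the dispatch mechanism is our invention): only the method's name crosses the articulation, while execution stays in the owning domain.

    # Hypothetical sketch: the articulation maps only method names; the
    # method itself executes in its home domain, preserving local semantics.
    class DomainResource:
        def __init__(self, name, methods):
            self.name = name
            self.methods = methods               # name -> locally executed function

        def invoke(self, method, *args):
            return self.methods[method](*args)   # runs in the owning domain

    factory = DomainResource("factory", {"DURABILITY": lambda shoe: 0.9})

    # The articulated result records only where DURABILITY can be executed.
    ARTICULATED = {("DURABILITY", "factory"): factory}
    print(ARTICULATED[("DURABILITY", "factory")].invoke("DURABILITY", "pump-7"))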

We must note that an ontology is more than a collection of terms. It also includes definitions of relationships, constraints, and behaviors [GruningerF:95]. Whenever a term is matched, its subsuming, subsumed, and otherwise related terms also become candidates for matching. We will further develop a tool for the creator and maintainer of an articulation knowledge base that provides suggestions and guidance; a sketch follows the list below. However, it is quite unwise to automatically move all candidate terms into the articulation knowledge base, since that would

  1. violate our conservative assumption of domain uniqueness and

  2. tend to make the articulation knowledge base larger, counteracting our objective of keeping all knowledge bases small and easy to maintain.
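A hypothetical sketch of such a suggestion tool (the toy ontology fragments are invented): related terms of a matched pair are proposed to the human maintainer, never adopted automatically.

    # Hypothetical sketch: when two terms are matched, propose -- but do not
    # automatically adopt -- matches among their related terms.
    RELATED = {   # invented toy fragments: term -> subsumed/related terms
        ("SHOE", "store"):   [("PUMPS", "store"), ("SANDALS", "store")],
        ("SHOE", "factory"): [("PUMPS", "factory"), ("HEEL", "factory")],
    }

    def suggest_candidates(a, b):
        """Candidate pairs offered to the articulation maintainer for review."""
        for ra in RELATED.get(a, []):
            for rb in RELATED.get(b, []):
                yield (ra, rb)

    for pair in suggest_candidates(("SHOE", "store"), ("SHOE", "factory")):
        print("candidate:", pair)     # the maintainer accepts or rejects each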

5. The Scalability Problem and the Effect of SKC Partitioning

We consider scalability, together with maintainability, the most crucial issue facing knowledge-based approaches. Systems that work well in practice have been of modest size [FeigenbaumMN:88]. Analyses of practical applications rarely show more than a few hundred inferential rules, although those may be complemented by many ground (database) instances. They demonstrate the power of the knowledge paradigm, but not its scalability. Research to improve the scale of knowledge-based processing by moving those ground instances into conventional or specialized, high-performance databases is important, and will actually clarify what constitutes ground knowledge and what comprises the inferencing part [KarpP:95].

Most procedures used in AI have greatly increasing costs as the number of inferencing rules N increases, since all O(N^2) combinations must be investigated. That factor increases when more complex relationships are needed. If we can partition a problem into P chunks, then the processing cost becomes on the order of O((N/P)^2 + P^2). If the partitions are hierarchically structured, the cost becomes O((N/P)^2 + P). For P=10 and N=250 the ratio is about 80 for the general partitioned alternative. For P=20 and N=2500 this ratio is nearly 400. The benefits keep increasing by O(P^2) as the number of partitions increases; however, we wish to keep the number of knowledge resources used by any application in bounds.
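The estimates can be checked with a few lines (a sketch of the arithmetic only, using exactly the cost model just stated):

    # Cost model from the text: unpartitioned ~ N^2; partitioned ~ (N/P)^2 + P^2
    # (general) or (N/P)^2 + P (hierarchical).
    def speedup(N, P, hierarchical=False):
        partitioned = (N / P) ** 2 + (P if hierarchical else P ** 2)
        return N ** 2 / partitioned

    print(round(speedup(250, 10)))    # ~86: "about 80" for P=10, N=250
    print(round(speedup(2500, 20)))   # ~390: "nearly 400" for P=20, N=2500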

Further benefits ensue by being able to ignore all knowledge outside of the intersections described by the articulation rules. In Figure 1 the resource D, and the articulation rules for the intersection ( C ^ D ), will not participate in the application being served. In broadly based systems, i.e., systems serving many applications, these savings will be significant.

Furthermore, the resulting computations can be naturally distributed over all active partitions (8 in Figure 1), giving a factor of over 500 for this simple case. There will be human benefits that at least equal the quantified improvements. It is difficult for a human to understand reasoning covering more than a dozen items and more than five plies deep. Since the articulated structure built during knowledge composition can be reported, it is possible to trace any unexpected result back to its sources and locate the responsible source or the specific interaction. Systems that cannot present their reasoning paths clearly will require much human effort to understand issues when they do not perform as expected, or will be rejected if the investigation effort seems excessive.

Scalability interacts with maintenance, since maintenance of systems beyond their design size becomes increasingly costly. Design size is not simply a size metric, but must also consider complexity. Arbitrary structural interactions increase the complexity of maintenance exponentially, since the effects of any change spread wider and wider. Most working ontologies are hierarchical, to permit growth. But limiting larger domains to hierarchies ignores the needs of intelligent applications, which cannot be limited to simplistic views of the world.

6. Maintenance

For long-lived knowledge-based systems continuing maintenance is essential. Our knowledge of the world changes over time, and systems that do not provide for maintenance will be short-lived and not repay their investment (we have seen such systems in Artificial Intelligence). As discussed above, the maintenance costs of large knowledge bases are likely to be huge. We do not have published figures for knowledge bases, but observe that general software, encoding knowledge procedurally, experiences life-time maintenance costs equal to or exceeding its acquisition cost. A secondary factor is that much knowledge base maintenance leads to growth, because we keep on learning about the world and how to deal with it.

The model used by SKC provides for distributed, domain-specific maintenance by experts. In the shoe example, we find maintenance being needed at the factory level, at the store level, and at the purchasing agent level.

In the factory example, ontologies may be maintained in two layers: first, all factories that are members of the International Shoe Cartel share a common base ontology; second, specific factories may add terms for more specialized needs. A similar layering will exist in the store. The store purchases many flavors of goods, and its managers also have to deal with sales quotas, personnel, real estate, and the like. The articulation knowledge, exploited by the Ontology Algebra, is maintained by a third unit, the ASPA-affiliated purchasing agents, again possibly with a local extension. Note that the purchasing agents will not be tempted to create rules merging the personnel or real estate concepts of the factory and store ontologies. Those are irrelevant to purchasing.
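A minimal sketch of such two-layer maintenance (the entries are invented for illustration): a shared cartel base ontology, with factory-specific terms layered on top and consulted first.

    # Hypothetical sketch: shared base ontology plus a local extension layer.
    CARTEL_BASE = {"SHOE": "footwear", "HEEL": "part of a shoe"}
    FACTORY_LOCAL = {"WEDGIE-7B": "factory-specific style code"}

    def lookup(term):
        """Local terms extend (and may shadow) the shared base ontology."""
        return FACTORY_LOCAL.get(term, CARTEL_BASE.get(term))

    print(lookup("SHOE"))        # resolved in the shared cartel base
    print(lookup("WEDGIE-7B"))   # resolved in this factory's extension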

In an articulation to support taxation, with rules produced by tax lawyers, one may find some rules for shared real estate concepts. Not only are the knowledge resources partitioned; their processing and inferential rules are partitioned as well. It is not necessary to build overly fancy processing modules to take care of cases that are beyond current concerns.

While this maintenance scheme looks complex, it actually focuses authority and responsibility. The need for committees assembled from experts from many domains, with the resulting delays and compromises, is avoided. Since maintenance cost is often 60% to 80% of computer systems costs [ref:xx], explicitly dealing with organizational precepts and tools for making maintenance efficient is wiser than sweeping the issue under the rug.

7. Efficiency In Processing

Knowledge extracted for articulation from multiple resource sets may be merged, creating larger, yet still modest, knowledge bases. Little processing will be wasted wading through irrelevant information. Since many intelligent processing schemes in artificial intelligence, even when heuristics are employed, have high-order polynomial cost factors in the size of the knowledge bases, such an economy is essential to obtain effective processing. We have observed few systems using more than several hundred inferential rules, that is, rules that are not grounded in static, factual atomic values.

Since the processing is partitioned, the computational load can be distributed over the nodes supporting the base resources, the nodes performing the algebra on selected knowledge classes and their instances, and the nodes presenting the integrated information. Such a natural distribution is likely to be more effective than approaches where massive, centralized computations are defined initially and are then analyzed to infer possible parallelism. When computations are initially massive, they are likely to have internal linkages and shortcuts, defeating parallelization unless the computations are restructured. SKC computations naturally avoid such unstructured shortcuts, so the result is immediately suitable for parallel execution. Binary ground instances are concentrated in the base resources, so that maximal distribution occurs at the layer corresponding to the greatest volume of processing.

Conclusion

The ability of the SKC approach to focus on relevant knowledge, and to perform distributed search, models the capabilities of effective human analysts, who focus on relevant connections and, only when those connections seem promising, drill down into the detailed instances [Fodor:83]. SKC provides tools, based on an algebra, that manipulate the ontologies of the information resources. The partitioning that is supported matches modern, distributed organizations, which already depend on delegation of authority and responsibility. Well-structured delegation enhances longevity through effective maintenance. Having a large, but poorly maintained and error-prone system is worse than having a smaller up-to-date system. SKC supports large-scale integrated inferencing over small knowledge modules, by always selecting relevant knowledge and placing it in a hierarchical execution model.

Problems to be addressed when joining knowledge bases are the differences among them in representation, structure, and semantics. Many of these differences are legitimate, and reflect differences in the context, objectives, and tools used by the contributors. Imposing top-down standards would disconnect contributors from their interests and productive approaches. SKC faces the issues raised by these realities by introducing articulation knowledge. Articulation knowledge captures the experience of an integrator for re-execution and reuse in alternate configurations and applications. SKC provides a partitioning for effective, distributed maintenance. Source ontologies are best maintained locally by domain experts. Articulation knowledge is partitioned as well, and maintained by integrators. Tools for constructing articulation knowledge will exploit the base knowledge resources, in itself an example of knowledge reuse.

We hence can use distributed computations, or rather their results, to minimize the effect of semantic heterogeneity [Nadis:96]. The demonstrations in SKC will show that arbitrary knowledge bases, as needed by applications, can be composed from independently developed ground resources. The partitioning can also provide isolation, when needed for security purposes. Strictly, no active interaction with the base resources is required to build the articulation. Updates of the base resources will be available to the application as they are made. Within this proposal we do not expect to profit from this capability, but we may be able to exploit it within another DARPA-funded project on Survivability Access Wrappers (SAW) [WiederholdBCS:96].

A crucial feature of the approach in SKC is its scalability. SKC achieves large-scale operation not by massiveness, but by being able to focus on relevant information. Intersection operations extract only the knowledge needed for articulation, typically a modest fraction of the base knowledge. Union operations allow the merging of knowledge bases for joint access, without requiring deep consistency. Base knowledge remains accessible when needed to compute or expand selected, relevant instances. Methods remain executable in the base knowledge bases, assuring correct context for their execution. While SKC will not directly focus on high-performance computing or communications technology, its underlying structure supports advances in these areas. Parallel and distributed multi-computer operations are natural implementations for production systems following the SKC paradigm and employing the discipline of an Ontology Algebra. Transmission volume is minimized when articulated intersections of base knowledge are passed towards the applications.


Gio Wiederhold, email to: gio@cs.stanford.edu.
secretary: Marianne Siroker
Gates Computer Science Bldg. 4A, room 436
email to: siroker@cs.stanford.edu.