[This version is obsolete. Newer version available at http://www-db.stanford.edu/~melnik/pub/sw00/]

A Layered Approach to Information Modeling
and Interoperability on the Web

Database Group, Stanford University
{melnik,stefan}@db.stanford.edu

Abstract

On the Semantic Web, the target audience is machines rather than humans. To please this audience, information needs to be presented in a machine-processable form rather than as plain text. A variety of information models like RDF or UML are available to fulfil this purpose, varying greatly in their capabilities. The advent of XML leveraged a promising consensus on the encoding syntax for machine-processable information. However, translating between different information models on a syntactic level proved to be a laborious task. In this paper, we suggest a layered approach to interoperability of information models. We argue that an object layer that fills the gap between syntax and semantics can facilitate information model interoperability. We discuss the key features of object layers like identity and binary relationships, basic typing, reification, ordering, and n-ary relationships, and examine design issues and implementation alternatives involved in building an object layer.

Keywords: modeling, interoperability, object layer

1 Introduction: A Web for Automated Agents

The Internet and especially the World Wide Web are growing at a tremendous rate. More and more information is becoming directly available for human consumption. But humans have a limited information processing capacity and are often not able to find and process the relevant information in the time available. Automation, in the form of information-processing agents [GeK94] that provide an extension to human capabilities, is required. However, with the current technology it is difficult and expensive to build such automated agents that support human users with their information processing power, since agents are not able to understand the meaning of the natural language terms found on today's webpages.

Current software agents extract information from webpages using wrappers [SaA99]. Wrappers are site-specific software modules that extract information based on the regular structure of webpages. But wrappers are expensive to build and difficult to maintain, since they require manual work, and an automated agent cannot access information before a wrapper is (manually) created. Hence, the agent cannot just browse the web on its own and explore and use the information it finds. This explains the currently very limited capabilities of automated agents browsing the web: semantic interoperability on the web is currently very limited. To really facilitate automated agents on the web, agent-interpretable formal data is required. However, it is not immediately clear what kind of data and what tools are necessary to support creating and deploying agent-interpretable data.

A number of standards and efforts are targeted at populating the Web with machine-processable information. Tim Berners-Lee coined the term "Semantic Web" to refer to this new, richer Web, in which automated agents can help humans more effectively with achieving their tasks.
The machine-oriented web will contain, much like the HTML-oriented web today, vastly diverse information. Even a human Web user nowadays will find a lot of information contained on the Web incomprehensible (e.g. a psychologist confronted with details of quantum mechanics, or a lawyer looking at a page on molecular biology). Similarly, we can hardly expect that an automated agent will a priori be able to "understand" all information available on the Web.

If needed, however, a human being can figure out the meaning of the terms used by other people, and make use of the previously incomprehensible information. Automated agents need a similar capability. Humans could then delegate to them a lot of boring and time-consuming tasks like finding an inexpensive flight, filing a tax return, or locating an electronic version of someone's publication. The lack of "common understanding" is a major obstacle on the way to building the Semantic Web. Various representation technologies have been developed that allow capturing information in a machine-processable manner. However, applications that deploy different information models often cannot interoperate effectively. Establishing interoperation is a complex task, with lots of special-case solutions. Solving the interoperability problem on a broad scale, as required for an agent-enhanced web, requires a novel approach.

One of the most urgent needs is the reduction of the complexity of the problem. A common strategy for complexity reduction is the identification of abstraction layers, which hide the complexity of the levels below them and make problems manageable. A layered approach to the Semantic Web and a set of common interfaces and tools to specific layers enables replacing a concrete realization of one layer with another realization of that layer (comparable to exchanging the physical layer in a TCP/IP network, e.g. replacing Token Ring with Ethernet). Such a layered approach would enable setting up agent-based services on top of existing services, and reusing the tools developed for the lower layers. This paper suggests such a set of layers, which provide the necessary abstractions for managing the complexity of interoperability on the Semantic Web.

The rest of the paper is organized as follows. In the next section, we briefly introduce several modeling languages used for illustration throughout the paper. Then we give a motivating example. In Section [3] we describe a reference model that we use to structure disparate Web-enabled information models. After that, we discuss the features and implementation of the object layer in more detail. In Section [6] we will revisit our motivating example and demonstrate the benefits of using an intermediate object layer.

2 Data Modeling Languages for the Semantic Web

Throughout the paper we discuss several data representation languages that are candidates for use on the Semantic Web. The selected list is by no means exhaustive; however, we believe it covers the major types of approaches to data representation for the Semantic Web:

OEM (Object Exchange Model) [PGMW95] is a data model developed for Information Integration Projects at Stanford University.
RDF and RDF Schema: RDF (Resource Description Framework) [LaS99] and RDF Schema (the Schema Language for RDF) [BrG00] are W3C Recommendations for describing metadata on the web.
SHOE (Simple HTML Ontology Extensions) [HHL99] is an extension to HTML which allows web page authors to annotate their web documents with machine-understandable content.
UML (Unified Modeling Language) is the industry-standard language for specifying, visualizing, constructing, and documenting the artifacts of software systems.
OIL (Ontology Inference Layer) [FHH+00] is a language defined on top of RDF for enabling more expressive ontology definitions.

In the next sections we discuss these modeling languages in more detail.

OEM

OEM is one of the first and simplest information models that have been proposed for exchanging information on the Web. The main features it offers are object identity and nesting. The OEM model is a directed labeled graph, in which every object has a distinct identity and a type. Apart from atomic types like integers and strings, OEM supports sets and lists (sets and lists correspond to container types in RDF, see below). OEM object graphs can be represented in a graphical notation and serialized using a simple text-based syntax. This syntax is not based on XML; OEM was suggested before XML became available.

RDF and RDF Schema

RDF is a data model for representing data on the Web. RDF defines the following modeling primitives:

Object identity: RDF distinguishes between resources, which have object identity (an OID), and literals, i.e. opaque strings. An OID is represented by a URI, a Uniform Resource Identifier (URIs are generalized URLs). A URI does not necessarily address a resource on the web. For example, the International Standard Book Number 0679405739, which identifies a 1992 edition of the novel "War and Peace", can be used as a URI "ISBN:0679405739". Thus, real world objects are represented in RDF using surrogates, or symbols that are associated with these objects.
Binary relationships: Relationships in RDF are modelled via binary relations. Thus RDF models have the form of a directed graph.
Reification: In RDF a specific binary relation between two objects is called a statement. To allow statements about statements, a statement can be reified, that is, expressed by another object with a certain set of properties. The object is a placeholder for the original statement, and hence can be used to make statements about the original statement.
Container: RDF defines specific container types representing sequences, alternatives, and multisets.

RDF Schema is the Schema language for RDF. RDF Schema is similar to frame-based languages. The main modeling primitives defined in RDF Schema are:

Classes: RDF Schema allows the definition of an explicit hierarchy of classes. A class is a resource and has a unique ID. The subclass relationship is defined by the property subClassOf.
Property and Property Constraints: RDF Schema has modeling primitives for defining property constraints that restrict the range and domain of a property to certain classes.

Notice that RDF Schema is defined on top of RDF. RDF itself does not depend on RDF Schema.
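
As a rough illustration of how these primitives fit together, the following sketch stores RDF statements as (subject, property, value) triples and checks a range constraint against a class hierarchy. The urn:ex: identifiers, the eats property, and the helper names are hypothetical, and the constraint check is a simplification under these assumptions rather than the behavior defined by the RDF Schema specification.

```python
# Statements as (subject, property, value) triples; all identifiers are made up.
statements = {
    ("urn:ex:Food", "subClassOf", "urn:ex:EdibleStuff"),
    ("urn:ex:Fruit", "subClassOf", "urn:ex:Food"),
    ("urn:ex:eats", "range", "urn:ex:EdibleStuff"),
    ("urn:ex:apple1", "type", "urn:ex:Fruit"),
    ("urn:ex:stone1", "type", "urn:ex:Mineral"),
}


def values(subject, prop):
    """Return all values that `subject` has for property `prop`."""
    return {v for s, p, v in statements if s == subject and p == prop}


def superclasses(cls):
    """Return `cls` plus every class reachable from it via subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(values(current, "subClassOf"))
    return seen


def satisfies_range(prop, value):
    """True if `value` is typed by a class that falls under the range of `prop`."""
    allowed = values(prop, "range")
    classes = set()
    for declared_type in values(value, "type"):
        classes |= superclasses(declared_type)
    # A property without a range constraint accepts any value.
    return not allowed or bool(allowed & classes)
```

Because the class hierarchy and the constraints are themselves ordinary statements, the schema needs no storage mechanism beyond the one RDF already provides.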

SHOE

SHOE is an ontology-based knowledge representation language providing annotations for HTML pages. SHOE defines the following modeling primitives:

Categories, which are similar to RDF-Schema classes, are defined with a <def-category> tag and may specify one or more subsuming categories (superclasses). Categories can be used to build term taxonomies, by defining a hierarchy of child-parent relationships.
Relations: SHOE contains means to define n-ary relations, defined with a <def-relation> tag, which also contains type definitions for each argument.
Ontology Primitives: Special modeling primitives aim at ontology administration. Ontology extensions are expressed in SHOE with the <use-ontology> tag, which contains the identifier and version number of the intended ontology. The <use-ontology> tag also allows a URL attribute (pointing to the ontology definition) and a prefix attribute used to define a local identifier for terms. The <def-rename> tag allows renaming a concept defined in another ontology.
Inference Rules are defined using the <def-inference> tag to supply additional axioms. SHOE axioms are equivalent to definite Horn Clauses (a subset of First Order Predicate Logic, which resembles if-then rules).

UML

UML has been designed as a modeling language for software-intensive systems. UML possesses a comprehensive logical foundation. For this reason, it has been deployed for modeling tasks significantly broader than software engineering. In brief, UML has the following features relevant for our exposition:

Abstract notation is a human-readable graphical notation of models. A UML model is comparable to a schema, or ontology. An instance of a UML model comprises objects that participate in various relationships with each other.
XMI serialization: XML Metadata Interchange (XMI) is an XML-based encoding standard for UML models.
Object Constraint Language (OCL) is a formal language used to specify well-formedness rules for models. OCL can be compared in style with XML DTDs. DTDs constrain the set of possible valid instances of XML documents, whereas OCL constrains the set of possible valid instances of UML models.
UML CORBAfacility is a programming language-neutral API for manipulating UML models. It defines classes like Classifier, Method, Generalization etc. with corresponding access methods and properties.
Four-layer metamodel structure: the architecture of UML is based on a four-layer structure. The four layers are: user objects, model, metamodel, and meta-metamodel. In short, descriptions of models belong to higher levels of abstraction than the models themselves. That is, user objects, models, metamodels and meta-metamodels live in disjoint "worlds", and cannot directly relate to each other. UML is primarily concerned with the metamodel layer, which is an instance of the meta-metamodel layer. The meta-metamodel layer is defined in a separate standard called MOF (Meta-Object Facility). MOF is hard-wired, i.e. its semantics is considered to be well-known. MOF's modeling primitives include MetaClasses, MetaAttributes, MetaOperations etc.

UML defines a rich set of models for virtually all aspects of software engineering. These models are organized in packages and comprise interfaces and components, distribution and concurrency, activity diagrams, patterns and collaborations etc. Features comparable to RDF's object identity and relationships are defined in the UML package Behavioral Elements/Instances and Links. As another example, the package Foundation/DataTypes defines data types like Integer, Boolean, String, Time etc.

OIL

The OIL language is based on Description Logic (DL)-oriented ontologies and has a well-defined first-order semantics and automated reasoning support, e.g. for class consistency and subsumption checking. OIL supports the following modeling primitives:

Ontology Metadata, based on the Dublin Core Metadata Element Set. Ontology definitions consist of an optional import statement, an optional rule-base and class and slot definitions.
A slot definition (slot-def) associates a slot (a binary relation) name with a slot definition. A slot definition specifies global constraints that apply to the slot relation. A slot-def can consist of a subslot-of statement, domain and range restrictions, and additional qualities of the slot, such as being inverse, transitive, and symmetric.
A class definition (class-def) associates a class name with a class description. Sophisticated class definitions (e.g. equivalence between classes, i.e. renaming, or subclass-statements) are expressed as Boolean combinations of class expressions using the operators AND, OR or NOT. Slot-constraints, which relate a class to a certain property (or slot), are also class expressions. Possible slot-constraints are:

has-value: Every instance of the class defined by the slot constraint must have a value for its slot which is an instance of each class-expression in the list.
value-type: If an instance of the class defined by the slot-constraint is related via the slot relation to some individual x, then x must be an instance of each class-expression in the list.
max-cardinality: An instance of the class defined by the slot-constraint can be related to at most n distinct instances of the class-expression via the slot relation (similar are min-cardinality and, as a shortcut, cardinality).

The syntax of OIL is oriented towards XML and RDF. [HFB+00] defines a DTD and an XML Schema definition for OIL and [BKD+00] defines OIL as an extension of RDFS.
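
The difference between the three slot-constraints listed above can be sketched in a few lines. The sketch assumes that a class expression is reduced to the plain set of its instances and that the fillers of a slot for one individual are given as a list; the function names and the example individuals are hypothetical, and no description-logic reasoning is involved.

```python
def has_value(fillers, class_instances):
    """has-value: at least one slot filler is an instance of the class."""
    return any(f in class_instances for f in fillers)


def value_type(fillers, class_instances):
    """value-type: every slot filler is an instance of the class."""
    return all(f in class_instances for f in fillers)


def max_cardinality(fillers, class_instances, n):
    """max-cardinality: at most n distinct fillers are instances of the class."""
    return len(set(fillers) & set(class_instances)) <= n


# Hypothetical data: instances of a class "Plant" and fillers of a slot "eats".
plants = {"grass", "clover"}
cow_eats = ["grass", "clover"]
cat_eats = ["mouse"]
nothing = []
```

Note that an individual with no fillers at all satisfies value-type but violates has-value, which is the usual distinction between a universal and an existential restriction.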

2 Motivating Example

In the previous section, we briefly presented several information models. RDF, UML, SHOE, and OIL, to a greater or lesser extent, support specification of ontologies. As an example, let us consider that we want to exchange a trivial ontology about food between applications that deploy RDF, UML, and SHOE. Thus, an application that uses RDF might state that "food is edible stuff". It wants to convey this information to the applications that deploy UML and SHOE (OIL is compatible with RDF Schema). Notice that all three models support some notion of classes and generalization. However, they use slightly different terminology. RDF Schema talks about "classes" and "subclasses of". UML deploys the notion of "classifiers" and "generalization". In SHOE's vocabulary, classes are "categories", whereas generalization is denoted using "is a".

Remarkably, RDF, UML, and SHOE have XML-based serializations. Unfortunately, the praised lingua franca of the Web offers only limited help in building a bridge between the modeling approaches used in our example, even if we assume that we have global identifiers for the concepts "food" (e.g. urn:cyc:food:Food) and "edible stuff" (e.g. urn:cyc:food:EdibleStuff). The serialization of the assertion that "food is edible stuff" can be represented in the XML-based syntax for RDF as follows:

R1:
    <Class rdf:ID="urn:cyc:food:Food">
      <subClassOf>
        <Class rdf:ID="urn:cyc:food:EdibleStuff"/>
      </subClassOf>
    </Class>

In XMI, the XML-based serialization standard for UML, the same fact looks rather like:

U1:
    <Foundation.Core.Generalization>
      <Foundation.Core.Generalization.child>
        <Foundation.Core.Class xmi.uuid="urn:cyc:food:Food"/>
      </Foundation.Core.Generalization.child>
      <Foundation.Core.Generalization.parent>
        <Foundation.Core.Class xmi.uuid="urn:cyc:food:EdibleStuff"/>
      </Foundation.Core.Generalization.parent>
    </Foundation.Core.Generalization>

Finally, in the XML syntax of SHOE, our relationship between food and edible stuff is represented as follows:

S1:
    <DEF-CATEGORY NAME="urn:cyc:food:EdibleStuff"/>
    <DEF-CATEGORY NAME="urn:cyc:food:Food" ISA="urn:cyc:food:EdibleStuff"/>

Unsurprisingly, all three XML serializations look absolutely distinct. XML is a very flexible language fostering the creativity of independent developers. The current state-of-the-art of translating between the above XML serializations is to use a declarative transformation language like XSLT [XSLT]. For brevity, we do not present an XSLT specification for our example (in fact, it is by no means compact). Instead, we notice that one such specification would not solve our integration problem. The complicating fact is that the syntaxes for RDF and UML allow various representations for the same state of affairs. An alternative serialization in RDF that is semantically equivalent to R1 looks like

R2:
    <rdf:Description rdf:ID="urn:cyc:food:Food">
      <subClassOf rdf:resource="urn:cyc:food:EdibleStuff"/>
      <rdf:type rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Class"/>
    </rdf:Description>
    <Class rdf:ID="urn:cyc:food:EdibleStuff"/>

The XMI serialization for our example could equally well be

U2:
    <Foundation.Core.Class xmi.uuid="urn:cyc:food:Food">
      <Foundation.Core.GeneralizableElement.generalization>
        <Foundation.Core.Generalization xmi.idref="S.00001"/>
      </Foundation.Core.GeneralizableElement.generalization>
    </Foundation.Core.Class>
    <Foundation.Core.Class xmi.uuid="urn:cyc:food:EdibleStuff">
      <Foundation.Core.GeneralizableElement.specialization>
        <Foundation.Core.Generalization xmi.idref="S.00001"/>
      </Foundation.Core.GeneralizableElement.specialization>
    </Foundation.Core.Class>
    <Foundation.Core.Generalization xmi.id="S.00001"/>

A variety of further valid syntactic alternatives is conceivable. In order to account for all possible syntactic forms, we would have to write a number of long and elaborate XSLT specifications. Thus, the translation task becomes very laborious, if not prohibitive.
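
The problem is not specific to XSLT. The following sketch processes R1 and R2 directly on the document level: each syntactic variant needs its own extraction routine, although both yield the same subclass link. The namespace URIs and the wrapping rdf:RDF element are assumptions added to make the fragments parseable (the listings above omit the declarations), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; the fragments in the text do not declare them.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"

R1 = """
<Class rdf:ID="urn:cyc:food:Food">
  <subClassOf>
    <Class rdf:ID="urn:cyc:food:EdibleStuff"/>
  </subClassOf>
</Class>"""

R2 = """
<rdf:Description rdf:ID="urn:cyc:food:Food">
  <subClassOf rdf:resource="urn:cyc:food:EdibleStuff"/>
  <rdf:type rdf:resource="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Class"/>
</rdf:Description>
<Class rdf:ID="urn:cyc:food:EdibleStuff"/>"""


def q(ns, name):
    """Build the '{namespace}name' form that ElementTree uses for qualified names."""
    return "{%s}%s" % (ns, name)


def parse(fragment):
    """Wrap a fragment in a root element that declares the assumed namespaces."""
    wrapped = '<rdf:RDF xmlns:rdf="%s" xmlns="%s">%s</rdf:RDF>' % (RDF, RDFS, fragment)
    return ET.fromstring(wrapped)


def links_nested(root):
    """Find subclass links written in the nested style of R1."""
    found = set()
    for child in root.iter(q(RDFS, "Class")):
        for prop in child.findall(q(RDFS, "subClassOf")):
            for parent in prop.findall(q(RDFS, "Class")):
                found.add((child.get(q(RDF, "ID")), parent.get(q(RDF, "ID"))))
    return found


def links_flat(root):
    """Find subclass links written in the attribute style of R2."""
    found = set()
    for child in root.iter(q(RDF, "Description")):
        for prop in child.findall(q(RDFS, "subClassOf")):
            target = prop.get(q(RDF, "resource"))
            if target is not None:
                found.add((child.get(q(RDF, "ID")), target))
    return found
```

Applying links_nested to R2, or links_flat to R1, finds nothing. Every further syntactic alternative would call for yet another routine of this kind.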

The reason for this complexity is the gap of abstraction between object-oriented approaches used in information models like RDF, UML, or SHOE, and the low-level document model provided by XML. Similar gaps of abstraction can be observed in other areas, e.g. compiler technology or internetworking. Compilers for programming languages often use intermediate languages to cope with the complexity of the compilation task and to simplify optimization. Large-scale networks like the Internet are built using protocol hierarchies to reduce the complexity of communication.

In our example, the gap of abstraction emerges when we try to integrate information on the conceptual level using syntactic primitives. To bridge this gap we suggest using an intermediate modeling layer that we call the object layer. In the next section we describe a reference model that we use to structure disparate Web-enabled information models. After that, we discuss the features and implementation of the object layer in more detail. In Section [6] we will revisit our motivating example and demonstrate the benefits of using an intermediate object layer.

3 Information Model Interoperability (IMI) Reference Model

To reduce the complexity of building the Semantic Web, we suggest viewing Web-enabled information models as a series of layers, each one built upon the one below it. The purpose of every layer is to offer certain "services" to the higher layers, shielding those layers from the details of how the offered services are actually implemented. We propose to organize Web information models into three layers: a syntax layer, an object layer, and a semantic layer (see Figure 1).

Figure 1: The IMI reference model

The syntax layer is responsible for representing structured information as byte streams. The basic function of the object layer, or frame layer, is to provide applications with an object-oriented view of their domain. The semantic layer, or knowledge representation layer, deals with conceptual modeling and knowledge engineering tasks. In the rest of this section, we briefly describe each of the layers in a bottom-up fashion, and examine the advantages of the layered approach to information modeling.

Syntax Layer

The main task of the syntax layer is to provide a way of serializing information content into a sequence of characters or bits. In the past two years we have witnessed how an impressive global-scale agreement on a common syntax layer has been achieved. XML has become pervasive, its use ranging from electronic publishing to electronic business.

Any application that exchanges information with other applications or stores it persistently needs to structure the information carefully so that the recipient or reader can retrieve the information in its original form. Data structures used by applications are typically not just flat lists of uniform data elements. Therefore, additional markup mechanisms are required to preserve the nested structure of the data. XML tagging or ASN.1 encoding rules are examples of such markup mechanisms. The syntax layer can typically be divided into three sublayers (bottom-up):

serialization: data instances are serialized into byte streams. For example, XML documents are serialized as Unicode character strings, whereas ASN.1 uses a binary encoding.
generic document models: applications structure information as nested data structures. XML provides a generic document model. Instances of this model can be manipulated using APIs like XML DOM.
restricted document models: sometimes applications want to enforce structural constraints on the nested data structures they use. XML Document Type Definitions (DTDs) are an example of grammars for describing such structural constraints.
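
The three sublayers can be sketched with the DOM implementation from the Python standard library. The order document is a hypothetical example, and since the standard library does not validate against DTDs, the conforms function is only a hand-written stand-in for a grammar.

```python
from xml.dom.minidom import parseString

# A hypothetical nested data structure, written as XML markup.
text = '<order id="42"><item>tea</item><item>milk</item></order>'

# Serialization sublayer: the character string is encoded as a byte stream.
data = text.encode("utf-8")

# Generic document model: parse the bytes into a DOM tree and navigate it.
root = parseString(data).documentElement
items = [node.firstChild.data for node in root.getElementsByTagName("item")]


def conforms(element):
    """Restricted document model: an order must contain item elements only."""
    children = [c for c in element.childNodes if c.nodeType == c.ELEMENT_NODE]
    return (
        element.tagName == "order"
        and bool(children)
        and all(c.tagName == "item" for c in children)
    )
```

The application code that reads items never touches the byte stream, which is the kind of shielding between sublayers that the reference model relies on.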

Object Layer

The purpose of the object layer is to offer applications an object-oriented view on the information that they operate upon. In contrast to data structures, objects have immutable object identity, i.e. change of an object's identity results in a different object. The very basic function of the object layer is to enable manipulation of objects and binary relationships between them. However, different applications may require more functionality of the object layer, depending on their complexity. We identified four additional sublayers that are often used. In Section [4] we describe the object layer in more detail. Here is a brief summary of the five sublayers:

identity and binary relationships: every object-oriented model provides these features.
basic typing: a simple abstraction mechanism. One object is used to "type" another object.
reification: some information models (e.g. RDF, UML) require access to whole relations and individual links between objects.
ordering: ordered relationships are an integral part of some information models (e.g. UML).
n-ary relationships are deployed in information models like SHOE.

Semantic Layer

Roughly speaking, the semantic layer provides interpretation of the object model used in the object layer. The objects, or surrogates, used in the object layer are mapped onto physical and abstract objects like books, airplane tickets, database tables, logical formulae and paragraphs of text. The ultimate goal of the Semantic Web is to make machines interoperate in the semantic layer. The semantic layer comprises a number of rich and complex sublayers like:

conceptual models: vocabularies for representing conceptual models (e.g. RDF Schema, UML Foundation/Core)
domain models: deal with ontologies of a particular application domain, e.g. transportation, manufacturing, e-business, digital libraries, Web resources, etc.
languages: instead of using a natural language, the machines on the Semantic Web convey information using formal languages. These languages can be highly specialized or serve a general purpose. Examples include workflow definition languages, Datalog, first-order logic, UML statecharts, SQL etc. Terms and expressions in these languages are first-class objects that can be manipulated on the object layer. In this way, applications can dynamically learn the semantics of previously unknown languages.

Please note that since knowledge representation requirements are diverse, there will usually be multiple semantic layers on top of a single object layer. For instance, languages defining workflow models can coexist and interact with languages defining ontologies. Both languages can reuse the shared object layer, and establishing interoperability between applications would mean bridging the semantic layers only. Using a conventional, not explicitly layered approach, establishing interoperability involves dealing with the other layers as well, since they are implicitly encoded in the language structures.

Rationale of the Layered Reference Model

The motivation behind our layered approach is to provide clean interfaces between the layers. This allows making implementations of the layers easily replaceable. On the other hand, a layered approach facilitates reuse in information modeling. For example, the semantic layer of an emerging information model can be built on top of the object layer of RDF. [Mel00] demonstrates how the semantic layer of UML can be built on top of RDF, and [BKD+00] defines OIL as an extension of RDFS on top of the object layer of RDF. Such reuse eliminates additional effort for reinventing the wheel (e.g. defining yet another syntax) and boosts interoperability. For example, many information models adopted XML for their syntax layer and are able to reuse XML tools and parsers developed by third parties.

The IMI reference model that we suggest borrows heavily from the Open Systems Interconnection (OSI) model used in internetworking. In OSI, the service definition tells what the layer does, not how entities above it access it or how the layer works. Just as the physical layer in OSI is concerned with transmitting raw bits over a communication channel, the service offered by the syntax layer in IMI consists in marshalling and manipulation of nested data structures. A layer's interface tells the applications above it how to access it. It specifies what the parameters are and what results to expect. Broadly accepted interfaces for the syntax layer are XML DOM and SAX. The interface says nothing about how the layer works inside. The peer protocols used in an OSI networking layer are the layer's own business. Similarly, the implementations of a layer in IMI can differ. A layer can use any implementation it wants to, as long as it gets the job done (i.e., provides the offered services like serialization or n-ary relationships).

Thus, one of the major criteria for splitting an information model into layers is whether it is possible to define a clean interface between the layers. Some of these interfaces are already in place, like DOM for the syntax layer or the UML CORBAfacility for the object layer. We merged all features that cannot always be separated in a well-defined manner into a single layer. These distinctive features form sublayers. For example, in the object layer, sometimes ordering may use reification, and sometimes it can be built on top of n-ary relationships. Thus, there can exist mutual dependencies between sublayers in different information models.

To summarize, layering representation languages has the following advantages:

Defining interoperability between different semantic layers that use the same object layer is simplified. Since the basic formalisms like binary relationships are shared, syntactic mappings are not necessary.
If different object layers are used, they need to be integrated only once for all mappings on the semantic layer.
Tools for parsing (the syntax layer) and querying (the object layer) can be shared, thus reducing the costs of application development significantly. For example, OIL reuses RDF parsing and querying tools.
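
A minimal sketch of the first point, assuming that two applications already share an object layer that stores links as (source, label, destination) triples. The labels rdfs:subClassOf, rdfs:Class, shoe:isa and shoe:Category and the mapping table are hypothetical stand-ins for terms of two semantic layers; real mappings between semantic layers will generally involve more than renaming.

```python
def translate(links, mapping):
    """Rename every identifier that has an entry in the mapping; keep the rest."""
    return {tuple(mapping.get(term, term) for term in link) for link in links}


# Links as stored on the shared object layer.
rdf_links = {
    ("urn:cyc:food:Food", "rdfs:subClassOf", "urn:cyc:food:EdibleStuff"),
    ("urn:cyc:food:Food", "type", "rdfs:Class"),
}

# Hypothetical correspondence between the vocabularies of two semantic layers.
RDFS_TO_SHOE = {
    "rdfs:subClassOf": "shoe:isa",
    "rdfs:Class": "shoe:Category",
}

shoe_links = translate(rdf_links, RDFS_TO_SHOE)
```

The translation operates on links only. No knowledge of tags, attributes, or nesting is required, because those concerns are confined to the syntax layer.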

4 Object Layer: Features and Design Issues

The object layer is the focus of our paper. In this section we discuss the five sublayers of the object layer in more detail. We illustrate the features of these sublayers using examples from RDF, UML, SHOE and OEM. The purpose of our discussion is to gain a better understanding of the design issues involved on the object layer. Such understanding is beneficial for the specifications of the mappings between object layers in different information models. Ultimately, we hope it can contribute to an agreement on the capabilities of object layers similar in scale to an agreement on the syntax layer (XML).

In the discussion of the object layer we are again following a bottom-up approach, i.e. from the ground-level features to more high-level ones. The design issues that we consider in this section have a logical character. They do not necessarily preclude the variety of implementation alternatives at the programming level. Nevertheless, a logical implementation can have major impact on the API design. In Section [5] we examine some programming-level implementation issues.

4.1 Identity and Binary Relationships

Object identity and binary relationships can be seen as the least common denominator between any two object-oriented models. A model lacking object identity is simply not an object-oriented model any more [Cat91]. In every object-oriented model, objects do not exist on their own, but engage in multiple relationships with each other. Binary relationships are the simplest form of such relationships. Notice that at the object layer we deal with objects at the instance level, i.e. every object is treated as an individual identifiable entity.

As long as an application does not need to exchange information with other applications, it does not matter how the objects are identified. In fact, suitable object-oriented APIs may hide the object identity from the programmer completely. To take advantage of the Semantic Web, applications need to communicate, either directly or indirectly by publishing information in machine-readable form. Thus, explicit identifiers for the objects are required. In RDF, objects are identified using Uniform Resource Identifiers (URIs), a generalized form of Uniform Resource Locators (URLs). A similar approach is taken by SHOE. In UML, objects are identified using Universally Unique Identifiers (UUIDs). OEM allows any unique variable-length identifiers. URIs, UUIDs etc. support global identity for the objects, which is a prerequisite for building the Semantic Web.

Information models use different abstract notations for binary relationships between objects. In this paper, we adopt the RDF notation. Figure [2] illustrates a binary relationship between a source and a destination object. As a rule, the position of the object in a relationship, i.e. source or destination, is significant. In RDF, every such relationship is viewed as a statement, or assertion. The source and destination objects are the subject and the object of the assertion, respectively.

Figure 2: Abstract notation for a binary link between two objects

In UML, relationships between object instances as shown in the figure are referred to as links. The relationship type, i.e. the relationship as a whole in the semantic layer, is called an association. To avoid ambiguity, we follow this terminology.
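The notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Link`, its field names, and the `example.org` identifiers are hypothetical and are not part of RDF, UML, or any API discussed in this paper.

```python
from typing import NamedTuple

class Link(NamedTuple):
    """A binary link: an RDF-style statement about two identified objects."""
    source: str       # subject of the assertion
    label: str        # names the association the link belongs to
    destination: str  # object of the assertion

# Hypothetical identifiers; any globally unique URI or UUID scheme would do.
REQUIEM = "http://example.org/works/Requiem"
MOZART = "http://example.org/people/Mozart"
CREATOR = "http://example.org/terms/creator"

link = Link(REQUIEM, CREATOR, MOZART)
inverse = Link(MOZART, CREATOR, REQUIEM)

# Position matters: swapping source and destination gives a different link.
print(link == inverse)  # False
```

Because the identifiers are global, two applications that publish the same triple are making the same assertion, regardless of how each one stores its objects internally.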

4.2 Basic Typing

Information models sometimes deploy a primitive typing mechanism to differentiate objects from each other. Another object is used to denote the type of the given object. We refer to such a mechanism as basic typing. The semantics of basic typing is broader (less strictly defined) than that of instantiation; if two objects are of the same type, they can be used in a similar context within the application.

In OEM, basic typing is used to denote atomic types such as integer or string, and container types such as set or list. Since the "types" themselves are first-class objects, the application can request additional information about the types.

In RDF, the purpose of basic typing is to allow bootstrapping of more complex typing facilities in the semantic layer. In the notation used above, basic typing of an object A using object B is represented as an arc from A to B with a label like type that denotes basic typing.
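As a minimal sketch of basic typing, assume that links are stored as (source, label, destination) triples and that the label `type` denotes basic typing. The function names and the sample objects below are hypothetical.

```python
# Links are (source, label, destination) triples; "type" is the label
# assumed here to denote basic typing.
links = {
    ("42", "type", "integer"),
    ("integer", "type", "atomic"),
    ("integer", "comment", "whole numbers"),
}

def types_of(obj, links):
    """Return every object used as a basic type of obj."""
    return {d for (s, label, d) in links if s == obj and label == "type"}

def describe(obj, links):
    """Types are first-class objects, so they can be queried like any other."""
    return {(label, d) for (s, label, d) in links if s == obj}

print(types_of("42", links))  # {'integer'}
```

Since the type `integer` is itself the source of links, an application can ask for additional information about it with the same call it uses for any other object.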

4.3 Reification

To refer to individual links between objects, or to associations (relationships as a whole), a reification mechanism is required. Reification is Latin for "making into a thing". Using reification, links and associations can be treated as first-class citizens in an information model. Associations are reified to enable applications to provide additional information about them. Reification of links can be used for multiple purposes. For example, in RDF every link corresponds to an assertion. Thus, reification of links provides a "quotation" mechanism, i.e. an application can refer to information stated by another application. Being able to discuss, dispute or support the relationships and properties of objects is a crucial prerequisite for machine communication. In addition, reification of links can be used to implement nesting of instances of information models.

Reification of links and associations is illustrated in Figure [3]. The big oval denotes the object that represents the reified link. In the figure, this object is used as the source for another link. To emphasize reification of the association in the bottom part of the figure, the association is circumscribed as an object. This object can, too, participate in other links. These two kinds of reification provide the necessary prerequisites for computational reflection, i.e. the capability for a computational process to reason about itself [Smi96].

Figure 3: Reification of links and relationships

Both UML and RDF support reification of links and associations. In both standards, link reification is logically implemented by introducing a new object with properties that identify the parts of the link. The logical implementation of link reification in RDF is illustrated in Figure [4].

Figure 4: Logical implementation of reified links in RDF
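The logical implementation of link reification can be sketched as follows. The property names `rdf:subject`, `rdf:predicate`, `rdf:object` and the type `rdf:Statement` are those of the RDF Model and Syntax specification [LaS99]; the function `reify`, the `stmt…` identifier scheme, and the label `assertedBy` are assumptions made for illustration only.

```python
import itertools

_ids = itertools.count(1)

def reify(link, links):
    """Add a new object describing `link` and return its identifier."""
    s, p, o = link
    node = f"stmt{next(_ids)}"   # hypothetical identifier scheme
    links.update({
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", s),
        (node, "rdf:predicate", p),
        (node, "rdf:object", o),
    })
    return node

links = {("Requiem", "creator", "Mozart")}
stmt = reify(("Requiem", "creator", "Mozart"), links)

# The reified link can now be the source of another link ("quotation").
links.add((stmt, "assertedBy", "Pushkin"))
```

Note that this representation needs four additional links per reified link, which is the overhead that a programming-level `Link` class can avoid.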

4.4 Ordering

Some information models like UML make heavy use of ordered relationships. To illustrate ordered relationships, consider the DublinCore association "creator". In Pushkin's poem "Mozart and Salieri", which inspired the movie "Amadeus", both Mozart and Salieri were presented as the "creators" of "Requiem", whereas Mozart is definitely the primary author.

Figure [5] illustrates five logical implementation alternatives for the ordered binary relationship between "Requiem" and the two composers. The right-hand side of the figure presents a "logical" view of the object graphs. The five alternatives are named specialization, container, ordinal properties, ternary, and reification, according to their logical implementation. Notice that although any representation can be bijectively translated into every other one, they differ in how semantically faithful they are. For example, the second representation (container) is particularly misleading for representing ordered relationships, since it states that the creator of "Requiem" is an object typed as Sequence.

Figure 5: Implementation alternatives for ordered relationships

A qualitative comparison of the alternatives is presented in Table [1]. Besides semantic faithfulness, we consider how difficult it is to use the same logical schema for representing the inverse order. Inverse order is required when the objects at the source end that are related to a single object at the destination end have an ordering that must be preserved. For example, if the "creator" association were to capture the chronological order of the pieces written by the composers, representation for the inverse order would be needed. We gave a minus (-) to the schemes that required creation of additional reified objects for links or associations to support inverse order.

Alternative	Semantic faithfulness	Inverse order	Implementation effort
specialization	`++`	`-`	`--`
container	`--`	`-`	`--`
ordinal properties	`+-`	`-`	`-`
ternary	`+-`	`+`	`+-`
reification	`+`	`+`	`+`

Table 1: Logical implementation alternatives for ordering

Finally, the last metric that we consider here is the implementation effort. By implementation effort we mean not the effort needed to implement the API that allows manipulating ordered relationships, but the effort needed to use such an API. The typical operations we considered are

find all creators of "Requiem" (ignore order)
find all properties of "Requiem" (hide auxiliary objects like instances of Sequence)
find the first creator of "Requiem"
add a second (third etc.) creator

The schemes specialization and container are especially implementation-intensive. Specialization requires tracing the ordered versions of "creator" like "creator:1" for every access, whereas container entails checks to determine whether a single object or a bag is the destination of the link. In our comparison, we only considered ordered binary relationships, since ordered n-ary relationships are used very seldom and their semantics is typically hard to comprehend. Although ordering by reification looks fairly verbose in the figure, we found that it has a number of preferable characteristics that make it a viable choice for a logical implementation of ordered relationships.
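The typical operations listed above can be sketched for the reification scheme as follows. This is an illustration, not the API we evaluated: the class `Store`, its method names, and the `id…` identifier scheme are hypothetical, and the label `order` is assumed to carry the position of a reified link.

```python
import itertools

class Store:
    """Links with identifiers, so a link can itself be the source of a link."""

    def __init__(self):
        self.links = {}                 # link id -> (source, label, destination)
        self._ids = itertools.count(1)

    def add(self, s, p, o):
        link_id = f"id{next(self._ids)}"
        self.links[link_id] = (s, p, o)
        return link_id

    def add_ordered(self, s, p, o, position):
        link_id = self.add(s, p, o)
        self.add(link_id, "order", position)  # order attached to the reified link
        return link_id

    def destinations(self, s, p):
        """All destinations of label p from s, ignoring order."""
        return {o for (s2, p2, o) in self.links.values() if (s2, p2) == (s, p)}

    def ordered(self, s, p):
        """Destinations sorted by their order; unordered links come last."""
        def position(link_id):
            found = [o for (s2, p2, o) in self.links.values()
                     if s2 == link_id and p2 == "order"]
            return found[0] if found else float("inf")
        ids = [i for i, (s2, p2, _) in self.links.items() if (s2, p2) == (s, p)]
        return [self.links[i][2] for i in sorted(ids, key=position)]

store = Store()
store.add_ordered("Requiem", "creator", "Salieri", 2)
store.add_ordered("Requiem", "creator", "Mozart", 1)
print(store.ordered("Requiem", "creator")[0])  # Mozart
```

Finding all creators does not need to know about ordering at all, and adding a further creator is a single call. No auxiliary objects are attached to "Requiem" itself, so listing its properties requires no filtering.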

In some information and data models, ordering is built-in, i.e. it cannot be reduced to other modeling primitives like reification and binary relationships. Such models include UML, OEM, and XML. In other models like RDF and SHOE, ordering is not a built-in feature and can be implemented in various ways, similarly to the alternatives that we considered above. The choice of alternatives depends on the availability of modeling primitives. For example, since SHOE lacks reification, ordering by reification is out of the question.

4.5 N-ary Relationships

The last important feature of the object layer is support for n-ary relationships. Although n-ary relationships are used fairly seldom compared to binary relationships, they are supported by some Web-enabled information models like UML or SHOE. Hence, in general, Semantic Web applications should be prepared to deal with n-ary relationships.

Often, n-ary relationships are logically implemented on top of the four sublayers discussed above. Nevertheless, a clear definition of the semantics of n-ary relationships is crucial for interoperability between information models. N-ary relationships cannot be implemented as a combination of binary relationships without using additional objects. Thus, in the object layer, an n-ary link is typically represented as an object that is linked to the n objects participating in the link (see Figure [6]).

Figure 6: Example of a ternary link

Notice that the logical implementation depicted in the figure does not impose a specific implementation on the programming level. For example, the above ternary link can be implemented as a 3-tuple in a relational database. Using n-ary relationships, however, requires specifically designed API methods.
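As a sketch of this logical implementation, the following code represents an n-ary link as one additional object with a binary link to each participant, and then flattens it into an n-tuple as a relational implementation might. The function names, the role labels, and the sample "performance" link are hypothetical and do not reproduce the example in Figure [6].

```python
def add_nary_link(links, link_id, association, roles):
    """Represent an n-ary link as one object with a binary link per participant."""
    links.add((link_id, "type", association))
    for role, participant in roles.items():
        links.add((link_id, role, participant))

def as_tuple(links, link_id, role_names):
    """Flatten the link object into an n-tuple, e.g. a row of a relational table."""
    lookup = {label: d for (s, label, d) in links if s == link_id}
    return tuple(lookup[r] for r in role_names)

links = set()
# Hypothetical ternary link: which work was performed by whom and where.
add_nary_link(links, "perf1", "performance",
              {"work": "Requiem", "performer": "Orchestra", "place": "Vienna"})
row = as_tuple(links, "perf1", ["work", "performer", "place"])
print(row)  # ('Requiem', 'Orchestra', 'Vienna')
```

The helper functions hint at why dedicated API methods are needed: without them, every application would have to agree separately on the role labels and on how the link object is typed.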

4.6 Summary of Object Layer Features

The five features of the object layer that we discussed above are summarized in Table [2]. The features that are implementable, but are not standardized in a particular information model are not counted. For example, order can be implemented in SHOE in many different ways using n-ary relationships, but every application may do it in a different way.

Some features in the table are marked as implicit. These features, like ordering in UML or n-ary relationships in SHOE, are visible on the API level only. They are not directly represented in the object model.

Feature	RDF	UML	SHOE	OEM	OIL
object identity and binary relationships	`+`	`+`	`+`	`+`	`+`
basic typing	`+`	`+` (implicit)	`+` (implicit)	`+`	`+`
reification	`+`	`+`	`0`	`0`	`0`
ordering	`0`*	`+` (implicit)	`0`	`0`*	`0`
n-ary relationships	`0`	`+`	`+` (implicit)	`0`	`0`

Table 2: Object layer features in RDF, UML, SHOE, OEM and OIL

*: RDF containers and OEM lists do not carry the semantics of ordered relationships

5 Implementation of the Object Layer

In the previous section we discussed the logical implementation of the object layer. This section deals with the programming-level realization. The logical design of the object layer largely determines both the APIs and the exchange syntax (i.e. mapping to the syntax layer) used for implementing the object layer. In particular, the APIs need to consider both navigational and declarative access to the objects. The navigational access is of primary relevance for in-memory implementations, whereas declarative access is important for database support.

Navigational Access

Traditionally, object-oriented models deploy APIs that are tailored for the object model of a given application. The objects are represented by instances of programming language objects. The properties or links between objects are accessed using member variables or get/set methods. Basic typing is provided by the typing system of the programming language. While perfect for a closed domain, such APIs are very inflexible. For example, it is usually not possible to add a property of a new kind to an object at runtime. Furthermore, inheritance schemes are fixed (e.g. single inheritance) and cannot be chosen by developers. Tailored APIs usually do not support reification of binary links. Generic APIs like [Mel99] provide the necessary flexibility. However, they are not as compact and intuitive, which increases development time and maintenance costs. If an API provides a class like "Link", reification usually comes for free, since instances of "Link" can be used without worrying about their identity. "For free" means that no additional objects and arcs as shown in Figure [3] need to be created. When such lightweight reification is possible, a convenient strategy for implementing ordered relationships is order by reification. N-ary relationships can be realized in a straightforward fashion mirroring the logical implementation described in Section [4.5].
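A generic API with lightweight reification might look like the following sketch. It is not the API of [Mel99]; the classes `Link` and `Model`, their methods, and the property `key` are assumptions chosen for illustration.

```python
class Link:
    """A link is an ordinary object and can itself be the source of a link."""

    def __init__(self, source, label, destination):
        self.source = source
        self.label = label
        self.destination = destination

class Model:
    """Hypothetical generic API: properties are data, not member variables."""

    def __init__(self):
        self.links = []

    def add(self, source, label, destination):
        link = Link(source, label, destination)
        self.links.append(link)
        return link

    def get(self, source, label):
        return [l.destination for l in self.links
                if l.source == source and l.label == label]

model = Model()
first = model.add("Requiem", "creator", "Mozart")
second = model.add("Requiem", "creator", "Salieri")

# Lightweight reification: the Link instances serve directly as sources,
# so no extra objects or arcs are created to describe them.
model.add(first, "order", 1)
model.add(second, "order", 2)

# A property of a new kind can be added at runtime.
model.add("Requiem", "key", "D minor")
```

The price of this flexibility is visible in the calls themselves: `model.get("Requiem", "creator")` is less compact and less type-safe than a tailored accessor such as a `getCreators()` method would be.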

Declarative Access

Relational or object-oriented databases usually offer a declarative query language. An important consideration for implementing the object layer on top of a database is how well the querying capabilities of the database can be exploited. A clever implementation would be able to translate many kinds of declarative access operations into a single database query. In such cases, the query optimizer of the database system can be used effectively. The tradeoff between tailored and generic representations applies to databases much as it does to APIs. For example, if associations are represented as separate relational tables, the capability of reifying associations is lost, and no queries with variable associations are possible.

To illustrate the importance of the design of the object layer, consider the following implementation of ordering using a relational DBMS. In this implementation, a single table tuples holds binary links between objects in a generic fashion. The table contains four fields, all of the same type, that represent object identifiers. (Object identifiers in a database system are typically implemented as integers; in the examples below we use stylized string values.) The implementation uses order by reification. A sample content of the database is shown below.

    ID   S         P        O
    --------------------------------
    id1 Requiem   creator Salieri
    id2 Requiem   creator Mozart
    id3 id1       order    2
    id4 id2       order    1
    id5 Pinocchio creator Geppetto

The table contains two ordered links and one unordered link. The field ID contains identifiers of reified links. All find-queries listed in Section [4.4] as implementation criteria can be executed using a single SQL query. The most sophisticated query of these is retrieving the first creator. The complicating factor is that some creators are unordered. Still, retrieving the first creator for an object like Requiem can be done using the following single query:

    SELECT t1.S, t1.O
    FROM   tuples AS t1 LEFT JOIN tuples AS t2 ON t1.ID=t2.S
    WHERE  t1.S='Requiem' AND
           t1.P='creator' AND
           (t2.P IS NULL OR t2.P='order') AND
           (t2.O IS NULL OR t2.O='1')
    GROUP BY t1.S

The GROUP BY clause is required to collapse multiple unordered creators of the same object into a single result row. The first creators of all objects can be retrieved by dropping the first conjunct in the WHERE clause. The result of that query would be:

(Requiem, Mozart)
(Pinocchio, Geppetto)
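The example can be reproduced with an in-memory SQLite database, as in the following sketch. Several details are assumptions: SQLite as the DBMS, TEXT columns, quoted string literals in place of the stylized values, MIN(t1.O) instead of a bare column so that the grouped query is also valid in stricter SQL dialects, and an ORDER BY clause added only to make the output order deterministic. The first conjunct of the WHERE clause is dropped, so the first creators of all objects are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tuples (ID TEXT, S TEXT, P TEXT, O TEXT)")
conn.executemany("INSERT INTO tuples VALUES (?, ?, ?, ?)", [
    ("id1", "Requiem", "creator", "Salieri"),
    ("id2", "Requiem", "creator", "Mozart"),
    ("id3", "id1", "order", "2"),
    ("id4", "id2", "order", "1"),
    ("id5", "Pinocchio", "creator", "Geppetto"),
])

# The LEFT JOIN keeps unordered links (t2 columns are NULL for them);
# ordered links survive only if their order is 1.
FIRST_CREATORS = """
    SELECT t1.S, MIN(t1.O)
    FROM   tuples AS t1 LEFT JOIN tuples AS t2 ON t1.ID = t2.S
    WHERE  t1.P = 'creator' AND
           (t2.P IS NULL OR t2.P = 'order') AND
           (t2.O IS NULL OR t2.O = '1')
    GROUP BY t1.S
    ORDER BY t1.S
"""
rows = conn.execute(FIRST_CREATORS).fetchall()
print(rows)  # [('Pinocchio', 'Geppetto'), ('Requiem', 'Mozart')]
```

The whole operation is a single query over one generic table, so the query optimizer of the database sees the complete access pattern.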

Mapping to the Syntax Layer

The mapping to the syntax layer can be optimized to support the features provided by the object layer. As an example, consider how serialization of reification and order can be implemented in a compact way. Order by reification can be expressed as

<tuple ID="id1" S="Requiem" P="creator" O="Mozart"/>
<tuple SID="id1" P="order" O="1"/>

The XML attribute SID in the second tuple is a reference to an ID attribute declared in the first tuple. For even more compact representation, a specialized ordering syntax can be used. Thus, the fact that Salieri is the second creator can be serialized as:

<tuple S="Requiem" P="creator" O="Salieri" order="2"/>
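The compact serialization can be produced and read back with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree` module; the function names `serialize` and `parse` are hypothetical, and the exact attribute order and whitespace of the generated XML may differ from the example above.

```python
import xml.etree.ElementTree as ET

def serialize(s, p, o, order=None):
    """Serialize one link; an ordered link uses the compact `order` attribute."""
    attrs = {"S": s, "P": p, "O": o}
    if order is not None:
        attrs["order"] = str(order)
    return ET.tostring(ET.Element("tuple", attrs), encoding="unicode")

def parse(xml_text):
    """Read a compact tuple back into (S, P, O, order)."""
    e = ET.fromstring(xml_text)
    order = e.get("order")
    return e.get("S"), e.get("P"), e.get("O"), int(order) if order else None

xml_text = serialize("Requiem", "creator", "Salieri", order=2)
print(parse(xml_text))  # ('Requiem', 'creator', 'Salieri', 2)
```

A parser for the compact form would expand the `order` attribute into a reified link plus an order link at the object layer, so both syntaxes map to the same object graph.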

6 Motivating Example Revisited

Recall our motivating example presented in Section [2]. Given a common object layer, the relationship between food and edible stuff can be represented in UML and RDF as shown in Figure [7].

Figure 7: "food is edible stuff" at the object layer

Instead of using a complex XSLT transformation, the translation from the UML to the RDF representation can be achieved using a rule like

(X subClassOf Y) <= (Z type Generalization), (Z child X), (Z parent Y)
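Applied to links stored as (source, label, destination) triples, the rule above amounts to a simple join. The following sketch is one possible evaluation, not a rule engine; the function name, the identifier `g1`, and the object names `Food` and `EdibleStuff` are assumptions made for illustration.

```python
def apply_generalization_rule(links):
    """Derive (X subClassOf Y) from (Z type Generalization), (Z child X), (Z parent Y)."""
    derived = set()
    for (z, label, d) in links:
        if label == "type" and d == "Generalization":
            children = {o for (s, p, o) in links if s == z and p == "child"}
            parents = {o for (s, p, o) in links if s == z and p == "parent"}
            derived |= {(x, "subClassOf", y) for x in children for y in parents}
    return derived

uml = {
    ("g1", "type", "Generalization"),   # g1 is a hypothetical identifier
    ("g1", "child", "Food"),
    ("g1", "parent", "EdibleStuff"),
}
print(apply_generalization_rule(uml))  # {('Food', 'subClassOf', 'EdibleStuff')}
```

The rule operates purely on the object layer: it needs no knowledge of how either model is serialized at the syntax layer.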

7 Related Work and Conclusion

[Bra79] presented a layered approach to Knowledge Representation Systems with the following layers:

Implementational: The level of data structures such as atoms, pointers, lists and other programming notions.
Logical: Symbolic logic with its propositions, predicates, variables, quantifiers, and Boolean operators.
Epistemological: A level for defining concept types with subtypes, inheritance, and structuring relations.
Conceptual: The level of semantic relations, linguistic roles, objects, and actions.
Linguistic: The level of arbitrary concepts, words and expressions of natural languages.

Brachman's levels differ from our model: entities that belong to a single layer in our model are spread across several levels in Brachman's model. For instance, the notion of object identity is introduced in the conceptual layer in [Bra79], but at the object layer in our model. Different logics are defined at the semantic layer in our model, but in the logical layer in Brachman's model. Our model captures the structures common to most knowledge representation and data modeling languages, and defines a common basis for interoperability, whereas Brachman's model is aimed at the implementation of knowledge representation systems.

In our approach to structuring the Information Model Interoperability (IMI) reference model we are building on the analogy with the Open Systems Interconnection (OSI) reference model used in computer networks [Tan97]. One of the major contributions of OSI is to provide a clear distinction between services, interfaces and protocols used in internetworking, enabling a stack of services on top of the more basic levels.

Our analysis of the selected information models in this paper suggests that a comprehensive object layer is yet to be defined. RDF and OEM [PGMW95], which are completely contained within the object layer, lack support for n-ary relationships. SHOE and RDF lack standard order semantics. Although UML supports most of the features that we discussed, it prohibits coexisting objects from different UML metalayers, i.e. it is not possible, for example, to establish a relationship between a class and an instance of this class.

We believe that achieving semantic interoperability on the Web using a single monolithic information model defining the capabilities of all layers is unrealistic. Multiple information models need to coexist and cooperate. In this respect, interoperability results like those described in [HeH00] are to be taken with a grain of salt, since they are based on the assumption that all agents and knowledge providers are using SHOE. SHOE is aimed at a specific knowledge representation task, and other KR systems, e.g. for describing dynamic or probabilistic information, will definitely be needed.

In this paper we make the following three contributions. First, we analyse some information models and suggest a layered reference model for Information Model Interoperability. The reference model reduces the complexity of achieving interoperability between information models on the Semantic Web. Second, we identify the object layer and examine its features in detail. Finally, we discuss issues involved in the implementation of the object layer. We believe that an agreement on a common object layer can be an important step toward the realization of the Semantic Web.

References

BKD+00	Jeen Broekstra, Michel Klein, Stefan Decker, Dieter Fensel, and Ian Horrocks. Adding formal semantics to the Web: building on top of RDF Schema. Technical Report: Free University of Amsterdam, 2000, http://www.ontoknowledge.org/oil/extending-rdfs.pdf
Bor85	A. Borgida: Features Of Languages For The Development Of Information Systems At The Conceptual Level. IEEE Software, January 1985 ftp://ftp.cs.rutgers.edu/pub/borgida/CML-features.ps.gz
Bra79	Ronald J. Brachman: On the Epistemological Status of Semantic Networks. In: Findler, Nicholas V. (Ed.): Associative Networks. Representation and Use of Knowledge by Computers. New York: Academic Press, 1979:3-50.
BrG00	Dan Brickley and R.V. Guha (eds). Resource Description Framework (RDF) Schema Specification 1.0, 2000. W3C Candidate Recommendation, 2000 http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
Cat91	R. G. G. Cattell: Object Data Management. Addison-Wesley, 1991
DSS93	R. Davis, H. Shrobe, and P. Szolovits. What is a Knowledge Representation? AI Magazine, 14(1):17-33, 1993 http://www.medg.lcs.mit.edu/ftp/psz/k-rep.html
FHH+00	D. Fensel, I. Horrocks, F. Van Harmelen, S. Decker, M. Erdmann, and M. Klein. OIL in a Nutshell. In: Knowledge Acquisition, Modeling, and Management, Proceedings of the European Knowledge Acquisition Conference (EKAW-2000), R. Dieng et al. (eds.), Lecture Notes in Artificial Intelligence, LNAI, Springer-Verlag, October 2000. http://www.cs.vu.nl/~ontoknow/oil/downl/oilnutshell.pdf
GeK94	M. R. Genesereth and S.P. Ketchpel: Software Agents. In: Communications of the ACM 37(7, July), 48-53, 1994
GHW99	R. Goldman, J. McHugh, and J. Widom. From Semistructured Data to XML: Migrating the Lore Data Model and Query Language. WebDB Workshop, 1999 http://dbpubs.stanford.edu/pub/1999-53
HHL99	J. Heflin, J. Hendler, and S. Luke. SHOE: A Knowledge Representation Language for Internet Applications. Technical Report CS-TR-4078 (UMIACS TR-99-71), 1999 http://www.cs.umd.edu/projects/plus/SHOE/pubs/#tr99
HeH00	Jeff Heflin and James Hendler. Semantic Interoperability on the Web. In Proceedings of Extreme Markup Languages 2000. 2000 http://www.cs.umd.edu/projects/plus/SHOE/pubs/#extreme00
HFB+00	I. Horrocks, D. Fensel, J. Broekstra, S. Decker, M. Erdmann, C. Goble, F. van Harmelen, M. Klein, S. Staab, R. Studer, and E. Motta. The Ontology Inference Layer OIL. Technical Report, Free University of Amsterdam, 2000, http://www.cs.vu.nl/~dieter/oil/Tr/oil.pdf
LaS99	Ora Lassila and Ralph R. Swick (eds). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation, 1999 http://www.w3.org/TR/REC-rdf-syntax/
Mel99	S. Melnik: An API for RDF, 1999 http://www-db.stanford.edu/~melnik/rdf/api.html
Mel00	S. Melnik: Representing UML in RDF, 2000 http://www-db.stanford.edu/~melnik/rdf/uml/
PGMW95	Y. Papakonstantinou, H. Garcia-Molina, J. Widom: Object Exchange Across Heterogeneous Information Sources. Proc. Int. Conf. on Data Engineering (ICDE), 1995 http://dbpubs.stanford.edu/pub/1995-6
RDF	W3C: Resource Description Framework http://www.w3.org/RDF/
SaA99	A. Sahuguet and Fabien Azavant. Building light-weight wrappers for legacy Web data-sources using W4F. In: International Conference on Very Large Databases (VLDB99) 1999.
Smi96	Brian C. Smith: On the Origin of Objects. MIT Press, 1996
Sow00	John F. Sowa. Ontology, Metadata, and Semiotics. Proc. Int. Conf. on Conceptual Structures (ICCS), Aug 2000 http://www.bestweb.net/~sowa/peirce/ontometa.htm
Tan97	Andrew S. Tanenbaum. Computer Networks, Prentice-Hall, 3rd ed., 1997
XSLT	W3C: XSL Transformations (XSLT), W3C Recommendation, 1999 http://www.w3.org/TR/xslt
