Representing Order in RDF

Database Group, Stanford University
{melnik,stefan}@db.stanford.edu

Revision: Jan 7, 2001

Abstract

Representing order is a frequently appearing requirement in Knowledge Representation. In the paper we examine several possibilities for representing order in RDF, suggest criteria for evaluation and evaluate the different possibilities.

1 Introduction

Frequently it is necessary to represent order in a Knowledge Representation formalism. As a pure graph oriented formalism, RDF offers no predefined possibility (besides the disputed Bag mechanism) to express order. In the remainder of this article we present several possibilities to express order in RDF.

We distinguish between two different aspects: expressing order in the RDF data model, and XML-based supporting syntax for expressing order. Both two aspects are orthogonal to each other.

2 Representing Order in the RDF Data Model

To illustrate ordered relationships, consider the DublinCore association "creator". In Pushkin's poem "Mozart and Salieri", which inspired the movie "Amadeus", both Mozart and Salieri were presented as the "creators" of "Requiem". No doubt, when modeling the authorship of "Requiem", we want Mozart to be the primary author.

In the following we illustrate different modeling alternatives for expressing order. The seven alternatives are named sequence container, list container, linked list, specialization, ordinal properties, ternary, and reification, according to their logical implementation. Notice that although any representation can be bijectively translated into every other one, they are more or less semantically faithful. For example, the representations [1] and [2] (container) are particularly semantically misleading for representing ordered relationships since it states that the creator of "Requiem" is an object typed as Sequence.

Figure 1: Order by Sequence Container

Figure [1] illustrates representing order by using a container of type sequence. The value of the creator property is a container. Ordinal properties are pointing "Mozart" and "Salieri" respectively. This solutions is similar to the "Bag" construct described in the RDF specification.

Figure 2: Order by List Container

The option depicted in Figure [2] is similar to the Container definition presented in Figure [1] : LISP-like list constructors are used to implement order.

Figure 3: Order by Linked List

Figure [3] depicts another type of linked list: order is again implemented by a list construct, but the property "creator" is not directly used as an arc, but rather as a type of another node representing the binary relationship. The "next" predicate realized the order.

Figure 4: Order by Property Specialization

In Figure [4] order is realized by introducing a specialization mechanism for properties: creator:1 and creator:2 are specific sub-properties of creator.

Figure 5: Order by Ordinal Properties

Figure [5] implements order in a similar way to Figure [1] using ordinal properties. The difference is, that no bag or container is used, instead the node in the middle is interpreted as the relationship.

Figure 6: Order by Ternary Relationships

Figure [6] implements order by introducing additional destination/order arcs. In contrast to the graph depicted in Figure [5] order is expressed in the nodes, and not in the arcs.

Figure 7: Order by Reification and Arrays

Figure [7] implements order by reifying the statements. The resulting graph is in effect similar to the one depicted in Figure [6], but supported by higher level processing APIs.

Figure 8: Order by Reification and Lists

Figure [8] implements also order by reifying the statements, but the the order is again defined by a linked list of reifyed statements.

2 Evaluation

A qualitative comparison of the alternatives is presented in Table [1]. Besides semantic faithfulness, we consider how difficult it is to use the same logical schema for representing the inverse order. Inverse order is required when the objects at the source end that are related to a single object at the destination end have an ordering that must be preserved. For example, if the "creator" association were to capture the chronological order of the pieces written by the composers, representation for the inverse order would be needed. We gave a minus (-) to the schemes that required creation of additional reified objects for links or associations to support inverse order.

Finally, the last metric that we consider here is the implementation effort. By implementation effort we mean not the effort needed to implement the API that allows manipulating ordered relationships, but the effort needed to use such an API. The typical operations we considered are

find all creators of "Requiem" (ignore order)
find all properties of "Requiem" (hide auxiliary objects like instances of Sequence)
find the first creator of "Requiem"
add/insert a second (third etc.) creator

Please note, that the typical features of the datastructures arrays (fast access, but costly addition of elements) and linked lists (costly access, but fast addition of elements) are also present in the above order implementations. Ordinal properties correspond to arrays, linked statements or resources correspond to linked lists. Therefore any API handling these datastructures is limited by the underlying datastructures of the representation.

Alternative	Semantic faithfulness	Inverse order	Implementation effort
container sequence	`--`	`-`	`--`
container list	`--`	`-`	`--`
specialization	`++`	`-`	`--`
ordinal properties	`+-`	`-`	`-`
linked list	`+-`	`+`	`+`
ternary	`+-`	`+`	`+-`
reification (array)	`+`	`+`	`+-`
reification (list)	`+`	`+`	`+-`

Table 1: Logical implementation alternatives for ordering

The schemes specialization and container are especially implementation-intensive. Specialization requires tracing the ordered versions of "creator" like "creator:1" for every access, whereas container entails checks to determine whether a single object or a bag is the destination of the link. In our comparison, we only considered ordered binary relationships. Although ordering by reification looks fairly verbose in the figure, we found that it has a number of preferable characteristics that make it a viable choice for a logical implementation of ordered relationships. Both reification version (array and linked list) might be suitable, depending on the application.

Database Access

As an example of how the reification mechanism is used we now investigate the implementation of ordering using a relational DBMS. In this implementation, a single table tuples holds binary links between objects in a generic fashion. The table contains four fields that represent object identifiers, all fields are of the same type (Object identifiers in a database system are typically implemented as integers. In the examples below we are using stylized string values). The implementation uses order by reification (array). A sample content of the database is shown below.

    ID   S         P        O
    --------------------------------
    id1 Requiem   creator Salieri
    id2 Requiem   creator Mozart
    id3 id1       order    2
    id4 id2       order    1
    id5 Pinocchio creator Geppetto

The table contains two ordered links and one unordered link. The field ID contains identifiers of reified links. All find-queries listed in section 2 (Evalution) as implementation criteria can be executed using a single SQL query.
As an example we look the most sophisticated query: retrieving the first creator. The complicating factor is that some creators are unordered. Still, retrieving the first creator for an object like Requiem can be done using the following single query:

    SELECT t1.S, t1.O
    FROM   tuples AS t1 LEFT JOIN tuples AS t2 ON t1.ID=t2.S
    WHERE t1.S=Requiem AND
           t1.P=creator AND
          (t2.P IS NULL OR t2.P=order) AND
          (t2.O IS NULL OR t2.O=1)
    GROUP BY t1.S

The GROUP BY clause is required to reduce the number of multiple unordered creators to one. The first creators of all objects can be retrieved by dropping the first conjunct in the WHERE clause. The result of the query would be:

(Requiem, Mozart)
(Pinocchio, Geppetto)

4 Mapping to the Syntax Layer

The mapping to the syntax layer can be optimized to support the features provided by the object layer. As an example, consider how serialization of reification and order can be implemented in a compact way. Order by reification can be expressed as

<tuple ID="id1" S="Requiem" P="creator" O="Mozart"/>
<tuple SID="id1" P="order" O="1"/>

The XML attribute SID in the second tuple is a reference to an ID attribute declared in the first tuple. For even more compact representation, a specialized ordering syntax can be used. Thus, the fact that Salieri is the second creator can be serialized as:

<tuple S="Requiem" P="creator" O="Salieri" order="2"/>

However, often a representation allowing the definition several values of the relationship might be necessary. If this is the case, the following syntax (already present in the DAML specification) would allow this:

<rdf:Description rdf:about="#Requiem">
 <creator parseType="daml:orderedRelation">
  <item>Mozart</item>
  <item>Salieri</item>
 </creator>
</rdf:Description>

Please note that an RDF parser can easily produce the graph presented in Figure [7] from the RDF-XML given above.

5 Conclusion

In this draft paper we make the following contributions: we list several possibilities to implement order in RDF. We sketch a metric for evaluation of the different possibilities and give syntactic means to describe the ordered relations.