Database Group,
Stanford
University
{melnik,stefan}@db.stanford.edu
Representing order is a frequently appearing requirement in Knowledge Representation. In the paper we examine several possibilities for representing order in RDF, suggest criteria for evaluation and evaluate the different possibilities.
Frequently it is necessary to represent order in a Knowledge Representation formalism. As a pure graph oriented formalism, RDF offers no predefined possibility (besides the disputed Bag mechanism) to express order. In the remainder of this article we present several possibilities to express order in RDF.
We distinguish between two different aspects: expressing order in the RDF data model, and XML-based supporting syntax for expressing order. Both two aspects are orthogonal to each other.
To illustrate ordered relationships, consider the DublinCore association "creator". In Pushkin's poem "Mozart and Salieri", which inspired the movie "Amadeus", both Mozart and Salieri were presented as the "creators" of "Requiem". No doubt, when modeling the authorship of "Requiem", we want Mozart to be the primary author.
In the following we illustrate different modeling alternatives for expressing order. The seven alternatives are named sequence container, list container, linked list, specialization, ordinal properties, ternary, and reification, according to their logical implementation. Notice that although any representation can be bijectively translated into every other one, they are more or less semantically faithful. For example, the representations [1] and [2] (container) are particularly semantically misleading for representing ordered relationships since it states that the creator of "Requiem" is an object typed as Sequence.
Figure 1: Order by Sequence Container
Figure [1] illustrates representing order by using a container of type sequence. The value of the creator property is a container. Ordinal properties are pointing "Mozart" and "Salieri" respectively. This solutions is similar to the "Bag" construct described in the RDF specification.
Figure 2: Order by List Container
The option depicted in Figure [2] is similar to the Container definition presented in Figure [1] : LISP-like list constructors are used to implement order.
Figure 3: Order by Linked List
Figure [3] depicts another type of linked list: order is again implemented by a list construct, but the property "creator" is not directly used as an arc, but rather as a type of another node representing the binary relationship. The "next" predicate realized the order.
Figure 4: Order by Property Specialization
In Figure [4] order is realized by introducing a specialization mechanism for properties: creator:1 and creator:2 are specific sub-properties of creator.
Figure 5: Order by Ordinal Properties
Figure [5] implements order in a similar way to Figure [1] using ordinal properties. The difference is, that no bag or container is used, instead the node in the middle is interpreted as the relationship.
Figure 6: Order by Ternary Relationships
Figure [6] implements order by introducing additional destination/order arcs. In contrast to the graph depicted in Figure [5] order is expressed in the nodes, and not in the arcs.
Figure 7: Order by Reification and Arrays
Figure [7] implements order by reifying the statements. The resulting graph is in effect similar to the one depicted in Figure [6], but supported by higher level processing APIs.
Figure 8: Order by Reification and Lists
Figure [8] implements also order by reifying
the statements, but the the order is again defined by a linked list of reifyed
statements.
A qualitative comparison of the alternatives is presented in Table [1]. Besides semantic faithfulness, we consider how difficult it is to use the same logical schema for representing the inverse order. Inverse order is required when the objects at the source end that are related to a single object at the destination end have an ordering that must be preserved. For example, if the "creator" association were to capture the chronological order of the pieces written by the composers, representation for the inverse order would be needed. We gave a minus (-) to the schemes that required creation of additional reified objects for links or associations to support inverse order.
Finally, the last metric that we consider here is the implementation effort. By implementation effort we mean not the effort needed to implement the API that allows manipulating ordered relationships, but the effort needed to use such an API. The typical operations we considered are
Please note, that the typical features of the datastructures arrays (fast access, but costly addition of elements) and linked lists (costly access, but fast addition of elements) are also present in the above order implementations. Ordinal properties correspond to arrays, linked statements or resources correspond to linked lists. Therefore any API handling these datastructures is limited by the underlying datastructures of the representation.
Alternative | Semantic faithfulness | Inverse order | Implementation effort |
container sequence | -- | - | -- |
container list | -- | - | -- |
specialization | ++ | - | -- |
ordinal properties | +- | - | - |
linked list | +- | + | + |
ternary | +- | + | +- |
reification (array) | + | + | +- |
reification (list) | + | + | +- |
Table 1: Logical implementation alternatives for ordering
The schemes specialization and container are especially implementation-intensive. Specialization requires tracing the ordered versions of "creator" like "creator:1" for every access, whereas container entails checks to determine whether a single object or a bag is the destination of the link. In our comparison, we only considered ordered binary relationships. Although ordering by reification looks fairly verbose in the figure, we found that it has a number of preferable characteristics that make it a viable choice for a logical implementation of ordered relationships. Both reification version (array and linked list) might be suitable, depending on the application.
As an example of how the reification mechanism is used we now investigate the implementation of ordering using a relational DBMS. In this implementation, a single table tuples holds binary links between objects in a generic fashion. The table contains four fields that represent object identifiers, all fields are of the same type (Object identifiers in a database system are typically implemented as integers. In the examples below we are using stylized string values). The implementation uses order by reification (array). A sample content of the database is shown below.
ID S
P O
--------------------------------
id1 Requiem creator Salieri
id2 Requiem creator Mozart
id3 id1 order
2
id4 id2 order
1
id5 Pinocchio creator Geppetto
The table contains two ordered links and one unordered link. The field ID
contains identifiers of reified links. All find-queries listed in section
2 (Evalution) as implementation criteria can be executed using a single
SQL query.
As an example we look the most sophisticated query: retrieving the first creator.
The complicating factor is that some creators are unordered. Still, retrieving
the first creator for an object like Requiem can be done using the following
single query:
SELECT t1.S, t1.O
FROM tuples AS t1 LEFT JOIN tuples AS t2
ON t1.ID=t2.S
WHERE t1.S=Requiem AND
t1.P=creator
AND
(t2.P IS NULL OR
t2.P=order) AND
(t2.O IS NULL OR
t2.O=1)
GROUP BY t1.S
The GROUP BY clause is required to reduce the number of multiple unordered creators to one. The first creators of all objects can be retrieved by dropping the first conjunct in the WHERE clause. The result of the query would be:
(Requiem, Mozart)
(Pinocchio, Geppetto)
.
<tuple ID="id1" S="Requiem" P="creator" O="Mozart"/>
<tuple SID="id1" P="order" O="1"/>
The XML attribute SID in the second tuple is a reference to an ID attribute declared in the first tuple. For even more compact representation, a specialized ordering syntax can be used. Thus, the fact that Salieri is the second creator can be serialized as:
<tuple S="Requiem" P="creator" O="Salieri" order="2"/>
However, often a representation allowing the definition several values of the relationship might be necessary. If this is the case, the following syntax (already present in the DAML specification) would allow this:
<rdf:Description rdf:about="#Requiem"> <creator parseType="daml:orderedRelation"> <item>Mozart</item> <item>Salieri</item> </creator> </rdf:Description>
Please note that an RDF parser can easily produce the graph presented in Figure [7] from the RDF-XML given above.
In this draft paper we make the following contributions: we list several possibilities to implement order in RDF. We sketch a metric for evaluation of the different possibilities and give syntactic means to describe the ordered relations.