Proposed Updates of RDF
This document tries to capture some of the recent discussions on
the rdf-interest mailing list.
Aspects of an updated syntax are discussed in Sergey Melnik's Simplified
Syntax for RDF document.
Tracing RDF statements
The ability to trace the source of an RDF statement was
mentioned as a requirement in several postings (0087,
0089).
It also appears in some proposals for storing RDF
data in relational databases, and in APIs, e.g. the original
RADIX proposal or Sergey
Melnik's proposal.
This raised
the question of whether the data model should be modified. However, it was argued
that stating the source of a triple is just making a statement about a statement,
so reification is enough (also in 0088).
Several kinds of encodings are possible, e.g.
It was argued that this approach would multiply
the number of triples if done naively. To avoid this, it was
proposed that the origin of each model be stored with its triples, while
the application sees only a read-only bag of triples.
This also allows a property arc from each statement
to its source. However, this places a requirement
on software that implements a query API, and should be standardized.
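The trade-off between the two approaches might be sketched as follows. This is only an illustration under stated assumptions: the class and function names are invented here, and the source property URI is a placeholder that no specification defines. Only the four reification properties come from the RDF M&S vocabulary.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def naive_reification(triple: Triple, statement_id: str,
                      source: str, source_property: str) -> list[Triple]:
    """Triples needed to attach one source to one statement by reification."""
    return [
        Triple(statement_id, RDF + "type", RDF + "Statement"),
        Triple(statement_id, RDF + "subject", triple.subject),
        Triple(statement_id, RDF + "predicate", triple.predicate),
        Triple(statement_id, RDF + "object", triple.obj),
        # source_property is a hypothetical property, not a standardized one.
        Triple(statement_id, source_property, source),
    ]


class SourcedStore:
    """Hypothetical store: keeps each triple once, plus the sources asserting it."""

    def __init__(self) -> None:
        self._sources: dict[Triple, set[str]] = defaultdict(set)

    def add(self, triple: Triple, source: str) -> None:
        self._sources[triple].add(source)

    def triples(self) -> frozenset[Triple]:
        # Applications see a read-only collection of plain triples.
        return frozenset(self._sources)

    def sources_of(self, triple: Triple) -> frozenset[str]:
        # The "arc" from a statement to its source(s), answered by the store
        # instead of by extra triples in the model.
        return frozenset(self._sources.get(triple, ()))
```

With naive reification every sourced statement costs five extra triples, whereas the store above keeps one entry per statement and answers the source question through its query interface. That is exactly the part that would need standardizing.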
Next, it is necessary to standardize the property that is used to
ask for the source of a statement. The current RDF
M&S specification document does not define such a property.
The RDF Schema Specification
describes a property isDefinedBy which could be used for
this purpose. Its anticipated use is to identify the RDF Schema where
a name is defined, which does not conflict with the usage required
here. So one possibility is to extend the meaning of isDefinedBy
so that, for any resource, its value is the source URI. There need
not be a single source for an RDF statement. Indeed, a given model can
contain multiple sources for the same statement.
This would be a (minimal) extension and would have to be described by the
RDF
Schema Specification.
Linking to Resources
The discussion started with the following
problem. Given a snippet from a homepage, e.g.
<center><A name="myname">Stefan Decker</A></center>,
it was asked in what respect the following two RDF snippets are identical:
-
<rdf:Description about="http://www.aifb.uni-karlsruhe.de/~sde">
<s:Creator>Stefan Decker</s:Creator>
</rdf:Description>
-
<rdf:Description about="http://www.aifb.uni-karlsruhe.de/~sde">
<s:Creator resource="http://www.aifb.uni-karlsruhe.de/~sde#myname"/>
</rdf:Description>
The background was the problem of making an existing metadata editor RDF-compliant.
Metadata is created using a WYSIWYG HTML editor, which allows the semantic
annotation of HTML pages. One simply marks the text and selects the class/attribute
from an ontology. Semantic markup is inserted into the HTML text. However,
if the text is copied, this creates a maintenance nightmare. The same
holds for any kind of resource that is likely
to change frequently. So this problem has a wider scope.
One
answer saw the problem as related to the issue of "Identifiers -
what is identified?" in Tim's strawman
document. However, I think the problem described there is a bit different:
there the problem is to distinguish between the RDF (or XML) source and
the object that is described in that RDF code. Another example is
the use of homepage URIs as object identifiers. If one makes a statement
about that resource, does one mean the person that created the homepage
(the object in the real world) or the web resource? And how are they distinguished?
This problem was also identified in posting 0106.
However, the lack of a way to enforce a kind of dereferencing
was identified as the cause of this problem.
Another
suggestion was to resolve this issue by attaching RDF annotations
to SPAN elements. This would solve the problem of pointing
to HTML, but not that of extracting metadata.
Three
possibilities were given for providing hints for dereferencing:
-
One can define additional syntax without changing the RDF model itself,
and define the model in such a way that everything is dereferenced as far
as possible. Then the parser, which generates the triples, has to do the
work. However, parsing can be a time-consuming activity.
-
Another way is to extend the RDF model to make it possible to indicate
that a particular URI should be dereferenced. The application can then
decide whether it is necessary to dereference a URI.
-
A third way would be to generate an extra triple that indicates that
the resource should be dereferenced. But this involves reifying the original
triple and thus generates many additional triples, and leaves the application
with a hard job. This would not change the data model, but
it would have to be standardized.
A
suggestion was that XPointer
would
provide a possibility for solving this issue; it stressed that XPointer
should be a tool for RDF to provide fine-grained metadata. XPointer can
indeed be used to point to ranges and nodes, so it should probably be
adopted. However, I haven't found support for dereferencing (could somebody
verify this?).
Furthermore, it was suggested that there should be standardized
metadata extraction facilities for resources, distinguished by different
kinds of links.
Something similar
was indeed discussed in the W3C RDF working group, as was pointed
out.
However, this covered the inclusion of RDF metadata, and is
thereby subsumed by the overall topic now (???). There was also a
warning that there might be too many possibilities for extracting metadata
from web resources, and that rdfs:seeAlso solves this issue.
The former point means, however, that we have to come up with a general
way to extract this metadata from a resource, and it is hard to see how
rdfs:seeAlso
defines such a possibility (see 0094, 0099).
Another posting
pointed out that the actual problem is not "dereferencing" but metadata
extraction, and that this could be done by using the MIME type.
Conclusion:
What is needed is a metadata extraction facility that enables one
to extract metadata from web resources depending on the MIME type.
There is actually software that does exactly this. However, it is still
necessary to include this metadata in RDF triples. So some kind of dereferencing
is still necessary. This should be done by RDF descriptions of the resources
or of the metadata extraction services themselves (see 0106).
A system supporting this would indeed look very similar to the actual GINF
implementation: for each MIME type we would have an implementation somewhere
on the web. This implementation is given a piece of RDF specifying the
metadata that should be extracted from a given resource. The result is then
inserted into the RDF code. For a few standard MIME types (HTML, GIF, etc.)
this should be quite easy to implement.
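As a rough sketch of what such a facility could look like for HTML, the following dispatches on the MIME type and resolves a fragment identifier such as #myname to the text of the named anchor. It is an illustration only: the registry, the function names and the "fragment selects the text" convention are assumptions made here, not the GINF design, and fetching the resource and reading the RDF request are left out.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Optional


class _NamedAnchorText(HTMLParser):
    """Collects the text inside <a name="..."> for one anchor name."""

    def __init__(self, anchor_name: str) -> None:
        super().__init__()
        self._wanted = anchor_name
        self._inside = False
        self.text: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag == "a" and dict(attrs).get("name") == self._wanted:
            self._inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


def extract_from_html(body: str, fragment: str) -> Optional[str]:
    """Return the text of the anchor named `fragment`, or None if absent."""
    parser = _NamedAnchorText(fragment)
    parser.feed(body)
    parser.close()
    return parser.text


# Hypothetical registry: one extractor per MIME type.
EXTRACTORS: Dict[str, Callable[[str, str], Optional[str]]] = {
    "text/html": extract_from_html,
}


def dereference(mime_type: str, body: str, fragment: str) -> Optional[str]:
    """Pick the extractor for the MIME type; None if there is none."""
    extractor = EXTRACTORS.get(mime_type)
    if extractor is None:
        return None
    return extractor(body, fragment)
```

Applied to the homepage snippet from the beginning of this section, dereference("text/html", page, "myname") would yield the literal that the first RDF snippet states directly.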
Clearly, this discussion should be accompanied by an example implementation,
otherwise there is the danger that it gets too abstract.
Missing Skolem-Function Definition
Posting 0092
identified
an important missing part of the RDF specification:
uniquely defined Skolem functions and ID generators for RDF. A Skolem function
is a function that returns a uniquely defined value for its arguments. At
first sight this topic does not seem very important, but it becomes
important as soon as RDF models are exchanged and combined: if generated
IDs for reified triples or unknown resources differ, it is not possible to
determine whether these triples indeed mean the same thing. The ID of a reified
triple depends only on the original subj, pred, obj, so these are the
parameters of the unique Skolem function.
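One way such a function could be realized is to derive the ID from a hash of the three components, so that independent parties compute the same ID for the same triple. This is a sketch under assumptions: the choice of SHA-256 and the urn:example:statement: prefix are placeholders chosen here, and a hash is only collision-resistant, not unique in the strict sense.

```python
import hashlib


def skolem_id(subject: str, predicate: str, obj: str) -> str:
    """Deterministic ID for the reification of (subject, predicate, obj)."""
    digest = hashlib.sha256()
    for part in (subject, predicate, obj):
        data = part.encode("utf-8")
        # The length prefix keeps ("ab", "c", ...) distinct from ("a", "bc", ...).
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    # Placeholder URI scheme; a real proposal would have to fix one.
    return "urn:example:statement:" + digest.hexdigest()
```

Two models that reify the same triple would then agree on its ID without any coordination, which is what merging them requires.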
Other Areas in Need of Clarification
Posting 0068
listed
some other well-known questions where clarification is needed:
-
aboutEachPrefix: handling aboutEachPrefix inside the model results
immediately in an infinite model. This is clearly unacceptable if the model
is handled extensionally (as an explicit set of triples), which is e.g. done
by SiRPAC or GINF. There
are two possibilities to handle this:
-
To drop aboutEachPrefix from RDF.
-
To handle aboutEachPrefix as what it is: an intensional definition
(aka rule) à la
triple(Subject,Predicate, Object) <- aboutEachPrefixTriple(Prefix,Predicate,Object)
and startswith(Prefix,Subject).
Either way, the current status is clearly unacceptable.
-
xml:lang does not appear in the model either, which is therefore also
a
bug in the specs. Either a new triple has to be added to the model,
or xml:lang should be ignored.
-
There is no difference in principle between rdf:ID and rdf:about. There
would be one if rdfs:isDefinedBy were appended to every resource defined
by rdf:ID. Not in the model - no semantics.
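The intensional reading of aboutEachPrefix given in the list above can be evaluated at query time, so that the model itself stays finite. The following is a minimal sketch of that rule; the type and function names are invented for illustration and do not come from SiRPAC or GINF.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class PrefixTriple(NamedTuple):
    """One stored aboutEachPrefixTriple(Prefix, Predicate, Object) fact."""
    prefix: str
    predicate: str
    obj: str


def triples_for(subject: str,
                rules: Iterable[PrefixTriple]) -> Iterator[Tuple[str, str, str]]:
    """triple(S, P, O) <- aboutEachPrefixTriple(Prefix, P, O) and startswith(Prefix, S)."""
    for rule in rules:
        if subject.startswith(rule.prefix):
            yield (subject, rule.predicate, rule.obj)
```

Only the finitely many prefix facts are stored. The triples they imply are derived for a concrete subject when someone asks for them.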
Stefan Decker,
20-11-1999