Author: Sergey Melnik (melnik@db.stanford.edu) Revision: 2001-07-12 This is a crack at clarifying the role of literals as a part of the formal model. Below I tried to make the relationship with M&S explicit whenever possible, and to identify the new issues arising from this proposal. Notation A x B corresponds to the Cartesian product. In particular, I'd like to see whether the clarifications summarized below break something in M&S that is not already broken, or have subtle troubles that I failed to identify. Formal model: ------------ 1. There is a set of Unicode strings called Unicode*. 2. There is a set called Entities = Unicode* x Unicode*, i.e. an entity is a pair of Unicode strings. The first Unicode string of the pair is called "namespace prefix". 3. There is a subset of Entities called Resources. Each resource is a pair (URI prefix, local part). The namespace prefix of a resource is a URI. Resources correspond to the set of QNames as defined in http://www.w3.org/TR/REC-xml-names/. URI prefix may be empty, in which case a resource corresponds to an unqualified name. 4. There is a set called Literals = {"urn:data:literal:"} x Unicode*. 5. There is a subset of Resources called Properties. 6. There is a set called Triples = Entities x Entities x Entities. 7. There is a subset of Triples set called Statements = Resources x Properties x (Resources U Literals). 8. There is a subset of Resources called Namespaces. A namespace is a resource with an empty local name, i.e. (URI prefix, ""). 9. A subset of Statements is called an RDF Level 1-model. A subset of Entities is called an RDF Level 2-model. Observations: ------------ The motivation behind the above definitions is to preserve the intent captured in M&S. In particular, the sets Statemets, Resources, and Literals are presented as clarifications of the corresponding definitions in M&S. From the logical perspective, entities, resources, literals etc. are constants that can bear certain interpretation. ad 2) Two entities (a,b) and (c,d) are identical iff a=c and b=d as defined by canonical equivalence in Unicode standard. ad 3,4) Notice that Resources and Literals are overlapping. There are Literals that are not Resources, and there are Resources that are not Literals. In graph representation and XML syntax defined in M&S, Literals are abbreviated using quoted strings. Literals may be implemented in applications in distinct ways. ad 6) Not all entities can be serialized in current M&S syntax (only Statements) Impact on applications: ---------------------- Current RDF application can be declared as RDF Level 1. Level 1 has restrictions wrt to valid models that can be processed. Since only Level 1 models can be serialized in M&S RDF/XML, no unexpected breakages can creep in by receiving a Level 2 model from another application. RDF applications Level 2 drop several restrictions of Level 1. In particular, literals may be used as subjects (and properties?). A more flexible syntax is needed to be able to serialize any subset of entities (Level 2 model). Since namespaces are explicit, resource ("urn:isbn:","0123456789") is different from ("urn:","isbn:0123456789"). See comment on syntax below. Impact on M&S syntax: -------------------- Using of namespaces makes each resource to correspond to a QName. Generation of resources and namespaces could be done in amended M&S syntax as follows. (1) The parser records all explicit namespace declarations as namespace prefixes during parsing. (2) Each generated resource URI (as referred to in M&S) is assigned the most specific (longest) namespace prefix of those recorded in (1). (3) Resource URIs with undetermined prefixes are assigned prefix "" (unqualified). Impact on xml:lang: ------------------ xml:lang could be resolved either (1) by ignoring xml:lang in the M&S syntax altogether or, (2) by generating URI prefixes like "urn:data:lang:en:" etc. This would cause (urn:data:literal:, "foo") != (urn:data:lang:en:, "foo") != (urn:data:lang:fr:, "foo") However, APIs may provide access methods that ignore the differences. Anonymous resources and reification: ----------------------------------- These are separate issues that are orthogonal to the issue of literals/resources. Extensibility: ------------- Future RDF 2.0 may consider using namespace prefixes as a means of specifying primitive data types.