
DAML (OntoAgents) Homework Assignment 2:
DAML Queries/Life Cycle

To formulate queries, we use a query language variant of Frame-Logic (F-Logic) that is currently under development at Stanford DB and Karlsruhe AIFB. The language is a simplified variant of F-Logic with special support for RDF features (e.g. namespace declarations). It is suitable not only for evaluating queries, but also for formulating axioms. We assume the following namespace definitions throughout the queries.

<namespace-declarations>
rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdfs="http://www.w3.org/TR/1999/PR-rdf-schema-19990303#"
sw="http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#"
daml="http://www.daml.org/2000/10/daml-ont.daml#"
</namespace-declarations>

Give me all publications of the researcher with the last name "Studer".

FORALL Pub <- EXISTS ResID ResID[
rdf:type->sw:FullProfessor;
sw:lastname->"Studer";
sw:publications->Pub].
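To make the intended reading of this query concrete, here is a minimal Python sketch that evaluates the same pattern over a toy set of triples. It is not the actual query engine: the `ex:` resources, the sample data, and the helper `publications_of` are assumptions made up for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SW = "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#"

# A toy RDF model: a set of (subject, predicate, object) triples.
# All ex: identifiers and publication IDs are invented for this sketch.
model = {
    ("ex:r1", RDF + "type", SW + "FullProfessor"),
    ("ex:r1", SW + "lastname", "Studer"),
    ("ex:r1", SW + "publications", "ex:pub1"),
    ("ex:r1", SW + "publications", "ex:pub2"),
    ("ex:r2", RDF + "type", SW + "PhdStudent"),
    ("ex:r2", SW + "lastname", "Example"),
    ("ex:r2", SW + "publications", "ex:pub3"),
}

def publications_of(model, lastname):
    """Return every Pub for which some ResID satisfies all three conditions."""
    subjects = {s for (s, p, o) in model}
    return {
        pub
        for res in subjects  # EXISTS ResID
        if (res, RDF + "type", SW + "FullProfessor") in model
        and (res, SW + "lastname", lastname) in model
        for (s, p, pub) in model
        if s == res and p == SW + "publications"
    }

print(sorted(publications_of(model, "Studer")))  # ['ex:pub1', 'ex:pub2']
```

The result is a set of bindings for Pub only; ResID is existentially quantified and therefore does not appear in the answer.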
The F-Logic query above works on a single, unspecified input RDF model and returns variable substitutions. Often it is convenient to specify exactly which RDF models the query should work on, since a database or RDF store might contain several RDF models. Note that an RDF model is not the same as an RDF namespace: there is no one-to-one relationship between RDF models and namespaces. It is also often convenient if the result of an RDF query is itself an RDF model, which can then be stored in a database or passed to other tools that work with RDF models. In the following, we modify that query to fulfill these requirements:

FORALL Pub,ResID Result(ResID[sw:publications->Pub])<-
ResID[rdf:type->sw:FullProfessor;
sw:lastname->"Studer";
sw:publications->Pub].
uses (AIFBModel union StanfordDBModel).
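The two additions, an explicit input (the union of two models) and a result that is itself an RDF model, can be sketched in Python as follows. This is only an illustration under assumptions: the contents of the two toy models, the `ex:` identifiers, and the function `run_query` are hypothetical, and a model is represented simply as a set of triples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SW = "http://www.semanticweb.org/ontologies/swrc-onto-2000-09-10.daml#"

# Two toy input models (invented data). Each one alone is incomplete.
aifb_model = {
    ("ex:r1", RDF_TYPE, SW + "FullProfessor"),
    ("ex:r1", SW + "lastname", "Studer"),
    ("ex:r1", SW + "publications", "ex:pub1"),
}
stanford_model = {
    ("ex:r1", SW + "publications", "ex:pub2"),
}

def run_query(*models):
    """Evaluate the query over the union of the given models.

    The answer is a new model (a set of triples), not a list of bindings.
    """
    source = set().union(*models)  # corresponds to "uses (A union B)"
    result = set()
    for (res, p, o) in source:
        is_match = (
            p == SW + "lastname"
            and o == "Studer"
            and (res, RDF_TYPE, SW + "FullProfessor") in source
        )
        if not is_match:
            continue
        for (s, p2, pub) in source:
            if s == res and p2 == SW + "publications":
                result.add((s, p2, pub))  # Result(ResID[sw:publications->Pub])
    return result

result_model = run_query(aifb_model, stanford_model)
for triple in sorted(result_model):
    print(triple)
```

Because the result has the same shape as the inputs, it can be passed to `run_query` again or stored alongside the source models.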
Which researchers are cooperating with other researchers?
FORALL X, Y<-
X[rdf:type->sw:AcademicStaff; sw:cooperateWith->Y[rdf:type->sw:AcademicStaff]].
On which projects do the PhD students work who are supervised by a professor with the email address "studer@aifb.uni-karlsruhe.de"?

FORALL ProjID <- EXISTS PhdID, ResID
PhdID[rdf:type->sw:PhdStudent;sw:worksAtProject->ProjID;sw:supervisor->ResID]
and ResID[rdf:type->sw:FullProfessor;sw:email->"studer@aifb.uni-karlsruhe.de"].
Give me the organization that finances a project dealing with the research topic "ontology articulation", as well as all the persons in this project who work on that topic.

FORALL OrgID, ProjID, MemID <-
OrgID[rdf:type->sw:Organization;sw:finances->ProjID]
and ProjID[rdf:type->sw:Project;sw:isAbout->"Ontology Articulation"; sw:member->MemID].
Find me the name of any project that has members with the homepage 'http://www-db.stanford.edu/~stefan/' and tell me who they work with.

FORALL ProjID,MemID,OrgID<-
ProjID[rdf:type->sw:Project; sw:member->MemID]
and MemID[rdf:type->sw:AcademicStaff; sw:homepage->"http://www-db.stanford.edu/~stefan/"; sw:affiliation->OrgID].

3. Task: Describe how you would expect these queries to be implemented. Identify the major DAML software components and sketch the control and data flow among them. Your solution may address some or all of the following topics (query language, dynamic retrieval, crawling, caching, translation, inference, scalability, consistency, security). Consider how this could be accomplished if some of the DAML content were sensitive information stored on multiple WWW sites protected by passwords and/or certificates. You may or may not have access to all of the data.

The overall agent infrastructure requires an information food chain: every part of the food chain provides information that enables the existence of the next part. The food chain starts with the construction of an ontology, preferably with the OnTo-Agents Ontology Construction Tool. The ontology defines the terms that can be used to annotate information in webpages using the DAML language. The proposed OnTo-Agents Webpage Annotation Tool provides means to browse the ontology and to select appropriate terms of the ontology and use them to mark up sections of a webpage. The webpage annotation process creates a set of annotated webpages, which are available to an OnTo-Agent to achieve its tasks. The OnTo-Agent itself needs several sub-components, specifically the OnTo-Agents Inference System for the evaluation of rules, queries, and general inferences, and the OnTo-Agents Ontology Articulation Toolkit for mediation among information obtained from different ontologies. The data from the annotations can also be used to construct additional websites, such as a Community Web Portal that presents a community of interest to the outside world in a concise manner. Finally, information-seeking users can give specific retrieval tasks to an OnTo-Agent, or they can query a Community Web Portal for immediate access to the information.

The query processor itself needs to be scalable and able to deal with millions, maybe billions, of simple statements. Database technology provides this scalable infrastructure, but not for free: query optimizations have to be analyzed and implemented. Scalable retrieval technology should be based on well-investigated deductive database technology, which provides special optimization strategies for typical queries. On top of the database technology it is necessary to implement a query language that allows graph navigation in large RDF graphs and that supports special RDF features. The tradeoff between caching data and retrieving it at query time remains to be investigated. In particular, the semantics of query answering with retrieval at runtime remains largely open.
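As a rough illustration of what graph navigation on top of an indexed statement store could look like, consider the following Python sketch. It is an in-memory toy, not a deductive database: the class `TripleIndex`, its methods, the abbreviated `sw:` property strings, and the sample data are all assumptions introduced here.

```python
from collections import defaultdict

class TripleIndex:
    """Toy statement store with two indexes, so lookups avoid full scans."""

    def __init__(self):
        self._sp = defaultdict(set)  # (subject, predicate) -> objects
        self._po = defaultdict(set)  # (predicate, object) -> subjects

    def add(self, s, p, o):
        self._sp[(s, p)].add(o)
        self._po[(p, o)].add(s)

    def objects(self, s, p):
        return set(self._sp.get((s, p), ()))

    def subjects(self, p, o):
        return set(self._po.get((p, o), ()))

    def follow(self, starts, path):
        """Navigate forward from a set of nodes along a list of predicates."""
        frontier = set(starts)
        for p in path:
            frontier = {o for s in frontier for o in self.objects(s, p)}
        return frontier

idx = TripleIndex()
# Invented sample data; the property names mirror the queries above.
idx.add("ex:prof", "sw:email", "prof@example.org")
idx.add("ex:phd1", "sw:supervisor", "ex:prof")
idx.add("ex:phd1", "sw:worksAtProject", "ex:projA")
idx.add("ex:phd2", "sw:supervisor", "ex:other")
idx.add("ex:phd2", "sw:worksAtProject", "ex:projB")

# Professor by email, then supervised students (inverse step), then projects.
profs = idx.subjects("sw:email", "prof@example.org")
phds = {s for prof in profs for s in idx.subjects("sw:supervisor", prof)}
projects = idx.follow(phds, ["sw:worksAtProject"])
print(projects)  # {'ex:projA'}
```

Each navigation step is a dictionary lookup rather than a scan over all statements. A real system would keep such indexes in a database and let an optimizer choose the order of the steps.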

4. Task: Extra Credit: Develop software to implement any or all of 3
We have developed an RDF/DAML crawler.

The specialized query and transformation language for RDF and DAML is under development. The inference core will be based on 1) the Java core of SiLRI and 2) XSB.

5. Tool wishlist

A Java-based, portable, and lightweight implementation of a Description Logic, suitable for classifying DAML ontologies.

8. Discussion of lessons learned and insights

The need for DAML-based inference at run-time should be largely eliminated, e.g. by precomputing the classification of the class hierarchy. Since we usually have to deal with millions or billions of triples, inferences should be reduced to a minimum, and inference algorithms have to be carefully selected and assembled according to their complexity.
A query language should support the features and properties of RDF, e.g. namespaces and graph navigation. Using query languages that do not support RDF features is very cumbersome. SQL is an example: while expressing RDF queries in SQL is possible in principle, the resulting queries are very hard to read and understand.
DAML and OIL are candidates for Semantic Web languages among many others (Topic Maps is another candidate). Since the RDF model is universally applicable to every Semantic Web language, the query and inference tools should not have fixed, built-in semantics. Instead, they should allow for 'semantic modules' that deliver the desired semantics as a 'plug-in'. Using this technology, a general RDF query framework can be created and reused, while semantic modules can be plugged in to deliver the desired behavior of the complete system and the specialized language.
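The first and last points can be illustrated together with a small Python sketch: a generic framework that knows nothing about any particular semantics, plus one pluggable module whose consequences are precomputed before any query runs. The names `subclass_module` and `materialize`, the abbreviated property strings, and the toy class hierarchy are assumptions made for this example, not part of any existing tool.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def subclass_module(model):
    """Hypothetical semantic module: subclass transitivity and type inheritance."""
    derived = set()
    for (a, p, b) in model:
        if p != SUBCLASS:
            continue
        for (x, q, y) in model:
            if q == SUBCLASS and x == b:
                derived.add((a, SUBCLASS, y))  # a < b and b < y gives a < y
            if q == TYPE and y == a:
                derived.add((x, TYPE, b))      # x : a and a < b gives x : b
    return derived

def materialize(model, modules):
    """Apply all plugged-in modules until no new statements appear."""
    closed = set(model)
    while True:
        new = set()
        for module in modules:
            new |= module(closed)
        if new <= closed:
            return closed
        closed |= new

# Invented class hierarchy and instance data.
model = {
    ("sw:FullProfessor", SUBCLASS, "sw:AcademicStaff"),
    ("sw:AcademicStaff", SUBCLASS, "sw:Employee"),
    ("ex:r1", TYPE, "sw:FullProfessor"),
}

closed = materialize(model, [subclass_module])
print(("ex:r1", TYPE, "sw:Employee") in closed)  # True
```

With an empty module list, `materialize` returns the model unchanged, which corresponds to querying plain RDF. Plugging in a different module changes the semantics without touching the query framework, and queries at run-time reduce to lookups in the precomputed model.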

By Stefan Decker, Siegfried Handschuh