<!ELEMENT genome ( gid, whose, data, contig*, feature*, pw*, role* )>
This definition tells us the relationship between the element "genome" and the other elements, such as "gid" and "whose". That is, a "genome" has exactly one "gid", one "whose" and one "data", in that order, followed by zero or more "contig", "feature", "pw" and "role" elements (the "*" suffix means zero or more occurrences).
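As a rough illustration of what this content model permits, the following Python sketch checks the order of a genome element's children against the declaration. It is not a DTD validator: the regular expression is a hand translation of the content model, and the helper name `matches_genome_model` and the three sample documents are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of ( gid, whose, data, contig*, feature*, pw*, role* ).
# Every child name is followed by one space so whole names are matched.
CONTENT_MODEL = re.compile(r"gid whose data (contig )*(feature )*(pw )*(role )*")


def matches_genome_model(genome: ET.Element) -> bool:
    """Return True if the children of <genome> follow the declared order."""
    child_names = "".join(child.tag + " " for child in genome)
    return CONTENT_MODEL.fullmatch(child_names) is not None


# Hypothetical sample documents (all values are invented).
minimal = ET.fromstring(
    "<genome><gid>G1</gid><whose>Example Genome Project</whose>"
    "<data>2001Juni14</data></genome>"
)
with_features = ET.fromstring(
    "<genome><gid>G2</gid><whose>Example Genome Project</whose>"
    "<data>2001Juni14</data><contig/><contig/><feature/></genome>"
)
missing_gid = ET.fromstring(
    "<genome><whose>Example Genome Project</whose>"
    "<data>2001Juni14</data></genome>"
)

print(matches_genome_model(minimal))        # True: starred children may be absent
print(matches_genome_model(with_features))  # True: repeated contig is allowed
print(matches_genome_model(missing_gid))    # False: gid is required
```

A genome with no contig, feature, pw or role children still satisfies the declaration, whereas leaving out gid does not.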
ENTITY
with a direct access to the information under a structured format;
one could also use the XLL one-to-many links with n > 2 if several
cross-references refer to the same biological object. This would avoid the
long process of cross-indexing the database entries at each release.
Biological data can be written in XML and stored in its native (text) form, which requires the database system to manage these resources efficiently. A database with native XML support has not yet been realized. In this situation, both the way the stored XML data is queried and the schema definition play an important role.
In order to use the biological information in a database, we need to execute queries on it. Several query languages have been proposed to query XML documents, such as XQL, XML-QL and GQL. Although these languages offer similar capabilities in terms of data extraction, XML-QL is considered to be the most expressive because XML-QL queries can construct new XML data from the results of queries. Since this data can be used as input to further queries, results can be refined through successive application of queries.
Here I would like to briefly introduce the XQuery language, which is still under development at the W3C and is designed to be broadly applicable across all types of XML data sources. XQuery is a functional language in which a query is represented as an expression. It supports several kinds of expression, and therefore its queries may take several different forms. The various forms of XQuery expressions can be nested with full generality, so the notion of a "subquery" is natural to XQuery. The input and output of a query are instances of a data model called the XML Query Data Model.
FLWR (pronounced "flower") expressions are one of the eight principal forms of expression in XQuery. A FLWR expression is constructed from FOR, LET, WHERE, and RETURN clauses. As in an SQL query, these clauses must appear in a specific order. A FLWR expression binds values to one or more variables and then uses these variables to construct a result (in general, an ordered forest of nodes).
The first part of a FLWR expression consists of FOR-clauses and/or LET-clauses which serve to bind values to one or more variables. The values to be bound to the variables are represented by expressions.
A FOR-clause is used whenever iteration is needed. The FOR-clause introduces one or more variables, associating each variable with an expression that returns a list of nodes.
A LET-clause is also used to bind one or more variables to one or more
expressions. Unlike a FOR-clause, however, a LET-clause simply binds each
variable to the value of its respective expression without iteration,
resulting in a single binding for each variable. The difference between a
FOR-clause and a LET-clause can be illustrated by a simple example.
The clause FOR $id IN /feature/fid results in many bindings, each of which
binds the variable $id to the identifier of one feature in genome_set.xml.
On the other hand, the clause LET $id := /feature/fid results in a single
binding, which binds the variable $id to a list containing all the feature
identifiers in genome_set.xml.
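The difference between the two clauses can be mimicked outside XQuery. The sketch below uses Python's xml.etree.ElementTree as a stand-in. The small inline document is a hypothetical replacement for genome_set.xml with invented fid values, and representing each binding-tuple as a dictionary is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for genome_set.xml; the fid values are invented.
doc = ET.fromstring(
    "<genome_set>"
    "<feature><fid>F1</fid></feature>"
    "<feature><fid>F2</fid></feature>"
    "<feature><fid>F3</fid></feature>"
    "</genome_set>"
)

# All fid nodes, in document order (the role of the path /feature/fid).
fids = doc.findall("./feature/fid")

# FOR $id IN /feature/fid : one binding-tuple per matching node.
for_bindings = [{"id": fid} for fid in fids]

# LET $id := /feature/fid : a single binding-tuple holding the whole list.
let_bindings = [{"id": fids}]

print(len(for_bindings))  # 3
print(len(let_bindings))  # 1
print([node.text for node in let_bindings[0]["id"]])  # ['F1', 'F2', 'F3']
```

With three features, the FOR-style binding yields three tuples and the LET-style binding yields one tuple whose value is a three-element list.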
Each of the binding-tuples generated by the FOR and LET clauses is subject
to further filtering by an optional WHERE-clause. Only those tuples for which
the condition in the WHERE-clause is true are used to invoke the RETURN
clause. The WHERE-clause may contain several predicates, connected by AND, OR,
and NOT. These predicates usually contain references to the bound variables.
Variables bound by a FOR-clause represent a single node (with its descendants)
and so they are typically used in scalar predicates such as
$d/date = "2001Juni14". Variables bound by a LET-clause, on the other hand,
may represent lists of nodes, and can be used in list-oriented predicates such
as avg($p/price) > 100. The ordering of the binding-tuples generated by the
FOR and LET clauses is preserved by the WHERE-clause.
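To make the filtering step concrete, here is a small Python sketch of a WHERE-clause applied to binding-tuples. It is an analogy and not XQuery itself: the entries document, its id attributes and its price values are invented, and the helper `avg_price` is a hypothetical stand-in for avg() over a LET-bound list.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical data; element names follow the predicates quoted in the text.
doc = ET.fromstring(
    "<entries>"
    "<d id='a'><date>2001Juni14</date><price>90</price><price>150</price></d>"
    "<d id='b'><date>2001Mai02</date><price>200</price></d>"
    "<d id='c'><date>2001Juni14</date><price>40</price></d>"
    "</entries>"
)

# FOR $d IN //d : one binding-tuple per <d> node, in document order.
tuples = [{"d": d} for d in doc.iter("d")]

# Scalar predicate on a FOR-bound node: $d/date = "2001Juni14".
by_date = [t for t in tuples if t["d"].findtext("date") == "2001Juni14"]


def avg_price(d: ET.Element) -> float:
    """Average over the list of price nodes, like avg() on a LET-bound list."""
    return mean(float(p.text) for p in d.findall("price"))


# List-oriented predicate: the average of the price list must exceed 100.
expensive = [t for t in tuples if avg_price(t["d"]) > 100]

print([t["d"].get("id") for t in by_date])    # ['a', 'c']
print([t["d"].get("id") for t in expensive])  # ['a', 'b']
```

In both cases the surviving tuples keep the order in which they were generated, which mirrors the order-preserving behaviour of the WHERE-clause.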
The RETURN-clause generates the output of the FLWR expression, which may be a node, an ordered forest of nodes, or a primitive value. The RETURN-clause is executed once for each tuple of bindings that is generated by the FOR and LET-clauses and satisfies the condition in the WHERE-clause. If an ordering exists among these tuples, the RETURN-clause is executed on each tuple, in order, and the order of the results is preserved in the output document. The RETURN-clause contains an expression that often contains element constructors, references to bound variables, and nested subexpressions.
The following example shows a FLWR query on genome_set.xml:

FOR $ge IN (document("genome_set.xml")//genome)
WHERE $ge/date = "2001Juni14"
  AND $ge/whose = "Example Genome Project"
RETURN
  <genome>
    <gid>$ge/gid/text()</gid>
  </genome>
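For readers without an XQuery processor at hand, the query can be approximated in Python. This is a sketch under several assumptions: the inline `GENOME_SET` string is an invented stand-in for genome_set.xml, its genome elements carry a date child because that is what the query tests, the function name `run_query` is hypothetical, and the result is assumed to carry the gid of each matching genome, which appears to be the query's intent.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for genome_set.xml (three genomes, two of which match).
GENOME_SET = """
<genome_set>
  <genome><gid>G1</gid><whose>Example Genome Project</whose><date>2001Juni14</date></genome>
  <genome><gid>G2</gid><whose>Other Project</whose><date>2001Juni14</date></genome>
  <genome><gid>G3</gid><whose>Example Genome Project</whose><date>2001Juni14</date></genome>
</genome_set>
"""


def run_query(xml_text: str) -> list:
    """Emulate the FOR / WHERE / RETURN steps of the query, in document order."""
    root = ET.fromstring(xml_text)
    results = []
    # FOR $ge IN document(...)//genome
    for ge in root.iter("genome"):
        # WHERE $ge/date = ... AND $ge/whose = ...
        if (ge.findtext("date") == "2001Juni14"
                and ge.findtext("whose") == "Example Genome Project"):
            # RETURN <genome><gid>...</gid></genome> : build a new element
            out = ET.Element("genome")
            ET.SubElement(out, "gid").text = ge.findtext("gid")
            results.append(out)
    return results


for node in run_query(GENOME_SET):
    print(ET.tostring(node, encoding="unicode"))
# <genome><gid>G1</gid></genome>
# <genome><gid>G3</gid></genome>
```

The loop plays the part of the FOR-clause, the if-test that of the WHERE-clause, and the element construction that of the RETURN-clause, which runs once per surviving tuple.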
For more information about XQuery, see the W3C XQuery specification.