CS 545 - Database Research Seminar, Spring 96/97

Developing and Exploring Scientific Databases with the OPM Data Management Tools


Victor M. Markowitz, I-Min A. Chen, Anthony Kosky, and Ernest Szeto

Information and Computing Sciences Division
Lawrence Berkeley National Laboratory, Berkeley, CA 94720
Email: {VMMarkowitz, IAChen, Anthony_Kosky, E_Szeto}@lbl.gov

The Object-Protocol Model (OPM) data management tools provide facilities for the rapid development, documentation, and exploration of scientific databases. The tools are based on OPM, an object data model that is similar to the ODMG standard but has additional constructs for modeling scientific data. Databases designed in OPM can be implemented on commercial relational DBMSs by means of OPM schema translation tools, which generate complete DBMS database definitions from OPM schemas. OPM schemas can also be retrofitted on top of existing relational databases or of files defined using the ASN.1 data exchange format. OPM schema publishing tools allow OPM databases to be documented in a variety of formats and notations.
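To illustrate the kind of mapping a schema translation tool performs, the Python sketch below turns a much-simplified object class into relational CREATE TABLE statements and checks them against an in-memory SQLite database. It is only an illustration under stated assumptions: the `ObjectClass` and `Attribute` types, the `to_ddl` function, the `Gene`/`Locus` example classes, and the mapping rules (one table per class, a foreign key for each single-valued object reference, a separate table for each multi-valued attribute) are all hypothetical and are not taken from the OPM tools, whose actual translation rules are not described here.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field

# Hypothetical mapping from primitive attribute types to SQL column types.
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER", "float": "REAL"}


@dataclass
class Attribute:
    name: str
    type_name: str            # a primitive type name or another class name
    multivalued: bool = False


@dataclass
class ObjectClass:
    """Hypothetical, simplified stand-in for an OPM object class."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def to_ddl(cls: ObjectClass, classes: set[str]) -> list[str]:
    """Return CREATE TABLE statements for one class (illustrative rules only)."""
    columns = [f"{cls.name.lower()}_id INTEGER PRIMARY KEY"]
    extra = []
    for attr in cls.attributes:
        is_ref = attr.type_name in classes
        if attr.multivalued:
            # Multi-valued attribute: separate table keyed by the owner's id.
            target = (f"{attr.type_name.lower()}_id INTEGER REFERENCES {attr.type_name}"
                      if is_ref else f"value {SQL_TYPES[attr.type_name]}")
            extra.append(f"CREATE TABLE {cls.name}_{attr.name} "
                         f"({cls.name.lower()}_id INTEGER REFERENCES {cls.name}, {target});")
        elif is_ref:
            # Single-valued object reference: foreign key to the target table.
            columns.append(f"{attr.name} INTEGER REFERENCES {attr.type_name}")
        else:
            columns.append(f"{attr.name} {SQL_TYPES[attr.type_name]}")
    return [f"CREATE TABLE {cls.name} ({', '.join(columns)});"] + extra


# Hypothetical example classes.
CLASSES = {"Locus", "Gene"}
locus = ObjectClass("Locus", [Attribute("name", "string")])
gene = ObjectClass("Gene", [
    Attribute("symbol", "string"),
    Attribute("locus", "Locus"),
    Attribute("aliases", "string", multivalued=True),
])

ddl = to_ddl(locus, CLASSES) + to_ddl(gene, CLASSES)
for stmt in ddl:
    print(stmt)

# Sanity check: the generated statements are accepted by SQLite.
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
```

Mapping multi-valued attributes to their own tables keeps the generated schema in first normal form; a real translator would also have to deal with class hierarchies, protocols, and DBMS-specific type and constraint syntax, none of which this sketch attempts.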

Tools for querying OPM databases (native or retrofitted) include a query language translator that interprets queries expressed in the ODMG-compliant OPM query language (OPM-QL) and translates them into the languages supported by the underlying DBMSs. A Web-based OPM query interface allows ad hoc OPM queries to be constructed graphically and can also be used to generate Web query forms.
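One core step in translating an object query into SQL is turning a path expression that follows object references (in the style of ODMG OQL, e.g. `g.locus.name`) into a chain of relational joins. The Python sketch below shows that step for a single equality condition. It is an illustration under assumptions, not the OPM-QL translator: the `SCHEMA` dictionary, the `translate_path_query` function, the `t0`, `t1`, ... alias scheme, the `<class>_id` key naming, and the sample data are all hypothetical.

```python
import sqlite3

# Hypothetical schema: class -> {attribute: referenced class, or None if primitive}.
SCHEMA = {
    "Gene": {"symbol": None, "locus": "Locus"},
    "Locus": {"name": None},
}


def translate_path_query(root, select_attr, path, value):
    """Translate 'select r.select_attr from root r where r.<path> = value' into SQL.

    Each reference attribute along the path becomes a JOIN; the last
    attribute must be primitive and becomes a parameterized WHERE condition.
    """
    joins = []
    cls, alias = root, "t0"
    for i, attr in enumerate(path[:-1], start=1):
        target = SCHEMA[cls][attr]
        if target is None:
            raise ValueError(f"{cls}.{attr} is not a reference attribute")
        new_alias = f"t{i}"
        joins.append(f"JOIN {target} {new_alias} ON {alias}.{attr} = {new_alias}.{target.lower()}_id")
        cls, alias = target, new_alias
    last = path[-1]
    if last not in SCHEMA[cls] or SCHEMA[cls][last] is not None:
        raise ValueError(f"{cls}.{last} is not a primitive attribute")
    sql = " ".join([f"SELECT t0.{select_attr} FROM {root} t0", *joins,
                    f"WHERE {alias}.{last} = ?"])
    return sql, (value,)


sql, params = translate_path_query("Gene", "symbol", ["locus", "name"], "Xq28")
print(sql)

# Run the generated SQL against a toy in-memory database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Locus (locus_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Gene (gene_id INTEGER PRIMARY KEY, symbol TEXT,
                       locus INTEGER REFERENCES Locus);
    INSERT INTO Locus VALUES (1, 'Xq28'), (2, '1p36');
    INSERT INTO Gene VALUES (10, 'GENE_A', 1), (11, 'GENE_B', 2);
""")
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Passing the comparison value as a query parameter rather than splicing it into the SQL string keeps the generated query safe from injection; a full translator would additionally handle multi-valued paths, nested subqueries, and the target DBMS's dialect.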

Multidatabase OPM tools have been developed as an extension of the core OPM toolkit, with support for: (1) assembling heterogeneous databases into an OPM-based multidatabase system, while documenting their schemas and inter-database links; (2) processing ad hoc multidatabase queries via uniform OPM interfaces; and (3) assisting scientists in specifying and interpreting multidatabase queries.

Several archival molecular biology databases have been designed and implemented using the OPM tools, including the Genome Database (GDB) and the Protein Data Bank (PDB), while other scientific databases, such as the Genome Sequence Database (GSDB), have been retrofitted with semantically enhanced OPM views. The OPM multidatabase tools are currently being applied to construct a molecular biology database federation and to provide a common interface on top of heterogeneous financial and project management databases.

Current OPM work includes further development of the OPM multidatabase tools, extension of the OPM retrofitting tools to cover additional data models, and extension of the OPM toolkit to support complex data types, such as DNA sequences and three-dimensional crystallographic data, independently of the facilities provided by the underlying DBMS.