The New RUFUS Data Model

Dr. Peter Schwarz
IBM Almaden

RUFUS is a tool for finding and exploiting the information contained in semi-structured data, such as documents, mail, and memos. Unlike traditional database applications, for which an appropriate schema can be designed in advance, RUFUS must contend with an evolving set of formats and datatypes. The New RUFUS Data Model (NRDM) is a flexible, object-oriented data model that accommodates change. Instead of an explicit type hierarchy, the relationship between types in NRDM is implicit, based on similarity of the types' interfaces. NRDM also supports change at the level of individual objects, by allowing a single object to support multiple interfaces. NRDM maintains strict separation between types and implementations, allowing one to define useful general-purpose types whose interfaces are supported by many implementations.
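
The ideas in the paragraph above (implicit type relationships based on interface similarity, one object supporting several interfaces, and types kept separate from implementations) can be loosely illustrated with Python's structural protocols. This is only an analogy and not NRSL syntax or the NRDM runtime; every name below (Document, Message, Memo, PlainTextFile, summarize) is a hypothetical chosen for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Document(Protocol):
    """Hypothetical general-purpose type: anything with a title and a body."""

    def title(self) -> str: ...
    def body(self) -> str: ...


@runtime_checkable
class Message(Protocol):
    """Hypothetical type: anything with a sender and a recipient."""

    def sender(self) -> str: ...
    def recipient(self) -> str: ...


class Memo:
    """An implementation that never names Document or Message."""

    def __init__(self, subject: str, text: str, author: str, to: str) -> None:
        self._subject, self._text = subject, text
        self._author, self._to = author, to

    def title(self) -> str:
        return self._subject

    def body(self) -> str:
        return self._text

    def sender(self) -> str:
        return self._author

    def recipient(self) -> str:
        return self._to


class PlainTextFile:
    """A second, unrelated implementation of the Document interface only."""

    def __init__(self, path: str, text: str) -> None:
        self._path, self._text = path, text

    def title(self) -> str:
        return self._path

    def body(self) -> str:
        return self._text


def summarize(doc: Document) -> str:
    """Works on any object whose interface matches Document."""
    return f"{doc.title()}: {len(doc.body())} characters"


memo = Memo("Budget", "Approved.", "ann", "bob")
notes = PlainTextFile("notes.txt", "hello")
```

Here `memo` satisfies both Document and Message, while `notes` satisfies only Document, even though neither class declares a relationship to either type. Note that the runtime `isinstance` check on a protocol only tests that the methods are present; it does not compare their signatures.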

Both types and their implementations are described in the New RUFUS Schema Language (NRSL), an extension of C that features type checking and multiple inheritance. This talk will describe the features of both NRDM and NRSL, as well as our initial implementation of the NRDM runtime system.

Publications Related to Talk

"Managing Change in the Rufus System." In Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, March 1995. (with K. Shoens)

"The Rufus System: Information Organization for Semi-Structured Data." In Proceedings of the Nineteenth International Conference on Very Large Data Bases, August 1993. (with K. Shoens, A. Luniewski, J. Stamos, and J. Thomas)

Additional papers from the IBM Almaden database group can be obtained via anonymous ftp from host in directory /pub/cs/reports/database/.