Overview
Trio is a new kind of database management system: one in which
data, uncertainty of the data, and data lineage
are all first-class citizens. Trio is based on an extended relational
model called ULDBs, and it supports a SQL-based query language
called TriQL.
Many application domains can benefit from Trio, including scientific and sensor data management, data cleaning and integration, information extraction systems, and approximate and hypothetical query processing.
We have completed an initial working prototype of the Trio system,
available for download or use over the web as detailed below. The
short paper Trio-One:
Layering Uncertainty and Lineage on a Conventional DBMS is the
most recent overview paper. It provides a good idea of what we're up to,
although it's now somewhat out of date.
The Trio project is
supported in part by NSF grants IIS-0324431 and IIS-0414762, and by
grants from the Boeing and Hewlett-Packard Corporations.
Trio Online Demo
The Trio system is available for use over the web. You can create
your own databases or use our samples, and you can run TriQL queries
and browse the results.
Download Trio Source Code and Binaries
The Trio prototype is available as open-source code under the BSD license.
To install the Trio system at your own site, a package containing the source code and some precompiled binaries is available for download here.
The Trio client can be used in four ways: through a browser interface (TrioExplorer), through a command-line interface (trioplus), through direct API calls from another Python script, or as an external command-line call.
We currently run Trio successfully under Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server). For other environments,
the required Trio binaries can be recompiled.
News
- [2007-08] Activities have been scaled back a
little during the 2007-08 academic year, especially system work, but we're
poised to ramp back up in fall '08.
- [June 2007] We've made a small but important change
to the ULDB data model on which Trio is based. See
An Update to the Trio Data Model for an explanation and justification of the change.
- [June 2007] Dbworld message announcing the open-source release of the Trio prototype.
- [January 2007] Dbworld message announcing the web release of the Trio online demo.
- [January 2007] A demonstration of the latest
Trio prototype was given at the CIDR 2007 conference.
Here is the 6-page paper that
appeared in the conference proceedings. This paper is the best current overview of the Trio project.
- [September 2006] On 9/22/06 several groups working in
the area of uncertain and probabilistic data management got together
at Stanford for an informal meeting. Here are some slides and notes from the meeting.
- [June 2006] A demonstration of the Trio prototype was
given at the VLDB 2006 conference.
Here is the 4-page demonstration description
that appeared in the conference proceedings, and here are some
photos from the demo session at the conference.
- [May 2006] A first draft of the TriQL Language Manual is complete,
although the language (and document) continue to evolve.
- [April 2006] A
news article on the Trio project (with a terrific photo) appeared in the March 22, 2006
Stanford Report.
Subsequently, Trio was featured in an April 20, 2006
PC World article.
Papers: Overviews and Demo Descriptions
In reverse chronological order of when they were written
- M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom,
P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara.
Trio-One: Layering
Uncertainty and Lineage on a Conventional DBMS. Proc.
Third Biennial Conference on Innovative Data Systems Research (CIDR '07),
Pacific Grove, California, January 2007. Demonstration description.
- P. Agrawal, O. Benjelloun, A. Das Sarma, C. Hayworth,
S. Nabar, T. Sugihara, and J. Widom.
Trio: A System for Data,
Uncertainty, and Lineage. Proc. 32nd Intl.
Conference on Very Large Data Bases, pages 1151-1154, Seoul, Korea,
September 2006. Demonstration description.
- O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom.
An Introduction to ULDBs
and the Trio System. IEEE Data Engineering Bulletin, Special Issue
on Probabilistic Databases, 29(1):5-16, March 2006.
- J. Widom. Trio: A System for
Integrated Management of Data, Accuracy, and Lineage.
Proc. Second Biennial Conference on Innovative Data Systems Research
(CIDR '05), Pacific Grove, California, January 2005.
Papers: Technical Topics
In reverse chronological order of when they were written
- A. Das Sarma, P. Agrawal, S. Nabar, and J. Widom. Towards Special-Purpose
Indexes and Statistics for Uncertain Data. To appear in
Proc. 2008 Workshop on Management of Uncertain Data,
Auckland, New Zealand, August 2008.
- A. Das Sarma, M. Theobald, and J. Widom. Data Modifications and
Versioning in Trio. Technical Report, March 2008.
- A. Das Sarma, J.D. Ullman, and J. Widom. Schema Design for
Uncertain Databases. Technical Report, November 2007.
- R. Murthy and J. Widom. Making Aggregation Work
in Uncertain and Probabilistic Databases. Proc.
Workshop on Management of Uncertain Data, pages 76-90, Vienna,
Austria, September 2007.
- A. Das Sarma, M. Theobald, and J. Widom.
Exploiting Lineage for
Confidence Computation in Uncertain and Probabilistic Databases.
Proc. 24th Intl.
Conference on Data Engineering, Cancun, Mexico, April 2008.
- P. Agrawal and J. Widom.
Confidence-Aware Joins
in Large Uncertain Databases. Technical Report, March 2007.
- O. Benjelloun, A. Das Sarma, A. Halevy, M. Theobald, and
J. Widom. Databases
with Uncertainty and Lineage. VLDB Journal, 17(2):243-264, March
2008. Note: much of the material in this paper appeared in
preliminary form in ULDBs: Databases with Uncertainty and
Lineage, cited next, and Trio-One: Layering Uncertainty and
Lineage on a Conventional DBMS, cited above.
- O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom.
ULDBs: Databases with
Uncertainty and Lineage. Proc.
32nd Intl. Conference on Very Large Data Bases, pages 953-964,
Seoul, Korea, September 2006.
- A. Das Sarma, S.U. Nabar, and J. Widom.
Representing Uncertain Data: Uniqueness, Equivalence, Minimization, and
Approximation. Technical Report, December 2005.
- A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom.
Working Models for Uncertain Data. Proc. 22nd Intl. Conference on Data Engineering,
Atlanta, Georgia, April 2006.
Talks
In reverse chronological order of when they were first given
- Trio: A System for Data, Uncertainty,
and Lineage (current overview talk, updated May '07)
Given by Jennifer at various venues, 2006-07
Slides in ppt or
pdf
- Representation Formalisms for Uncertain Data
Given by Jennifer at UW/Microsoft Summer Research Institute, Aug. 2005
Slides in ppt or
pdf
- Trio: A System for Integrated Management of Data, Accuracy,
and Lineage (original vision talk)
Given by Jennifer at various venues, 2004-05
Slides in ppt or
pdf
People
- Faculty
- Post-doc
- Graduate students
- Alums