Information Integration

Data Cubes

Constraint Checking

Surveys and Polemics

Dynamic Miss Counting, with Fujiwara and Motwani, ICDE, 2000. Techniques for detecting almost-contained columns in an incidence matrix (e.g., rows = baskets; columns = items in those baskets) where the density is very low, and the support threshold consequently low.

Dynamic Itemset Counting and Implication Rules for Market-Basket Data. In SIGMOD 1997, with Sergey Brin, Rajeev Motwani, and Dick Tsur. Shows how to estimate the maximal sets of items that have high support quickly, thereby reducing the number of passes needed to find all such sets (compared with the a-priori algorithm).

Query Flocks: A Generalization of Association-Rule Mining. With Dick Tsur, Chris Clifton, Serge Abitboul, Rajeev Motwani, Svetlozar Nestorov, and Arnie Rosenthal. In 1998 SIGMOD. Query flocks are parametrized queries coupled with a filter condition; their semantics is the set of values of the parameters that make the result of the query pass the filter test. This paper explores ways to execute the query about the parameters implied by the flock efficiently.

Computing Iceberg Queries Efficiently. With Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, and Rajeev Motwani. In 1998 VLDB. Examines ways to combine hashing techniques to speed up market-basket analysis.

Scalable
Techniques for Mining Causal Structures.
With Sergey Brin, Rajeev Motwani, and Craig Silverstein.
In 1998 VLDB.
Bayes-nets techniques for inferring causality, e.g., ``buying diapers
causes you to buy beer,'' work only for very small numbers of variables,
such as items in a market basket.
This paper looks at ways to make *some* inferences of causality
by looking only at triples of items and making some stringent tests for
correlation and noncorrelation.

The
TSIMMIS Approach to Mediation: Data Models and Languages.
A survey of TSIMMIS in the NGITS Symposium, Naharia, Israel,
June, 1995;
many coauthors.
A more complete version appears in *J. Intelligent Information
Systems* **8**:2, pp. 117-132, March, 1997.

Querying Semistructured, Heterogeneous Information (with Dallan Quass, Anand Rajaraman, Shuky Sagiv, and Jennifer Widom). Presents the LOREL query language, its motivation, and semantics. Also, a A shorter Version that appeared in DOOD '95.

Medmaker: A Mediation System Based on Declarative Specifications. Appears in ICDE '96; with Yannis Papakonstantinou and Hector Garcia-Molina. Describes an "object logic" for generating mediators and its semantics. A shorter version.

A Query Translation Scheme for Rapid Implementation of Wrappers (with Yannis Papakonstantinou, Ashish Gupta, and Hector Garcia-Molina). Allows query templates to be a recursively defined family of queries. Uses the containment test for a conjunctive query in a datalog program to find a best match between a query and a template. A shorter version that appeared in DOOD '95.

Integrating Information by Outerjoins and Full Disjunctions (with Anand Rajaraman). Appears in 1996 PODS. The full disjunction of a collection of relations is the natural join of all the connected tuples, padded out with nulls. When can we compute the full disjunction by using the outerjoin operator on the relations in some order? The answer involves a venerable concept of Ron Fagin's called "gamma acyclicity." In addition to providing an interesting and useful algorithm for outerjoining, this paper reminds us of the often forgotten value of investigating natural concepts (like gamma-acyclicity) for their own sake.

Information integration Using Limited External Processors (with A. Y. Levy and A. Rajaraman). Solves the problem of whether a conjunctive query can be expressed in terms of views, where the (possibly infinite set of) views are those generated by the expansion of a Datalog program. Appears in 1996 PODS.

Representative Objects: Concise Representations of Semi-Structured Hierarchical Data (with Svetlozar Nestorov, Sudarshan Chawathe, and Janet Wiener; to appear in 1997 ICDE). A representative object tells us about the possible paths (label sequences) in OEM objects. We relate the construction of representative objects to finite automaton construction, and we give some simplifications that contain less information but are more compact.

Information Integration Using Logical Views Invited paper for ICDT '97. Surveys relevant database theory and compares the use of this theory in Information Manifold (AT&T Labs) and Tsimmis (Stanford).

Index
Selection for OLAP
(with Himanshu Gupta, Venky Harinarayan and Anand Rajaraman;
1997 ICDE).
This follow-on to the above paper looks at the problem of incorporating
indexes into the theory.
The 63% bound seems to be based on a *monotonicity* condition
that when we pick one view to materialize, it does not increase the
benefit of any other view.
Thinking of indexes on a view as another view, monotonicity is violated.
When you materialize the underlying view, the value of its indexes jump
up. In this paper we give several work-arounds, where we look at a view
and some of its indexes to materialize together, in ways that don't make
the selection process grow exponentially or high-degree polynomially.
We show bounds on performance compared with the optimum that approach
the 63% bound.

Efficient Implementation of Data Cubes Via Materialized Views A survey of the field for the 1996 KDD conference.

An NSF report on the future of database research (coedited with Avi Silberschatz, Mike Stonebraker).

A Comparison Between Deductive and Object-Oriented Database Systems from the 1991 DOOD Conference.

Database Logic With Negation. A survey of approaches to nonmonotonic reasoning from the database point of view, revised from the conference proceedings ``Computers as Our Better Partners,'' Tokyo, 1994.

A Survey of Research in Deductive Database Systems (with Raghu Ramakrishnan; appears in J. Logic Programming, May, 1995, pp. 125-149.

The Role of Theory Today Comments on theoretical CS submitted to a special issue of Computing Surveys.

The Database Approach to Knowledge Representation A brief survey of how the Datalog and Conjunctive-Query technology developed by the PODS community can be applied to problems in knowledge representation, especially information integration. Appears in 1996 AAAI.

Research Publication Modes Need to be Reengineered
A discussion from the July, 1996 *Computing Research News* of how
the web impacts publication and the tenure process.

Jeffrey D. Ullman

650-494-8016 (home)

650-725-2588 (FAX)