Jeff Ullman: Recent Papers


Data Mining
Information Integration
Data Cubes
Constraint Checking
Surveys and Polemics

Data Mining Papers

``Finding Interesting Associations without Support Pruning,'' with many coauthors, ICDE 2000. Signature schemes for detecting low-support, high-correlation pairs of items in a market basket.

Dynamic Miss Counting, with Fujiwara and Motwani, ICDE, 2000. Techniques for detecting almost-contained columns in an incidence matrix (e.g., rows = baskets; columns = items in those baskets) where the density is very low, and the support threshold consequently low.

Dynamic Itemset Counting and Implication Rules for Market-Basket Data. In SIGMOD 1997, with Sergey Brin, Rajeev Motwani, and Dick Tsur. Shows how to estimate the maximal sets of items that have high support quickly, thereby reducing the number of passes needed to find all such sets (compared with the a-priori algorithm).

Query Flocks: A Generalization of Association-Rule Mining. With Dick Tsur, Chris Clifton, Serge Abitboul, Rajeev Motwani, Svetlozar Nestorov, and Arnie Rosenthal. In 1998 SIGMOD. Query flocks are parametrized queries coupled with a filter condition; their semantics is the set of values of the parameters that make the result of the query pass the filter test. This paper explores ways to execute the query about the parameters implied by the flock efficiently.

Computing Iceberg Queries Efficiently. With Min Fang, Narayanan Shivakumar, Hector Garcia-Molina, and Rajeev Motwani. In 1998 VLDB. Examines ways to combine hashing techniques to speed up market-basket analysis.

Scalable Techniques for Mining Causal Structures. With Sergey Brin, Rajeev Motwani, and Craig Silverstein. In 1998 VLDB. Bayes-nets techniques for inferring causality, e.g., ``buying diapers causes you to buy beer,'' work only for very small numbers of variables, such as items in a market basket. This paper looks at ways to make some inferences of causality by looking only at triples of items and making some stringent tests for correlation and noncorrelation.

Information Integration Papers

Answering Queries Using Templates With Binding Patterns. In PODS 1995, with Anand Rajaraman and Shuky Sagiv. Shows how to construct queries from given query forms, i.e., views, that specify binding patterns.

The TSIMMIS Approach to Mediation: Data Models and Languages. A survey of TSIMMIS in the NGITS Symposium, Naharia, Israel, June, 1995; many coauthors. A more complete version appears in J. Intelligent Information Systems 8:2, pp. 117-132, March, 1997.

Querying Semistructured, Heterogeneous Information (with Dallan Quass, Anand Rajaraman, Shuky Sagiv, and Jennifer Widom). Presents the LOREL query language, its motivation, and semantics. Also, a A shorter Version that appeared in DOOD '95.

Medmaker: A Mediation System Based on Declarative Specifications. Appears in ICDE '96; with Yannis Papakonstantinou and Hector Garcia-Molina. Describes an "object logic" for generating mediators and its semantics. A shorter version.

A Query Translation Scheme for Rapid Implementation of Wrappers (with Yannis Papakonstantinou, Ashish Gupta, and Hector Garcia-Molina). Allows query templates to be a recursively defined family of queries. Uses the containment test for a conjunctive query in a datalog program to find a best match between a query and a template. A shorter version that appeared in DOOD '95.

Integrating Information by Outerjoins and Full Disjunctions (with Anand Rajaraman). Appears in 1996 PODS. The full disjunction of a collection of relations is the natural join of all the connected tuples, padded out with nulls. When can we compute the full disjunction by using the outerjoin operator on the relations in some order? The answer involves a venerable concept of Ron Fagin's called "gamma acyclicity." In addition to providing an interesting and useful algorithm for outerjoining, this paper reminds us of the often forgotten value of investigating natural concepts (like gamma-acyclicity) for their own sake.

Information integration Using Limited External Processors (with A. Y. Levy and A. Rajaraman). Solves the problem of whether a conjunctive query can be expressed in terms of views, where the (possibly infinite set of) views are those generated by the expansion of a Datalog program. Appears in 1996 PODS.

Representative Objects: Concise Representations of Semi-Structured Hierarchical Data (with Svetlozar Nestorov, Sudarshan Chawathe, and Janet Wiener; to appear in 1997 ICDE). A representative object tells us about the possible paths (label sequences) in OEM objects. We relate the construction of representative objects to finite automaton construction, and we give some simplifications that contain less information but are more compact.

Information Integration Using Logical Views Invited paper for ICDT '97. Surveys relevant database theory and compares the use of this theory in Information Manifold (AT&T Labs) and Tsimmis (Stanford).

Data Cube Papers

Implementing Data Cubes Efficiently (with Venky Harinarayan and Anand Rajaraman). Appears in 1996 SIGMOD; winner: best paper award. Data cubes and related decision-support systems invite us to consider the materialization of a selected subset of a very large number of views. We model the problem as a lattice of views and show that the greedy algorithm of selecting for materialization those views that, given what views we have already materialized, offer the most improvement in average response time is within 63% of optimal in all cases. In many realistic cases, we can show the difference between the greedy and optimal solutions is essentially nothing.

Index Selection for OLAP (with Himanshu Gupta, Venky Harinarayan and Anand Rajaraman; 1997 ICDE). This follow-on to the above paper looks at the problem of incorporating indexes into the theory. The 63% bound seems to be based on a monotonicity condition that when we pick one view to materialize, it does not increase the benefit of any other view. Thinking of indexes on a view as another view, monotonicity is violated. When you materialize the underlying view, the value of its indexes jump up. In this paper we give several work-arounds, where we look at a view and some of its indexes to materialize together, in ways that don't make the selection process grow exponentially or high-degree polynomially. We show bounds on performance compared with the optimum that approach the 63% bound.

Efficient Implementation of Data Cubes Via Materialized Views A survey of the field for the 1996 KDD conference.

Constraint Checking

Constraint Checking With Partial Information (PODS 1994 paper with coauthors Ashish Gupta, Shuky Sagiv, and Jennifer Widom).

Surveys and Polemics

Ordinary Skill in the Art, an essay about the problems with software patents.

An NSF report on the future of database research (coedited with Avi Silberschatz, Mike Stonebraker).

A Comparison Between Deductive and Object-Oriented Database Systems from the 1991 DOOD Conference.

Database Logic With Negation. A survey of approaches to nonmonotonic reasoning from the database point of view, revised from the conference proceedings ``Computers as Our Better Partners,'' Tokyo, 1994.

A Survey of Research in Deductive Database Systems (with Raghu Ramakrishnan; appears in J. Logic Programming, May, 1995, pp. 125-149.

The Role of Theory Today Comments on theoretical CS submitted to a special issue of Computing Surveys.

The Database Approach to Knowledge Representation A brief survey of how the Datalog and Conjunctive-Query technology developed by the PODS community can be applied to problems in knowledge representation, especially information integration. Appears in 1996 AAAI.

Research Publication Modes Need to be Reengineered A discussion from the July, 1996 Computing Research News of how the web impacts publication and the tenure process.

Jeffrey D. Ullman
ullman @
650-494-8016 (home)
650-725-2588 (FAX)