Program Schedule


Time   Speaker
Title
8:30am   Breakfast provided
9:00am   Vasilis Vassalos Opening Remarks
Database Theory I: 9:15am-10:30am
9:15am   Moshe Vardi A Call to Regularity
9:45am   Ron Fagin Acyclic Database Schemes
10:15am * Arthur Keller Reminiscences (Stanford)
10:30am   Coffee break
Database Theory II: 11:00am-12:15pm
11:00am   Dave Maier Relational Database Theory at Princeton and Beyond
11:20am   Alberto Mendelzon Who Won the Universal Relation Wars?
11:40am   Foto Afrati Rewritings for Answering Queries Using Views
12:00pm * HJ Siegel Reminiscences (Princeton)
12:15pm   Lunch provided
Algorithms: 2:00pm-2:55pm
2:00pm   Mihalis Yannakakis Reflections on Approximation, .... or How I (Almost) Finished My PhD Thesis
2:20pm   Kevin Karplus Predicting Protein Structure from Sequence
2:40pm * Svetlozar Nestorov Mining All About Jeff Ullman
2:55pm   Coffee break
XML Processing: 3:30pm-4:25pm
3:30pm   Shuky Sagiv Generating Relations from XML Documents
3:50pm   Yannis Papakonstantinou XML Mediators
4:10pm * Peter Chen Assessing the Impact of Jeff Ullman


(*) indicates a personal talk.


Abstracts of Technical Presentations



A Call to Regularity

Moshe Y. Vardi

Rice University

From the mid 1980s to the mid 1990s, influenced to a large extent by Jeff Ullman's seminal 1985 TODS paper on logical query languages, a major theme in database theory was the study of Datalog, the language of logic programs without function symbols. Unfortunately, most decision problems involving Datalog turned out to be undecidable. Furthermore, well-behaved fragments of Datalog turned out to be rather restrictive and unnatural. Surprisingly, a natural and quite general fragment of Datalog did emerge in the mid 1990s, in the context of semistructured data. This fragment is the class of regular queries, whose basic element is that of regular path query. In his talk I will describe the class of regular queries and its well-behavedness in the context of view-based query processing.



Acyclic Database Schemes

Ron Fagin

IBM Almaden

Database schemes (which, intuitively, are collections of table skeletons) can be viewed as hypergraphs. (A hypergraph is a generalization of an ordinary undirected graph, such that an edge need not contain exactly two nodes, but can instead contain an arbitrary nonzero number of nodes.) Unlike the situation for ordinary undirected graphs, there are several natural, nonequivalent notions of acyclicity for hypergraphs (and hence for database schemes). A large number of desirable properties of database schemes fall into a small number of equivalence classes, each completely characterized by the degree of acyclicity of the scheme. This talk is intended to be an informal introduction, in which the focus is mainly on the originally studied (and least restrictive) degree of acyclicity.



Relational Database Theory at Princeton and Beyond

David Maier

Oregon Graduate Institute

The arrival of Catriel Beeri at the EECS Department at Princeton in 1977 touched off a surge of interest in the relational database model in general and its formal properties. The majority of computer science PhD students at the time, many of them Jeff Ullman's students, began doing research in the area, an interest that continued for most of them after graduation. There continued to be a large amount of joint research among this group, along with a widening circuit of collaborators, after leaving Princeton and even after Jeff moved to Stanford. This group was the core for the XP series database theory workshops, which led in part to the initiation of PODS in 1982.

This talk traces the main research threads of this group at Princeton and after, including data dependency and schema design theory, implication and equivalence problems and procedures, universal instance query interfaces, acyclic database schemes, and logic databases. It presents the major research questions being pursued and gives an overview of the most significant results in the era.



Who Won the Universal Relation Wars?

Alberto Mendelzon

University of Toronto

Not long after Codd introduced the relational model and normalization theory, researchers such as Beeri and Bernstein observed the need to give semantics to functional dependencies, and other constraints, whose attributes do not necessarily appear together in any single relation. Expediency dictated the postulation of a "universal relation" that contains all attributes. Soon, other researchers, including prominently Ullman and his collaborators, showed how to exploit this paradigm to obtain logical independence, freeing the user from having to specify connection paths among attributes. That is, given a set of relation schemes, a set of constraints, and an arbitrary set of attributes X, not necessarily embedded in any single relation, the universal relation model can be used to define the set of tuples that a database instance associates with X.

Universal relation work led to some controversy, recorded in the literature by papers with titles such as "Consequences of Assuming a Universal Relation," "The Universal Relation Strikes Back," and "The Revenge of the JD." We will first review this controversy.

Over the past few years, much work has addressed an apparently unrelated question motivated by information integration. Given a global schema, and a collection of information sources whose content can be modeled by mappings from the virtual global database, what is the meaning of a query on the global schema, and how can it be computed? We will point out the (rather obvious) connections between this problem and query evaluation under the universal relation assumption, leading us to some (rather dubious) conclusions about who won the universal relation wars.



Rewritings for Answering Queries Using Views

Foto Afrati

National Technical University of Athens

The problem of answering queries using views arises in various application frameworks (information integration, incomplete information). The problem is that of answering a query on a database schema using only answers to views over the same schema. The concept of "certain answers" is generally adopted as the formalism for answers to the query that are acceptable in this setting. Certain answers have different definitions under the closed world assumption (the views contain all answers derived by evaluating them on a database) and open world assumption (the views contain arbitrary tuples). The data complexity of the problem of computing all certain answers is in general in coNP (for conjunctive queries with inequalities). The usual tool to find cases with polynomial data complexity is to find a maximally contained rewriting (MCR) in a polynomially computable language (e.g., conjunctive queries, datalog).

When considering MCRs, there are usually three major issues:

  1. Whether there is an MCR which computes all certain answers,
  2. Whether there is such an MCR in a polynomially computable language,
  3. How to compute this MCR given the query and view definitions

The problem of computing MCRs is NP-complete in the size of the query and view definitions. Given the importance of the problem however, various algorithms have been developed which use heuristics that save on computation time. We will present results on addressing those issues about MCRs in cases where queries and views are conjunctive queries with comparison predicates.



Reflections on Approximation, .... or How I (Almost) Finished My PhD Thesis

Mihalis Yannakakis

Stanford University

As in so many other areas, Jeff Ullman had a hand also in the early days of approximation algorithms (proving for instance some of the original results on the performance of bin packing algorithms). When I was starting in research as a graduate student, Jeff suggested to look into the approximation of hard optimization problems, and to try to develop a systematic theory tying the problems together, analogous to the NP-completeness theory for optimization problems. Fortunately, he did not hold me to that, and let me graduate. Many years were to pass until we got a better grasp on the nature of approximability. In some sense, as NP-completeness relates to the correctness of computations and proofs, approximability has something to do with their robustness. I will reflect on some of these developments, and comment on what else remains to finish my thesis.



Predicting Protein Structure from Sequence

Kevin Karplus

University of California, Santa Cruz

There are two ways to view proteins: as strings over a 20-letter alphabet and as three-dimensional arrangements of atoms. Somewhat surprisingly, the string contains all the information needed for a protein to fold up consistently to the same 3D shape. This introduces an interesting computational problem -- how can you go from the string to the 3D structure?

The approaches I've adopted at UCSC are a mix of several strategies including using hidden Markov models (HMMs) to find and align similar proteins, using neural nets to predict local structure properties, using the HMMs together with the local structure predictions to search libraries of known protein structures, and using genetic algorithms to assemble fragments and alignments into 3D structures for scoring with a variety of different cost functions.

The talk will also present the latest CASP results. CASP is an experiment done every two years in the field -- (almost) all the researchers in the field are given sequences (the input strings) for proteins whose structure is about to be determined experimentally. The researchers then have to register their predictions with the organizers of CASP. The predictions are compared with the experimental results, and methods are compared at a conference.



Generating Relations from XML Documents

Shuky Sagiv

Hebrew University

This paper discusses several mechanisms for creating relations out of XML documents. A relation generator consists of two parts: (1) a tuple of path expressions and (2) an index indicating which path expressions may not be assigned the null value. Evaluating a relation generator involves finding tuples of nodes that satisfy the path expressions and are related to one another in a meaningful fashion. Different semantics for evaluation are given that take into account the possible presence of incomplete information. The complexity of generating relations from documents is analyzed and evaluation algorithms are described.



Issues in XML Mediators

Yannis Papakonstantinou

University of California, San Diego

Business and scientific applications often require access, filtering, integration and transformation of the data of multiple distributed and heterogeneous information sources. Mediator systems address the need by providing query-able integrated views of the source data. We discuss mediators providing virtual XML views. When such a mediator receives an XQuery on the XML view, it decomposes it into queries and requests that are sent to the underlying sources and are compatible with their abilities. The mediator presents a virtual query result to the client. Consequently, client navigations in the virtual query result are reduced into queries and navigations in the sources.

Database query processing has provided a solid basis for query processing in XML mediators. We discuss the special challenges that XML and mediation introduce and overview related work.

Finally, we present a mediator architecture, built around a tuple-oriented XML algebra. This architecture has been used in the Enosys XML mediators. We discuss how optimization, navigation-driven evaluation and capabilities-based rewriting issues can be handled within an algebraic context.