Based on an Extended Abstract for the IEEE Data Engineering Conference Keynote, 17 Feb 1994. Historical remarks have been removed.
Databases are ubiquitous, but are also seen as contributing to
'information overload'. We must reconsider why people need
databases or, more specifically, the data to be retrieved from
them.
We need data to make decisions.
We can exploit the definition implied by Shannon in his
Information Theory in 1949, that information is novel, i.e.,
previously unknown to its receiver and hence can lead to action.
Making a decision means choosing among alternatives, and having
more information helps in assessing the costs and benefits of
each alternative. Information hence also reduces risk, because it
becomes less likely that alternatives with poor benefit/cost
ratios will be chosen.
Today's databases do not support decision-making processes directly. Decision making is typically preceded by a planning activity, in which alternatives are developed, assessed, and pruned. Databases of various flavors can contribute, but human and artificial intelligence are needed to help in selecting the data, summarizing them in the form needed for the decision to be made, merging the results with other sources, and attaching the resulting information to a branch of the tree of alternatives. The decision tree is rarely fully under our control: for every action of ours there are likely to be reactions by other parties, and we need intelligence to enumerate those branches and assign probabilities and costs to them.
Planning data differ from historical data in one crucial respect: whereas past data should report a consistent history, there are many possible futures. An information system must be able to deal with forward projections: the costs and benefits of today's and tomorrow's actions, projected into those futures. That means that information systems must deal with multiple future worlds. In relational terms, every tuple must be stamped with a projected time and a label identifying the alternate world. At every decision point at least one more alternate world is created. If one decision applies to multiple alternatives, then that many future worlds will be created. The possible reactions that we enumerate create yet more.
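The stamping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design taken from the abstract: the names World, StampedTuple, fork and lineage, the example alternatives, and the choice to replace each world by one child per alternative are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class World:
    """One alternate future, identified by the path of choices that leads to it."""
    label: str
    parent: Optional["World"] = None
    forked_at: int = 0  # projected time at which this world splits from its parent


@dataclass(frozen=True)
class StampedTuple:
    """A relational tuple stamped with a world label and a projected time."""
    world: str
    time: int
    item: str
    value: float


def fork(worlds, alternatives, time):
    """Apply one decision (or enumerated reaction) to every current world.

    Assumption: each world is replaced by one child world per alternative.
    """
    return [World(f"{w.label}/{a}", w, time) for w in worlds for a in alternatives]


def lineage(world):
    """Return the labels from the root world down to the given world."""
    path = []
    while world is not None:
        path.append(world.label)
        world = world.parent
    return path[::-1]


root = World("now")
after_decision = fork([root], ["build", "lease"], time=1)
after_reaction = fork(after_decision, ["rival-enters", "rival-waits"], time=2)

# One placeholder tuple per future world at projected time 2.
rows = [StampedTuple(w.label, 2, "projected_cost", 0.0) for w in after_reaction]
```

One decision with two alternatives followed by two enumerated reactions already yields four worlds. The parent links hint at how tuples shared with an ancestor world might be stored once rather than copied into every descendant.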
It is obvious that dealing with planning information requires new engineering concepts. Data for these multiple worlds is highly redundant, and must be represented effectively. Each world must be identified with a sequence of actions and reactions, and, when one of them changes, must be rapidly recomputed. To assess alternative actions it must be possible to summarize, at one point in future time, the benefits and costs incurred, and the risks remaining in the future beyond. The summarization must also identify which world is best, so that the best actions can be determined. Note that a MAX-function by itself is inadequate. Some spreadsheets today permit recording of several alternatives, but their matrix representation limits the complexity that can be represented and hence assessed.
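One possible reading of why a MAX-function alone falls short is that we choose our own actions but not the reactions of others, so a summary has to weight reaction branches by probability before taking a maximum over actions. The following Python sketch illustrates that reading only. The Node class, the evaluate and best_leaf functions, and all numbers are hypothetical, and expected value is a substituted, commonly used criterion, not one the abstract prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str = "outcome"      # "decision", "reaction", or "outcome"
    net_benefit: float = 0.0   # benefit minus cost, used at outcome leaves
    prob: float = 1.0          # probability of this branch below a reaction node
    children: list = field(default_factory=list)


def evaluate(node):
    """Return (expected net benefit, best sequence of our own actions)."""
    if not node.children:
        return node.net_benefit, []
    results = [(child, *evaluate(child)) for child in node.children]
    if node.kind == "reaction":
        # Others choose here: weight each branch by its probability.
        return sum(c.prob * v for c, v, _ in results), []
    # We choose here: take the action with the best expected value.
    best_child, best_value, best_plan = max(results, key=lambda r: r[1])
    return best_value, [best_child.name] + best_plan


def best_leaf(node):
    """Naive MAX over all worlds, ignoring how likely each one is."""
    if not node.children:
        return node.net_benefit
    return max(best_leaf(c) for c in node.children)


root = Node("plan", "decision", children=[
    Node("expand", "reaction", children=[
        Node("rival-waits", net_benefit=80.0, prob=0.25),
        Node("rival-enters", net_benefit=-40.0, prob=0.75),
    ]),
    Node("hold", "reaction", children=[
        Node("demand-up", net_benefit=10.0, prob=0.5),
        Node("demand-flat", net_benefit=0.0, prob=0.5),
    ]),
])

value, plan = evaluate(root)
```

In this made-up example the single best world (net benefit 80) lies under "expand", yet the expected value of "expand" is negative, so the probability-weighted summary selects "hold". A bare MAX over worlds would recommend the riskier action.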
If databases are to move closer to applications in planning and decision making, they must be engineered to serve information needs. In addition to dealing with the demands to manage more and more varied data, as discussed above, we list here some technological hurdles that information systems must address in the next decade.