Project Summary

The goal of the project is to build a new infrastructure, called DataMotion, for managing and analyzing large volumes of dynamic and diverse data. Most information systems today, even those handling multiple, distributed, heterogeneous data sources, are based on stored and relatively static data sets. However, much of the most critical information is highly dynamic and arrives as multiple, rapid, time-varying data streams. For example, streams may carry information on banking transactions, telephone calls, symptoms observed in emergency rooms, sensor readings from scientific experiments, logins at computer servers, or changes to Web pages around the world.
The volume of these streams is so high that it is difficult to store all of the information in conventional databases. Even when a particular stream can be stored in a database, it is then difficult to run, on that single centralized database, the many competing analyses required by diverse and geographically distributed users. A much better approach is to route the stream to the interested users and, along the way, filter it according to each user's interests, combine it with other relevant streams, and analyze the data in real time whenever possible. DataMotion enables this kind of distributed, real-time processing.

Our project consists of four interrelated research thrusts, each addressing an important class of problems: (a) how to perform traditional database and data mining operations on streams; (b) how to generate streams and how to present them to users; (c) how to route streams to distributed users with differing information needs; and (d) how to ensure the security and privacy of streams. In addition, we will implement an experimental testbed driven by four applications: (a) business streams, (b) detection of epidemics, (c) scientific data streams, and (d) monitoring of Web changes.

Publications are listed in the progress reports below.

Sites relevant to the project include: Infolab home page.

Progress Report, 2004

Progress Report, 2005

Progress Report, 2006

Progress Report, 2007

Progress Report, 2008

Progress Report, 2009

Final Report, 2010

Last modified: August 2010