Enabling Declarative Graph Analytics over Large, Noisy Information Networks
                  Amol Deshpande, University of Maryland

Over the last decade, information networks have become ubiquitous.
These include social networks, communication networks,
financial transaction networks, citation networks, disease transmission
networks, and many more. Social contact graphs are expected to become
available for analysis in the near future, and could be used to
gain insights into various social phenomena as well as into disease
outbreaks and their prevention. There is thus a growing need for data
management systems that can support real-time ingest, storage, and
querying of information networks, as well as complex analysis over
them. However, there is a lack of established data management systems
and tools that can manage such graph-structured data.

In this talk, I will discuss some of our early work on building a
graph database system to support declarative analytics, and I will
specifically focus on our work addressing two challenges. First, the
raw observational data describing information networks is typically
noisy and incomplete, and often at the wrong level of fidelity and
abstraction for meaningful data analysis. This has resulted in a
growing body of somewhat ad hoc and domain-specific work on extracting,
cleaning, and annotating network data. I will present the architecture
of a data management system we are building that supports a
declarative Datalog-based language for specifying analysis tasks over
network data. Second, the increasing availability of the digital trace
of information networks over time has opened up opportunities both in
temporal evolutionary analysis and in data mining and comparative
analytics over historical information. I will discuss our ongoing work
on managing such historical network data, and on supporting efficient
retrieval of multiple graphs from arbitrary time points in the past.
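
To make the idea of retrieving graphs from past time points concrete, here
is a minimal sketch in Python. It is not the system described in the talk:
it assumes a simple append-only log of timestamped edge events, and the
names EventLog, record, snapshot, and snapshots are hypothetical, chosen
only for illustration. A snapshot is rebuilt by replaying events up to the
requested time, and several snapshots are retrieved in a single pass.

```python
from bisect import bisect_right


class EventLog:
    """Append-only log of timestamped edge events (hypothetical structure)."""

    def __init__(self):
        self.times = []   # event timestamps, kept in non-decreasing order
        self.events = []  # parallel list of (op, u, v), op is "add" or "remove"

    def record(self, t, op, u, v):
        # Assumption: events arrive in time order; reject anything else.
        if self.times and t < self.times[-1]:
            raise ValueError("events must be recorded in time order")
        if op not in ("add", "remove"):
            raise ValueError("op must be 'add' or 'remove'")
        self.times.append(t)
        self.events.append((op, u, v))

    @staticmethod
    def _apply(edges, op, u, v):
        # Apply one event to a mutable edge set.
        if op == "add":
            edges.add((u, v))
        else:
            edges.discard((u, v))

    def snapshot(self, t):
        """Return the edge set as of time t (inclusive) by replaying the log."""
        edges = set()
        end = bisect_right(self.times, t)
        for op, u, v in self.events[:end]:
            self._apply(edges, op, u, v)
        return edges

    def snapshots(self, times):
        """Return {t: edge set} for several time points in one pass."""
        result = {}
        edges = set()
        pos = 0
        for t in sorted(set(times)):
            end = bisect_right(self.times, t)
            for op, u, v in self.events[pos:end]:
                self._apply(edges, op, u, v)
            pos = end
            result[t] = frozenset(edges)  # immutable copy of the current state
        return result
```

Replaying from the start of the log costs time proportional to the number
of events before the requested time point. A practical system would
presumably avoid that with some form of indexing or materialized
snapshots; the sketch only fixes what "the graph as of time t" means.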