There is an arms race to perform increasingly sophisticated data
analysis on ever more varied types of data (text, audio, video,
OCR, sensor data, etc.). Current data processing systems typically
assume that the data have rigid, precise semantics, which these new
data sources do not possess. On the other hand, many of the
state-of-the-art approaches, both for coping with variations in the
structure of data and for analyzing data deeply, are statistical. The
Hazy project is exploring how to integrate statistical processing
techniques into data processing systems, with the goal of making such
systems easier to build, deploy, and maintain.
To demonstrate our ideas, we are building several applications, including systems that read large amounts of text and answer sophisticated questions (see WiscI and GeoDeepDive). We are also building general primitives for data analytics that are now incorporated into products from Oracle and Pivotal. Additionally, some of our ideas have helped find neutrinos with IceCube (see IceCube).
DeepDive, a general-purpose statistical inference system, has been released. Check it out! One of its most popular uses is machine reading (also called knowledge-base construction), but we are starting to branch out.
Hazy is generously supported by the Air Force Research Laboratory (AFRL) under prime contracts No. FA8750-09-C-0181, No. FA8750-13-2-0039, and No. FA9550-13-1-0138; the National Science Foundation under CAREER Award No. IIS-1054009 and EAGER Award No. EAR-1242902; the Office of Naval Research under awards No. N000141210041 and No. N000141310129; the Sloan Research Fellowship; the University of Wisconsin-Madison; and gifts, research awards, or contracts from American Family Insurance, Google, Greenplum, Johnson Controls, LogicBlox, Microsoft, Oracle, Raytheon, and the CHTC. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of any of the above sponsors, including DARPA, AFRL, ONR, or the US government.
Visit our YouTube channel, or check out some project overviews right here: