Hazy: A Relational Operator-based Approach to Statistical Data Analysis

Christopher Re, University of Wisconsin

There is an arms race to perform increasingly sophisticated data analysis on increasingly varied types of data (text, audio, video, OCR, sensor data, etc.). To process these large volumes of data cost-effectively, we need automatic techniques. But current systems that process large volumes of data typically assume that the data have rigid, precise semantics, which these new data sources do not possess. To understand variations in the data's structure and meaning, many state-of-the-art approaches are statistical. Building an ad hoc system for each new task seems wrongheaded, as the lack of code sharing makes each task expensive to complete. Additionally, such an approach requires that we relearn the hard-fought lessons of the last 30 years of data management (e.g., handling concurrency or recovery).

In this talk, I describe our system, Hazy, which augments a traditional RDBMS (Postgres) to allow developers to specify statistical data analysis applications. The hypothesis behind Hazy is that a large fraction of a diverse set of statistical applications can be captured using a small handful of primitives. To test this hypothesis, my group is building several applications to exercise Hazy's primitives. I describe two of these applications.

First, I describe enhancing a text processing application with classification, a popular statistical technique and one of Hazy's operators. A key technical challenge is to incrementally maintain the output of the classification task (similar to maintaining a materialized view). I describe Hazy's algorithm, which gives an order-of-magnitude performance improvement over naive approaches on several real-world data sets.

Second, I describe a new collaboration that applies Hazy's ideas to the problem of detecting neutrinos from the Big Bang. This work is in collaboration with the IceCube team. IceCube operates a 1-cubic-km block of ice located at the South Pole that functions as a neutrino telescope.

In addition to describing our technical progress on these applications, I describe how these different applications share a common technical underpinning, which supports Hazy's central hypothesis.