Mizan: Optimizing Graph Mining in Large Parallel Systems
                        Panos Kalnis, KAUST

Extracting information from graphs, from finding shortest paths to
complex graph mining, is essential for many applications. Due to the
shear size of modern graphs (e.g., social networks), processing must
be done on large parallel computing infrastructures (e.g., the
cloud). Earlier approaches relied on the MapReduce framework, which
proved inadequate for graph algorithms. Recently, the message-passing
model (e.g., Pregel) has emerged. Although the Pregel model has many
advantages, it is agnostic to the graph properties and the
architecture of the underlying computing infrastructure, leading to
suboptimal performance.

In this talk, I will present Mizan, a layer between the users' code
and the computing infrastructure. Mizan considers the structure of the
input graph and the architecture of the infrastructure in order to:
(i) decide whether it is beneficial to generate a near-optimal
partitioning of the graph in a pre-processing step, and (ii) choose
between typical point-to-point message passing and a novel approach
that puts computing nodes in a virtual overlay ring. We deployed Mizan
on a small local Linux cluster, on the cloud (256 virtual machines in
Amazon EC2), and on an IBM BlueGene/P supercomputer (1024 CPUs). Mizan
executes common algorithms on very large graphs up to one order of
magnitude faster than MapReduce-based implementations and up to 4
times faster than implementations relying on Pregel-like hash-based
graph partitioning.