GPS: A Graph Processing System

GPS: A Graph Processing System


Overview

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.

In systems such as GPS and Pregel, the input graph (directed, possibly with values on edges) is distributed across machines and vertices send each other messages to perform a computation. Computation is divided into iterations called supersteps. Analogous to the map() and reduce() functions of the MapReduce framework, in each superstep a user-defined function called vertex.compute() is applied to each vertex in parallel. The user expresses the logic of the computation by implementing vertex.compute(). This design is based on Valiant's Bulk Synchronous Parallel model of computation. A detailed description can be found in the original Pregel paper.

There are five main differences between Pregel and GPS:

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges.

Using GPS we have studied the effects on system performance of different ways of partioning the graph, both before the graph computation starts and during the computation. A detailed description of GPS and our work on partitioning can be found in our online technical report.

Using our Green-Marl compiler, we were able to implement several algorithms such as Approximate Betweenness Centrality and Conductance, whose native GPS implmentations are very challenging. A detailed description of our work on the Green-Marl compiler can be found in our Green-Marl-to-GPS tech report. Details about Green-Marl can be found in the original Green-Marl paper.

We have been exploring how to solve some of the fundamental graph problems on Pregel-like systems, such as computing strongly connected components (SCC), or minimum spanning trees in large-scale graphs. A detailed description of our work on computing SCCs can be found here.

The GPS project is supported by the National Science Foundation (grant IIS-0904497), KAUST, and an Amazon Web Services Research Grant.


GPS Source Code and Documentation

The GPS source code is available as open-source code under the BSD license. The source code can be found here. Here is the GPS documentation site containing information about how to download, set-up and run GPS as well as the GPS extensions to Pregel and the GPS API. Please join the user group stanfordgpsusers @ googlegroups.com for any questions/problems you might have in setting GPS up (the email group is public to everyone, you can join the group from Google Groups website).


Publications

  • S. Salihoglu and J. Widom. GPS: A Graph Processing System. Technical Report, April 2012.

  • S. Hong, S. Salihoglu, J. Widom and K. Olukotun. Compiling GreenMarl into GPS. Technical Report, November 2012.

  • S. Salihoglu and J. Widom. Computing Strongly Connected Components in Pregel-like Systems. Technical Report, March 2013.

    People


    Last edited by Semih Salihoglu, August 2011