GPS: A Graph Processing System

GPS: A Graph Processing System


Overview

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.

In systems such as GPS and Pregel, the input graph (directed, possibly with values on edges) is distributed across machines and vertices send each other messages to perform a computation. Computation is divided into iterations called supersteps. Analogous to the map() and reduce() functions of the MapReduce framework, in each superstep a user-defined function called vertex.compute() is applied to each vertex in parallel. The user expresses the logic of the computation by implementing vertex.compute(). This design is based on Valiant's Bulk Synchronous Parallel model of computation. A detailed description can be found in the original Pregel paper.

There are three main differences between Pregel and GPS:

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges..

Using GPS we have studied the effects on system performance of different ways of partioning the graph, both before the graph computation starts and during the computation. A detailed description of GPS and our work on partitioning can be found in our online technical report.

The GPS project is supported by the National Science Foundation (grant IIS-0904497), KAUST, and an Amazon Web Services Research Grant.


Download GPS Source Code

The GPS source code is available as open-source code under the BSD license. The source code can be found here. We will provide full documentation on how to download, program, and run GPS in early July 2012. If you want to use GPS before then, please email semih @ stanford.edu.


Publications

  • S. Salihoglu and J. Widom. GPS: A Graph Processing System. Technical Report, April 2012.

    People


    Last edited by Semih Salihoglu, August 2011