Big Data Analytics 2013

Big Data Analytics 2013

Algorithms for Large-Scale Graph Partitioning

Florian Bourse (Ecole Normale Supérieure, Paris), Marc Lelarge (INRIA- Ecole Normale Supérieure, Paris), and Milan Vojnovic (Microsoft Research)

Many computation tasks are performed on large scale graph data which may amount to as many as billions or even trillions of vertices or edges (e.g. online social network graphs, knowledge graphs, and web graphs). A standard way to scale up such computations is to partition an input graph into clusters of balanced sizes and small cut costs. Graph partitioning is of interest for a wide range of systems including iterative computation platforms, distributed graph databases, and stream processing platforms. Besides its practical importance, graph partitioning is one of the most fundamental and challenging theoretical computer science problems.

In our work we consider novel graph partitioning problems that arise under some computation models of interest -- which specify how the size of a cluster and the cost of a cut are defined. We devote special focus to online assignment algorithms that require a single pass through vertices or edges. Our preliminary performance evaluation results demonstrate significant benefits of using graph partitioning algorithms that are specifically tailored for given computation model.

This is a collaboration project with MSR-INRIA joint research centre.