Min-Max Hypergraph Partitioning

  • Dan Alistarh ,
  • Jennifer Iglesias ,
  • Milan Vojnovic

MSR-TR-2015-15 |

In many applications, the structure of data can be represented by a hyper-graph, where the data items are vertices, and the associations among items are represented by hyper-edges. Equivalently, we are given as input a bipartite graph with two kinds of vertices: items, and associations (which we refer to as topics). We consider the problem of partitioning the set of items into a given number of partitions, such that the maximum number of topics covered by a partition is minimized.

This is a natural clustering problem, with various applications such as partitioning of information objects like documents, images, and videos, or allocation of system resources in data centers in the context of distributed streaming processing platforms, or graph and machine learning computation platforms.

In this paper, we introduce an approximation algorithm for the offline version of the problem, which is guaranteed to yield a good approximation for every input instance where a solution of small cost exists. Further, we consider the online version of the problem, in which items arrive online and each item must be assigned irrevocably to one of the partitions at the time of its arrival.

We show that a simple greedy online assignment of items is able to recover a hidden co-clustering of vertices under a natural set of recovery conditions. We also report on extensive experimental results, covering both synthetically generated and real-world input bipartite graphs, which demonstrate that the greedy online assignment of items consistently yields superior performance when compared with alternative approaches.