Big Data Analytics 2013

Big Data Analytics 2013

Active Data Management for Distributed Graph Processing

Karthik Nilakant (University of Cambridge) and Eiko Yoneki (University of Cambridge)

Graph processing systems are subject to two conflicting concerns: firstly, a large number of algorithms for processing graphs exist, most of which exhibit poor locality of accesses. Secondly, the increasing scale of data has resulted in a need to distribute the data across multiple machines, further affecting locality. Most existing systems tend to consider these factors separately. We propose schemes for pro-active or reactive active graph management. Pro-actively, one could instrument algorithmic code fragments to allow the processing engine to predict future behaviour of the graph algorithm. Producing “balanced cuts” on a graph is computationally expensive, and infeasible for large datasets. Instead, lightweight graph analytics could be deployed to re-arrange the graph into less tightly coupled clusters. We propose a framework for considering these factors in unison. Ultimately, the choice of method to use in a given scenario will depend on the structure of the dataset and the processing algorithm. Our focus will be on identifying the minimum characteristics that must be gleaned from the code or data to be processed (or alternatively provided by the user), in order to boost the performance of such a system.