Optimizing Data Partitioning for Data-Parallel Computing

Qifa Ke, Vijayan Prabhakaran, Yinglian Xie, Yuan Yu, Jingyue Wu, and Junfeng Yang

Abstract

The performance of data-parallel computing systems (e.g., MapReduce, DryadLINQ) depends heavily on their data partitions. The solutions implemented by current state-of-the-art systems are far from optimal. Techniques proposed by the database community for finding optimal data partitions are not directly applicable when complex user-defined functions and data models are involved. We outline our solution, which draws on expertise from fields such as programming languages and optimization, and present our preliminary results.

Details

Publication type: Inproceedings
Published in: Hot Topics in Operating Systems (HotOS XIII)
Publisher: USENIX