Optimizing Data Partitioning for Data-Parallel Computing

The performance of data-parallel computing (e.g., MapReduce, DryadLINQ) depends heavily on its data partitions. Solutions implemented by current state-of-the-art systems are far from optimal. Techniques proposed by the database community to find optimal data partitions are not directly applicable when complex user-defined functions and data models are involved. We outline our solution, which draws on expertise from various fields such as programming languages and optimization, and present our preliminary results.


In: Hot Topics in Operating Systems (HotOS XIII)

Publisher: USENIX

Details

Type: Inproceedings