Re-optimizing Data Parallel Computing
MSR, Bing, UCBerkeley
This has shipped in Bing's Cosmos clusters since December 2011.
How would execution plans for jobs in big data clusters change if given
additional information about properties of the user code, data and how the code
and data interact? Can we extract such properties at scale?
PACMan: Coordinated Memory Caching for Parallel Jobs
Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica
How can we build an input cache spanning a cluster of machines to speed up parallel executions? Hint: not LRU.