Critical Path based Performance Models for Distributed Queries

MSR-TR-2012-121 |

Programming models such as MapReduce and DryadLINQ provide programmers with declarative abstractions (such as SQL like query languages) for writing data intensive computations. The models also provide runtime systems that can execute these queries on a large cluster of machines, while dealing with the vagaries of distribution such as messaging, failures and synchronization. However, this level of abstraction comes at a cost – the inability to understand, predict and debug performance. In this paper, we propose a performance modelling approach for predicting the execution time of distributed queries. Our modeling approach is based on a combination of the critical path method, empirically generated black box models and cardinality estimation techniques from databases. We evaluate the models using several real world applications and find that models can accurately predict execution time to within 10% of actual execution time. We demonstrate the usefulness of the model in identifying performance bottlenecks, both during design and while debugging performance problems.