Programming models such as Hive and DryadLINQ provide programmers with simple declarative abstractions for writing data-intensive computations that run on large clusters of machines. However, this level of abstraction comes at a cost: the inability to understand, predict, and debug performance. This project aims to build performance models that predict the performance of a query and identify its bottleneck resources and computations.