Bingsheng He, Mao Yang, Zhenyu Guo, Rishan Chen, Wei Lin, Bing Su, Hongyi Wang, and Lidong Zhou
We introduce the new Wave model for exposing the temporal relationship among queries in data-intensive distributed computing. The model defines the notion of a query series to capture the recurrent nature of batched computation on periodically updated input streams. This seemingly simple concept captures a significant portion of the queries we observed in a production system. The recurring nature of the computation on the same stream opens up surprisingly significant opportunities for achieving better performance and higher resource utilization.
All copyrights reserved by USENIX 2009