Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-commutative Aggregators in MapReduce Programs

Tian Xiao, Jiaxing Zhang, Hucheng Zhou, Zhenyu Guo, Sean McDirmid, Wei Lin, Wenguang Chen, and Lidong Zhou

Abstract

The simplicity of MapReduce introduces unique subtleties that cause hard-to-detect bugs; in particular, the unfixed order of reduce function input is a source of nondeterminism that is harmful if the reduce function is not commutative and sensitive to input order. Our extensive study of production MapReduce programs reveals interesting findings on commutativity, nondeterminism, and correctness. Although non-commutative reduce functions lead to five bugs in our sample of well-tested production programs, we surprisingly have found that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties. These findings are instrumental in advancing our understanding of MapReduce program correctness.

Details

Publication typeInproceedings
Publisher(ICSE SEIP) Software Engineering in Practice
> Publications > Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-commutative Aggregators in MapReduce Programs