Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-commutative Aggregators in MapReduce Programs

  • Tian Xiao ,
  • Jiaxing Zhang ,
  • Hucheng Zhou ,
  • Zhenyu Guo ,
  • Sean McDirmid ,
  • Wei Lin ,
  • Wenguang Chen ,

Published by (ICSE SEIP) Software Engineering in Practice

The simplicity of MapReduce introduces unique subtleties that cause hard-to-detect bugs; in particular, the unfixed order of reduce function input is a source of nondeterminism that is harmful if the reduce function is not commutative and sensitive to input order. Our extensive study of production MapReduce programs reveals interesting findings on commutativity, nondeterminism, and correctness. Although non-commutative reduce functions lead to five bugs in our sample of well-tested production programs, we surprisingly have found that many non-commutative reduce functions are mostly harmless due to, for example, implicit data properties. These findings are instrumental in advancing our understanding of MapReduce program correctness.