Big Data Analytics 2013

Big Data Analytics 2013

Incremental, Iterative and Interactive Computation in Naiad

Frank McSherry, Derek Murray, Rebecca Isaacs, and Michael Isard (Microsoft Research)

We are developing a system for scalable data analysis designed to support iterative queries over continuously changing inputs at interactive timescales. The system is based on a new computational framework, differential dataflow, that generalizes standard incremental dataflow for far greater reuse of previous results when collections change. Differential dataflow distinguishes between the multiple reasons a collection might change, including both loop feedback and new input data, allowing a system to re-use the most appropriate results from previously performed work when an incremental update arrives.

Our prototype system, Naiad, demonstrates the practical application of differential dataflow. Like many systems, Naiad supports high-level declarative queries, data-parallel execution, and transparent distribution. Unlike other systems, Naiad efficiently executes queries with multiple (possibly nested) loops, while simultaneously responding with low latency to incremental changes to the inputs. We show how differential dataflow enables orders of magnitude speedups for a variety of complex workloads on real streaming data.