Differential dataflow

  • Frank McSherry ,
  • Derek Murray ,
  • Rebecca Isaacs ,
  • Michael Isard

Proceedings of CIDR 2013 |

Existing computational models for processing continuously changing input data are unable to efficiently support iterative queries except in limited special cases. This makes it difficult to perform complex tasks, such as social-graph analysis on changing data at interactive timescales, which would greatly benefit those analyzing the behavior of services like Twitter. In this paper we introduce a new model called differential computation, which extends traditional incremental computation to allow arbitrarily nested iteration, and explain—with reference to a publicly available prototype system called Naiad—how differential computation can be efficiently implemented in the context of a declarative data-parallel dataflow language. The resulting system makes it easy to program previously intractable algorithms such as incrementally updated strongly connected components, and integrate them with data transformation operations to obtain practically relevant insights from real data streams.