Naiad
Naiad

Naiad is an investigation of data-parallel dataflow computation in the spirit of Dryad and DryadLINQ, but with a focus on incremental computation. Naiad introduces a new computational model, differential dataflow, operating over collections of differences rather than collections of records, and resulting in very efficient implementations of programming patterns that are expensive in existing systems.

Download Naiad
News: Naiad is now available to download. The Naiad tech report is also now available. You can also read more about Naiad on the MSR SVC Big Data blog, and watch an interview about Naiad on Channel 9.

Our goal with Naiad was to address one of the recurring requests for systems like Dryad and DryadLINQ, incremental recomputation, but in so doing found that the necessary mechanisms gave rise to a new computational model, differential dataflow, capable of efficiently processing substantially more complex computations than current systems support, namely incremental and arbitrarily nested iterative dataflow computation.


Consider the problem of computing the connected component structure of a graph. One natural iterative data-parallel approach has each vertex assume a label (initially their own name) and repeatedly share their label with their neighbors, assuming the least label in their neighborhood. Eventually, all vertices in the same component will be labeled with the name of the least vertex in their component. Several data-processing systems make this sort of iterative computation easy to write and efficient to execute.

However, consider what happens if the input changes. Perhaps a single edge is removed, which can result in the separation of two previously connected components. The labels above are invalidated, and it is not easy to determine how to unwind their propagation to return the computation to a state from which new correct labels can be determined. The data-processing systems alluded to above are forced to discard the results of their previous computation and start over from scratch.

Naiad, by comparison, represents a dataset in a compact form indicating where and when records have changed. The specific representation enables efficient combination of incremental and iteration computation, and allows us to update computations like the connected components example above in a fraction of a second. Naiad is currently capable of maintaining the strongly connected component structure (a doubly-nested loop) of a graph defined by a sliding window over edge stream with rates exceeding Twitter's full tweet volume, all with sub-second latency.

 

Downloads
Publications
People
Martin Abadi
Martin Abadi

Paul Barham
Paul Barham

Rebecca Isaacs
Rebecca Isaacs

Michael Isard
Michael Isard

Frank McSherry
Frank McSherry

Derek Murray
Derek Murray