Supporting Iteration in a Heterogeneous Dataflow Engine

Jon Currey; Simon Baker; Chris Rossbach

SFMA 2013 | April 2013

Published by The 3rd Workshop on Systems for Future Multicore Architectures

Dataflow execution engines such as MapReduce, DryadLINQ, and PTask have enjoyed success because they simplify development for a class of important parallel applications. These systems sacrifice generality for simplicity: while many workloads are easily expressed, important idioms like iteration and recursion are difficult to express and support efficiently.

We consider the problem of extending a dataflow engine to support data-dependent iteration in a heterogeneous environment, where architectural diversity introduces data migration and scheduling challenges. We propose constructs that enable a dataflow engine to efficiently support data-dependent control flow in such an environment, implement them in a prototype system called IDEA, and use them to implement a variant of optical flow, a well-studied computer vision algorithm. Optical flow relies heavily on nested loops, making it difficult to express without explicit support for iteration. We demonstrate that IDEA enables up to 18× speedup over a sequential implementation and a 32% speedup over a GPU implementation that uses synchronous host-based control.