Dryad
Dryad is an infrastructure that lets a programmer use the
resources of a computer cluster or a data center to run
data-parallel programs. A Dryad programmer can use thousands of
machines, each with multiple processors or cores, without
knowing anything about concurrent programming.
A Dryad programmer writes several sequential programs and connects
them using one-way channels. The computation is structured as a
directed graph: programs are graph vertices, while the
channels are graph edges. A Dryad job is a graph
generator which can synthesize any directed acyclic graph. These
graphs can even change during execution, in response to important
events in the computation.
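A minimal single-process sketch of this model, assuming Python 3.9+ for `graphlib`: each vertex is an ordinary sequential function, each `connect` call adds a one-way channel, and the job runs vertices in topological order, so a vertex starts only after all of its producers have finished. The `Job` class and its methods are hypothetical and are not Dryad's API; a real Dryad job places vertices on cluster machines and moves data between them, and this sketch does not model graphs that change during execution.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Hypothetical names for illustration only; this is not Dryad's actual API.
Vertex = Callable[..., object]


class Job:
    """A toy job graph: sequential vertex programs joined by one-way channels."""

    def __init__(self) -> None:
        self.vertices: Dict[str, Vertex] = {}
        # vertex name -> upstream vertex names, in the order channels were added
        self.inputs: Dict[str, List[str]] = {}

    def add_vertex(self, name: str, program: Vertex) -> None:
        self.vertices[name] = program
        self.inputs.setdefault(name, [])

    def connect(self, src: str, dst: str) -> None:
        # A one-way channel carrying src's output to dst.
        self.inputs[dst].append(src)

    def run(self) -> Dict[str, object]:
        # TopologicalSorter raises CycleError if the graph is not acyclic.
        order = TopologicalSorter(self.inputs).static_order()
        results: Dict[str, object] = {}
        for name in order:
            # Every producer has already run, so its output is available.
            args = [results[src] for src in self.inputs[name]]
            results[name] = self.vertices[name](*args)
        return results


job = Job()
job.add_vertex("read", lambda: [3, 1, 2])
job.add_vertex("sort", lambda xs: sorted(xs))
job.add_vertex("sum", lambda xs: sum(xs))
job.add_vertex("report", lambda s, t: f"sorted={s} total={t}")
job.connect("read", "sort")
job.connect("read", "sum")
job.connect("sort", "report")
job.connect("sum", "report")
print(job.run()["report"])  # sorted=[1, 2, 3] total=6
```

The vertices themselves contain no locks or threads; all ordering comes from the graph, which is what lets a scheduler run independent vertices (here `sort` and `sum`) in parallel.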
Dryad is quite expressive. It completely subsumes other
computation frameworks, such as Google's MapReduce or relational
algebra. Moreover, Dryad handles job creation and management,
resource management, job monitoring and visualization, fault
tolerance, re-execution, scheduling, and accounting.
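To make the MapReduce comparison concrete, here is a hedged sketch of word count written as a two-stage graph: one map vertex per input partition, R reduce vertices, and a complete bipartite set of channels between the stages, where each map vertex hash-partitions its output into one channel per reducer. The function names are hypothetical and everything runs sequentially in one process; it shows only the shape of the graph, not how Dryad schedules, distributes, or re-executes vertices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sketch: map-reduce as a two-stage Dryad-style graph with an
# M x R grid of channels between M map vertices and R reduce vertices.
Pair = Tuple[str, int]


def map_vertex(lines: List[str], num_reducers: int) -> List[List[Pair]]:
    """Emit (word, 1) pairs, hash-partitioned into one output channel per reducer."""
    channels: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for line in lines:
        for word in line.split():
            channels[hash(word) % num_reducers].append((word, 1))
    return channels


def reduce_vertex(inputs: List[List[Pair]]) -> Dict[str, int]:
    """Read one channel from every map vertex and sum the counts per word."""
    counts: Dict[str, int] = defaultdict(int)
    for channel in inputs:
        for word, n in channel:
            counts[word] += n
    return dict(counts)


def word_count(partitions: List[List[str]], num_reducers: int) -> Dict[str, int]:
    # Stage 1: one map vertex per input partition.
    map_outputs = [map_vertex(p, num_reducers) for p in partitions]
    # Stage 2: reducer r reads channel r of every map vertex.
    result: Dict[str, int] = {}
    for r in range(num_reducers):
        # Each word hashes to exactly one reducer, so the partial results are disjoint.
        result.update(reduce_vertex([out[r] for out in map_outputs]))
    return result


print(sorted(word_count([["a b a"], ["b c"], ["a"]], num_reducers=2).items()))
# [('a', 3), ('b', 2), ('c', 1)]
```

In a general DAG framework this fixed map/shuffle/reduce shape is just one graph among many; a relational plan would use other vertex programs and connection patterns.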
As evidence of Dryad's versatility, a rich software ecosystem has
been built on top of Dryad:
- SSIS on Dryad executes many instances of SQL Server, each in a
separate Dryad vertex, taking advantage of Dryad's fault tolerance and
scheduling. This system is currently deployed in production as part
of one of Microsoft's AdCenter log-processing pipelines.
- DryadLINQ generates Dryad computations from the LINQ
(Language-Integrated Query) extensions to C#.
- The distributed shell generalizes the pipe concept from the Unix
shell. Whereas Unix pipes build one-dimensional (1-D) chains of
processes, the distributed shell lets the programmer build 2-D
structures in a scripting language. It generalizes Unix pipes in
three ways:
  - It lets each process easily connect multiple file descriptors to
    other processes, hence the 2-D aspect.
  - It allows pipes to span multiple machines across a cluster.
  - It virtualizes pipelines, so a pipeline can contain many more
    processes than there are available machines, by time-multiplexing
    processors and buffering intermediate results.
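The virtualization idea in the last point can be sketched under simplifying assumptions: W independent pipelines of D stages (W * D logical processes) share a pool of K worker threads standing in for machines. Each stage runs to completion and its entire output is buffered in memory before the next stage starts, so no logical process needs a machine of its own. The function `run_virtualized` and the stage functions are hypothetical, and running one stage at a time across all pipelines is a simplification of real scheduling, not the distributed shell's actual mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical sketch: W pipelines x D stages time-multiplexed onto K workers,
# with each stage's full output buffered for the next stage.
Stage = Callable[[List[int]], List[int]]


def run_virtualized(inputs: List[List[int]], stages: List[Stage], workers: int) -> List[List[int]]:
    buffers = list(inputs)  # buffers[i] holds the latest output of pipeline i
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            # All W instances of this stage share the K workers; map() keeps
            # results in pipeline order, and each result is buffered in full.
            buffers = list(pool.map(stage, buffers))
    return buffers


stages: List[Stage] = [
    lambda xs: [x * 2 for x in xs],       # double every value
    lambda xs: [x for x in xs if x % 3],  # drop multiples of 3
    lambda xs: sorted(xs, reverse=True),  # sort in descending order
]
print(run_virtualized([[1, 2, 3], [4, 5, 6], [7, 8, 9]], stages, workers=2))
# [[4, 2], [10, 8], [16, 14]]
```

Here 9 logical processes run on only 2 workers; the cost of this trade is that intermediate results must be stored rather than streamed.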
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007
- Video of a presentation on Dryad at the Google Campus, given by
Michael Isard, November 1, 2007.
- Slides from a talk on Dryad at the University of California, Santa
Cruz, given by Michael Isard, February 2008.
- Another presentation on Dryad, given by Mihai Budiu at Microsoft
Live Labs, March 2008.