

Dryad

Overview

Dryad is an infrastructure that allows a programmer to use the resources of a computer cluster or data center to run data-parallel programs. A Dryad programmer can use thousands of machines, each with multiple processors or cores, without knowing anything about concurrent programming.

The Structure of Dryad Jobs

[Figure: structure of a Dryad job]

A Dryad programmer writes several sequential programs and connects them using one-way channels. The computation is structured as a directed graph: programs are graph vertices, and channels are graph edges. A Dryad job is a graph generator that can synthesize any directed acyclic graph. These graphs can even change during execution, in response to important events in the computation.
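
To make the vertex/channel structure concrete, here is a minimal single-process sketch in Python. It is not Dryad's API: `run_job`, `Vertex`, and the example vertices are hypothetical names, channels are modeled as in-memory lists, and the graph is fixed rather than modified at run time. Each vertex is an ordinary sequential function; the "scheduler" simply runs vertices in an order consistent with the edges, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a vertex is a sequential function that takes a list of
# input streams (one per incoming channel) and returns one output stream.
Vertex = Callable[[List[list]], list]


def run_job(vertices: Dict[str, Vertex],
            channels: List[Tuple[str, str]]) -> Dict[str, list]:
    """Run every vertex once, in an order consistent with the DAG."""
    # A channel (src, dst) is a directed edge; record each vertex's predecessors.
    preds: Dict[str, List[str]] = {name: [] for name in vertices}
    for src, dst in channels:
        preds[dst].append(src)

    # Iterating static_order() raises graphlib.CycleError if the channels
    # form a cycle, i.e. if the job graph is not acyclic.
    order = TopologicalSorter(preds).static_order()

    outputs: Dict[str, list] = {}
    for name in order:
        # Each input channel delivers the complete output of its source vertex.
        inputs = [outputs[src] for src in preds[name]]
        outputs[name] = vertices[name](inputs)
    return outputs


# Example job: two readers feed a merge vertex, which feeds a sum vertex.
job = {
    "read_a": lambda _: [1, 2, 3],
    "read_b": lambda _: [10, 20],
    "merge": lambda ins: [x for stream in ins for x in stream],
    "total": lambda ins: [sum(ins[0])],
}
edges = [("read_a", "merge"), ("read_b", "merge"), ("merge", "total")]
print(run_job(job, edges)["total"])  # [36]
```

Because vertices communicate only through channels, a real scheduler could run independent vertices (here `read_a` and `read_b`) on different machines at the same time; this sketch runs them one after another for simplicity.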

Dryad is quite expressive. It completely subsumes other computation frameworks, such as Google's MapReduce or the relational algebra. Moreover, Dryad handles job creation and management, resource management, job monitoring and visualization, fault tolerance, re-execution, scheduling, and accounting.
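
As an illustration of how a MapReduce computation fits into this graph model, the sketch below expresses word count as a two-stage graph in which every map vertex is connected by a channel to every reduce vertex. It is a hypothetical single-process Python model, not the actual interface of Dryad or of MapReduce; `R`, `map_vertex`, `reduce_vertex`, and `word_count` are names invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sketch: word count as a two-stage graph. Stage 1 has one map
# vertex per input partition; stage 2 has R reduce vertices. Map vertex i
# sends channel j to reduce vertex j, forming a complete bipartite graph.

R = 2  # number of reduce vertices (illustrative choice)


def map_vertex(lines: List[str]) -> List[List[Tuple[str, int]]]:
    """Emit (word, 1) pairs, hash-partitioned into one output channel per reducer."""
    channels: List[List[Tuple[str, int]]] = [[] for _ in range(R)]
    for line in lines:
        for word in line.split():
            channels[hash(word) % R].append((word, 1))
    return channels


def reduce_vertex(inputs: List[List[Tuple[str, int]]]) -> Dict[str, int]:
    """Read every incoming channel and sum the counts per word."""
    counts: Dict[str, int] = defaultdict(int)
    for channel in inputs:
        for word, n in channel:
            counts[word] += n
    return dict(counts)


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    # Stage 1: run every map vertex; map_out[i][j] is the channel i -> j.
    map_out = [map_vertex(p) for p in partitions]
    # Stage 2: reduce vertex j reads channel j from every map vertex.
    # Each word hashes to exactly one reducer, so the key sets are disjoint.
    result: Dict[str, int] = {}
    for j in range(R):
        result.update(reduce_vertex([out[j] for out in map_out]))
    return result


print(sorted(word_count([["a b a"], ["b c"]]).items()))
# [('a', 2), ('b', 2), ('c', 1)]
```

Python randomizes `str` hashes per process, which is harmless here because all partitioning happens in one process; a distributed version would need a hash function that is stable across machines.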



The Dryad Software Stack

As evidence of Dryad's versatility, a rich software ecosystem has been built on top of Dryad:

  • SSIS on Dryad executes many instances of SQL Server, each in a separate Dryad vertex, taking advantage of Dryad's fault tolerance and scheduling. This system is currently deployed in production as part of one of Microsoft's AdCenter log-processing pipelines.
  • DryadLINQ generates Dryad computations from the LINQ (Language-Integrated Query) extensions to C#.
  • The distributed shell generalizes the pipe concept of the Unix shell. Whereas Unix pipes allow the construction of one-dimensional (1-D) process structures, the distributed shell allows the programmer to build 2-D structures in a scripting language. It generalizes Unix pipes in three ways:
    1. It allows each process to easily connect multiple file descriptors; hence the 2-D aspect.
    2. It allows the construction of pipelines spanning multiple machines across a cluster.
    3. It virtualizes pipelines, allowing the execution of pipelines with many more processes than available machines by time-multiplexing processors and buffering results.
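
The third point, running more processes than there are machines, can be sketched with a fixed-size thread pool standing in for the cluster. This is an assumption-laden Python model, not the distributed shell's implementation: threads stand in for machines, `run_virtualized`, `upper`, and `number` are hypothetical names, and every stage of every pipeline copy runs to completion with its whole output buffered in memory before the next stage starts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical model (not the distributed shell's API): a "process" reads its
# whole buffered input, a list of lines, and returns its whole output.
Process = Callable[[List[str]], List[str]]


def run_virtualized(stages: List[Process], partitions: List[List[str]],
                    machines: int) -> List[List[str]]:
    """Run one copy of a linear pipeline per input partition.

    The job has len(stages) * len(partitions) processes, possibly many more
    than `machines`. The pool runs at most `machines` of them at once
    (time-multiplexing), and each stage's complete output is buffered in
    memory until the next stage reads it.
    """
    buffers = list(partitions)
    with ThreadPoolExecutor(max_workers=machines) as pool:
        for stage in stages:
            # Queue one process per partition; results come back in input order.
            buffers = list(pool.map(stage, buffers))
    return buffers


def upper(lines: List[str]) -> List[str]:
    return [s.upper() for s in lines]


def number(lines: List[str]) -> List[str]:
    return [f"{i}:{s}" for i, s in enumerate(lines)]


# Six processes (2 stages x 3 partitions) on two simulated "machines".
print(run_virtualized([upper, number], [["a", "b"], ["c"], ["d"]], machines=2))
# [['0:A', '1:B'], ['0:C'], ['0:D']]
```

The per-stage barrier keeps the sketch short; a scheduler could instead start a pipeline copy's next stage as soon as that copy's previous stage finishes.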

[Figure: the Dryad software layers]


Publications

Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007

Video of a presentation on Dryad given by Michael Isard at the Google campus, November 1, 2007.

Slides from a talk on Dryad given by Michael Isard at the University of California, Santa Cruz, February 2008.

Another presentation on Dryad, given by Mihai Budiu at Microsoft Live Labs, March 2008.

Associated Groups
 

Distributed Systems - Silicon Valley


Web Search and Data Mining - Silicon Valley




©2008 Microsoft Corporation. All rights reserved.