Overview
The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
Dryad provides reliable, distributed computing on thousands of servers for large-scale data-parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.
DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications that run on large PC clusters.
DryadLINQ translates LINQ programs into distributed Dryad computations:
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
- C# methods become code running on the vertices of a Dryad job.
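As a rough illustration of the kind of program involved, the sketch below is an ordinary LINQ query written in the SQL-like query syntax. It runs locally over an in-memory array using only the standard `System.Linq` operators; it does not use any DryadLINQ API. The names `LogEntry`, `LinqQuerySketch`, and `BytesPerHost`, and the sample data, are hypothetical. Under the translation described above, a query of this shape would read from a partitioned input instead, and the lambda bodies would run on the vertices of the job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type used only for this illustration.
public sealed class LogEntry
{
    public string Host { get; set; } = "";
    public long Bytes { get; set; }
}

public static class LinqQuerySketch
{
    // Sums the bytes per host over entries of at least minBytes,
    // returning the hosts in ascending order.
    public static List<KeyValuePair<string, long>> BytesPerHost(
        IEnumerable<LogEntry> entries, long minBytes)
    {
        var query = from e in entries
                    where e.Bytes >= minBytes
                    group e by e.Host into g
                    orderby g.Key
                    select new KeyValuePair<string, long>(g.Key, g.Sum(x => x.Bytes));
        return query.ToList();
    }

    public static void Main()
    {
        var entries = new[]
        {
            new LogEntry { Host = "b", Bytes = 7 },
            new LogEntry { Host = "a", Bytes = 10 },
            new LogEntry { Host = "a", Bytes = 3 },
            new LogEntry { Host = "a", Bytes = 20 },
        };

        foreach (var pair in BytesPerHost(entries, 5))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The query is declarative: it states what to filter, group, and sum, and leaves the evaluation strategy to whichever LINQ provider executes it.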
DryadLINQ has the following features:
- Declarative programming: computations are expressed in a high-level language similar to SQL.
- Automatic parallelization: from sequential declarative code, the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. To exploit multi-core parallelism on each machine, DryadLINQ relies on the PLINQ parallelization framework.
- Integration with Visual Studio: DryadLINQ programmers can use the full set of Visual Studio tools: IntelliSense, code refactoring, integrated debugging, build, and source code management.
- Integration with .NET: all .NET libraries are available, as are other .NET languages such as Visual Basic and dynamic languages.
- Type safety: distributed computations are statically type-checked.
- Automatic serialization: data transport mechanisms automatically handle all .NET object types.
- Job graph optimizations:
  - static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
  - dynamic: run-time query plan optimizations automatically adapt the plan, taking into account the statistics of the data set being processed.
- Conciseness: the following line of code is a complete implementation of the Map-Reduce computation framework in DryadLINQ: public static IQueryable
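To give a sense of how compact a Map-Reduce formulation can be in LINQ, here is a minimal sketch built from two standard `System.Linq` operators, `SelectMany` and `GroupBy`. It is an assumption about one reasonable formulation, not a reconstruction of the project's own code. It also uses `IEnumerable` and plain delegates so that it runs locally on in-memory data. The names `MapReduceSketch`, `mapper`, `keySelector`, and `reducer` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceSketch
{
    // Map each input to zero or more records, group the records by key,
    // then reduce each group to a single result.
    public static IEnumerable<R> MapReduce<S, M, K, R>(
        this IEnumerable<S> source,
        Func<S, IEnumerable<M>> mapper,
        Func<M, K> keySelector,
        Func<K, IEnumerable<M>, R> reducer)
    {
        return source.SelectMany(mapper).GroupBy(keySelector, reducer);
    }

    public static void Main()
    {
        string[] lines = { "a b a", "b c" };

        // Word count expressed as a map (split into words),
        // a key (the word itself), and a reduce (count the group).
        var counts = lines.MapReduce(
            line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
            word => word,
            (word, group) => new KeyValuePair<string, int>(word, group.Count()));

        foreach (var pair in counts)
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

The body of `MapReduce` is a single expression, which is the sense in which the pattern fits in one line. A distributed variant would presumably take `IQueryable` and expression-tree parameters so that a provider can inspect and plan the query, but that is beyond what this local sketch shows.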
Project Members
- Yuan Yu
- Michael Isard
- Dennis Fetterly
- Mihai Budiu
- Ulfar Erlingsson
- Pradeep Kumar Gunda
- Jon Currey
- Kannan Achan (alumnus)
Publications
- DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008.
- Some sample programs written in DryadLINQ
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Frank McSherry, and Kannan Achan
Microsoft Research Technical Report, MSR-TR-2008-74, May 2008, 37 pages.
- Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.
- Cluster Computing with DryadLINQ
Presentation by Mihai Budiu at Palo Alto Research Center CSL Colloquium, Palo Alto, CA, May 8, 2008.
- A Machine-Learning toolkit in DryadLINQ
Presentation slides in PowerPoint by Mihai Budiu and Kannan Achan.
References
- Internal Microsoft Tutorial for getting started with DryadLINQ. May 2008.
- C# Version 3.0 Specification, Microsoft. May 2006.
- The .NET Standard Query Operators, Microsoft. May 2006.
- DLinq: .NET Language Integrated Query for Relational Data, Microsoft. September 2005.
- XLinq: .NET Language Integrated Query for XML Data, Microsoft. May 2006.



