DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications that run on large PC clusters.
Overview
If you are interested in obtaining the DryadLINQ source code for research purposes, please contact one of the project members listed below.
The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and .NET Language Integrated Query (LINQ).
Dryad provides reliable, distributed computing on thousands of servers for large-scale data-parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, with access to the entire .NET library and to Visual Studio.
DryadLINQ translates LINQ programs into distributed Dryad computations:
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
- C# methods become code running on the vertices of a Dryad job.
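This translation is possible because a LINQ query over an IQueryable is not run immediately: it is recorded as an expression tree that a query provider can inspect, rewrite, and execute however it chooses. The sketch below uses only the standard LINQ-to-Objects provider (via AsQueryable), not DryadLINQ, to show the chain of query operators such a provider would receive. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class QueryAsData
{
    // Builds a query without running it and returns the names of the query
    // operators recorded in its expression tree, outermost operator first.
    public static string[] OperatorChain()
    {
        IQueryable<int> numbers = Enumerable.Range(1, 10).AsQueryable();
        IQueryable<int> query = numbers.Where(n => n % 2 == 0).Select(n => n * n);

        var names = new List<string>();
        Expression node = query.Expression;
        // Each operator is a MethodCallExpression whose first argument is its input;
        // the walk stops at the ConstantExpression that holds the data source.
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0];
        }
        return names.ToArray();
    }
}
```

Here OperatorChain() returns { "Select", "Where" }. Enumerating the same query would run it locally; a provider such as DryadLINQ instead translates that tree into a distributed Dryad job, as described above.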
DryadLINQ has the following features:
- Declarative programming: computations are expressed in a high-level language similar to SQL
- Automatic parallelization: from sequential declarative code, the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. To exploit multi-core parallelism on each machine, DryadLINQ relies on the PLINQ parallelization framework.
- Integration with Visual Studio: DryadLINQ programmers can use Visual Studio's full tool set: IntelliSense, code refactoring, integrated debugging, build, and source code management.
- Integration with .NET: all .NET libraries are available, as are other .NET languages, including Visual Basic and dynamic languages.
- Type safety: distributed computations are statically type-checked.
- Automatic serialization: data transport mechanisms automatically handle all .NET object types.
- Job graph optimizations:
  - static: a rich set of term-rewriting query optimization rules is applied to the query plan to improve locality and performance.
  - dynamic: run-time optimizations automatically adapt the query plan to the statistics of the data set being processed.
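- Conciseness: the following method, whose body is a single line, is a complete implementation of the MapReduce programming model in DryadLINQ:
    // Needs: using System; using System.Collections.Generic; using System.Linq; using System.Linq.Expressions;
    // As a C# extension method, it must be declared inside a static class.
    public static IQueryable<R>
    MapReduce<S,M,K,R>(this IQueryable<S> source,
                       Expression<Func<S,IEnumerable<M>>> mapper,
                       Expression<Func<M,K>> keySelector,
                       Expression<Func<K,IEnumerable<M>,R>> reducer)
    {
        // Map each record to zero or more values, group them by key, and reduce each group.
        return source.SelectMany(mapper).GroupBy(keySelector, reducer);
    }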
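To show how the MapReduce method above is called, here is a word-count sketch. It runs locally on the standard LINQ-to-Objects provider (AsQueryable) rather than on a DryadLINQ cluster, and it repeats the method inside a static class so the block compiles on its own. The class names and the CountWords helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceExtensions
{
    // Copy of the MapReduce method above: map, group by key, reduce.
    public static IQueryable<R> MapReduce<S, M, K, R>(this IQueryable<S> source,
        Expression<Func<S, IEnumerable<M>>> mapper,
        Expression<Func<M, K>> keySelector,
        Expression<Func<K, IEnumerable<M>, R>> reducer)
    {
        return source.SelectMany(mapper).GroupBy(keySelector, reducer);
    }
}

public static class WordCountDemo
{
    // Counts how often each word occurs. AsQueryable() wraps the in-memory
    // lines in the LINQ-to-Objects provider, standing in for a distributed table.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        return lines.AsQueryable()
            .MapReduce(
                // mapper: split each line into words
                line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries),
                // keySelector: group by the word itself
                word => word,
                // reducer: emit (word, number of occurrences)
                (word, occurrences) => new KeyValuePair<string, int>(word, occurrences.Count()))
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

For example, CountWords(new[] { "the quick fox", "the lazy dog" }) maps "the" to 2 and each of the other four words to 1. The lambdas are passed as expression trees, not compiled delegates, which is what lets a provider ship them to remote machines instead of running them in-process.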
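The automatic-parallelization item notes that DryadLINQ relies on PLINQ for multi-core parallelism on each machine. The sketch below shows plain PLINQ only, not how DryadLINQ invokes it internally (the text does not describe that). Adding AsParallel() to an otherwise sequential query lets the .NET runtime spread the work across local cores. The class and method names are hypothetical.

```csharp
using System.Linq;

public static class PlinqSketch
{
    // Sequential LINQ: sum of the squares of the even numbers in [1, n].
    public static long SumOfEvenSquaresSequential(int n)
    {
        return Enumerable.Range(1, n)
            .Where(x => x % 2 == 0)
            .Select(x => (long)x * x)
            .Sum();
    }

    // The same query with PLINQ: AsParallel() lets the runtime partition the
    // input across local cores; the query operators themselves are unchanged.
    public static long SumOfEvenSquaresParallel(int n)
    {
        return Enumerable.Range(1, n)
            .AsParallel()
            .Where(x => x % 2 == 0)
            .Select(x => (long)x * x)
            .Sum();
    }
}
```

Integer addition is associative, so both methods return the same value. For example, SumOfEvenSquaresParallel(10) returns 220 (4 + 16 + 36 + 64 + 100).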
Project Members
- Yuan Yu
- Michael Isard
- Dennis Fetterly
- Mihai Budiu
- Ulfar Erlingsson
- Pradeep Kumar Gunda
- Jon Currey
- Kannan Achan (alumnus)
Publications
- Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations
Yuan Yu, Pradeep Kumar Gunda, Michael Isard
ACM Symposium on Operating Systems Principles (SOSP), October 2009
- Distributed Data-Parallel Computing Using a High-Level Programming Language
Michael Isard, Yuan Yu
International Conference on Management of Data (SIGMOD), July 2009
- DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008
- Some sample programs written in DryadLINQ
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Frank McSherry, and Kannan Achan
Microsoft Research Technical Report MSR-TR-2008-74, May 2008, 37 pages
- Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007
Presentations
- Distributed Data-Parallel Computing Using a High-Level Programming Language
Presentation by Yuan Yu at SIGMOD, July 2009
- DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing
Presentation by Yuan Yu at OSDI, December 2008
- Cluster Computing with DryadLINQ
Presentation by Mihai Budiu at the Palo Alto Research Center CSL Colloquium, Palo Alto, CA, May 8, 2008
- A Machine-Learning Toolkit in DryadLINQ
PowerPoint presentation slides by Mihai Budiu and Kannan Achan



