DryadLINQ

Established: January 25, 2010

DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data parallel applications running on large PC clusters.

Overview

The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

Dryad provides reliable, distributed computing on thousands of servers for large-scale data parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.

DryadLINQ translates LINQ programs into distributed Dryad computations:

  • C# and LINQ data objects become distributed partitioned files.
  • LINQ queries become distributed Dryad jobs.
  • C# methods become code running on the vertices of a Dryad job.
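
The mapping above can be illustrated with a small single-machine sketch. It is written in Python rather than C#, does not use Dryad or DryadLINQ, and every name in it (`partition`, `vertex`, `run_job`) is hypothetical: a list of lists stands in for a distributed partitioned file, an ordinary function for the code running on a vertex, and a thread pool for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records into n partitions (stand-in for a partitioned file)."""
    return [records[i::n] for i in range(n)]


def vertex(part):
    """Stand-in for a user method on one vertex: keep even numbers, square them."""
    return [x * x for x in part if x % 2 == 0]


def run_job(parts):
    """Stand-in for a job: run the vertex on every partition, then merge."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(vertex, parts))
    # Merge the per-partition outputs into one sorted result.
    return sorted(x for part in results for x in part)


if __name__ == "__main__":
    data = list(range(10))
    print(run_job(partition(data, 3)))
```

The caller writes only the sequential `vertex` logic; how the data is split and where each piece runs is decided elsewhere, which is the division of labor the list above describes.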

DryadLINQ has the following features:

  • Declarative programming: computations are expressed in a high-level language containing a superset of the best features of SQL, functional programming, and .NET.
  • Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. DryadLINQ also exploits multi-core parallelism on each machine.
  • Integration with Visual Studio: DryadLINQ programmers can use the full Visual Studio toolset: IntelliSense, code refactoring, integrated debugging, build, and source code management.
  • Integration with .NET: all .NET libraries and languages, including Visual Basic and dynamic languages, are available.
  • Type safety: distributed computations are statically type-checked.
  • Automatic serialization: data transport mechanisms automatically handle all .NET object types.
  • Job graph optimizations:
    • static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
    • dynamic: run-time query plan optimizations automatically adapt the plan using statistics of the data set being processed.
  • Conciseness: the following line of code is a complete implementation of the Map-Reduce computation framework in DryadLINQ:

[Figure: DryadLINQ_code (image of the one-line Map-Reduce implementation; not reproduced here)]
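
In LINQ terms, Map-Reduce is typically expressed as a `SelectMany` (map) followed by a `GroupBy` (group and reduce). The sketch below is a single-machine Python analogue of that composition, not the DryadLINQ code from the original page; the function name `map_reduce` and its parameters are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_reduce(source, mapper, key_selector, reducer):
    """Hypothetical analogue of SelectMany(mapper).GroupBy(key_selector, reducer)."""
    groups = defaultdict(list)
    for record in source:
        # Map phase: each input record may yield zero or more items.
        for item in mapper(record):
            # Group phase: collect items under their key.
            groups[key_selector(item)].append(item)
    # Reduce phase: turn each (key, items) group into one result.
    return [reducer(key, items) for key, items in groups.items()]


if __name__ == "__main__":
    lines = ["a b a", "b c"]
    counts = map_reduce(
        lines,
        mapper=lambda line: line.split(),
        key_selector=lambda word: word,
        reducer=lambda word, words: (word, len(words)),
    )
    print(sorted(counts))
```

The usage at the bottom is a word count: the mapper splits lines into words, the key is the word itself, and the reducer counts the occurrences in each group.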

A commercial implementation of Dryad and DryadLINQ was released in beta form in 2011 under the name LINQ to HPC: http://msdn.microsoft.com/en-us/library/hh378101.aspx.