DryadLINQ
The goal of DryadLINQ is to make distributed computing on large
compute clusters simple enough for ordinary programmers. DryadLINQ
combines two important pieces of Microsoft technology: the Dryad
distributed execution engine and the .NET Language Integrated Query
(LINQ).
Dryad provides reliable, distributed computing on thousands of
servers for large-scale data parallel applications. LINQ enables
developers to write and debug their applications in a SQL-like query
language, relying on the entire .NET library and using Visual Studio.
DryadLINQ is a simple, powerful, and elegant programming environment
for writing large-scale data parallel applications running on large PC
clusters.
DryadLINQ translates LINQ programs into distributed Dryad computations:
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
- C# methods become code running on the vertices of a Dryad job.
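The translation described above depends on a LINQ query being recorded as an expression tree that a query provider can inspect before anything runs. The following is a minimal sketch of that idea using only the standard in-memory provider (`AsQueryable`); it does not use any DryadLINQ API, and the class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryAsDataSketch
{
    // Builds a query without running it. The returned IQueryable only
    // records an expression tree describing the Where and Select steps.
    public static IQueryable<int> EvenSquares(int[] numbers)
    {
        return numbers.AsQueryable()
                      .Where(n => n % 2 == 0)  // lambda captured as an expression tree
                      .Select(n => n * n);     // ordinary C# code inside the query
    }

    // Returns the name of the outermost operator in the recorded tree.
    // A provider can walk the tree this way to plan execution.
    public static string OutermostOperator(IQueryable<int> query)
    {
        return ((MethodCallExpression)query.Expression).Method.Name;
    }
}
```

Here the in-memory provider simply compiles and runs the tree when the query is enumerated. A distributed provider would instead be free to turn the same tree into a job plan, which is the role the text assigns to DryadLINQ.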
DryadLINQ has the following features:
- Declarative programming: computations are
expressed in a high-level language similar to SQL
- Automatic parallelization: from sequential
declarative code, the DryadLINQ compiler generates highly parallel
query plans spanning large computer clusters. To exploit
multi-core parallelism on each machine, DryadLINQ relies on the PLINQ
parallelization framework.
- Integration with Visual Studio: DryadLINQ programmers
can use the full set of Visual Studio tools:
IntelliSense, code refactoring, integrated debugging, build, and source
code management.
- Integration with .NET: all .NET libraries and languages,
including Visual Basic and dynamic languages, are available.
- Type safety: distributed computations are
statically type-checked.
- Automatic serialization: data transport
mechanisms automatically handle all .NET object types.
- Job graph optimizations:
  - static: a rich set of term-rewriting query optimization
    rules is applied to the query plan, optimizing locality and
    improving performance.
  - dynamic: run-time query plan optimizations automatically
    adapt the plan to the statistics of the data set being
    processed.
- Conciseness: the following method, whose body is a
single line of code, is a complete implementation of the MapReduce
computation framework in DryadLINQ:
    public static IQueryable<TResult>
    MapReduce<TSource, TMap, TKey, TResult>(this IQueryable<TSource> source,
                                            Expression<Func<TSource, TMap>> mapper,
                                            Expression<Func<TMap, TKey>> keySelector,
                                            Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
    {
        return source.Select(mapper).GroupBy(keySelector, reducer);
    }
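As an illustration of how the MapReduce method above might be called, here is a small word-count sketch. It runs against an in-memory sequence through the standard `AsQueryable` provider rather than a DryadLINQ data source, so it shows only the shape of the call, not distributed execution. The `MapReduceSketch` class and the `WordCount` helper are hypothetical names introduced for this example; the `MapReduce` method is repeated from the text so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceSketch
{
    // MapReduce as given in the text, repeated so this sketch is self-contained.
    public static IQueryable<TResult> MapReduce<TSource, TMap, TKey, TResult>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, TMap>> mapper,
        Expression<Func<TMap, TKey>> keySelector,
        Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
    {
        return source.Select(mapper).GroupBy(keySelector, reducer);
    }

    // Hypothetical helper: counts case-insensitive occurrences of each word.
    public static Dictionary<string, int> WordCount(IEnumerable<string> words)
    {
        return words.AsQueryable()
            .MapReduce(
                w => w.ToLowerInvariant(),   // map: normalize each word
                w => w,                      // key: the normalized word itself
                (key, values) => new KeyValuePair<string, int>(key, values.Count()))
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Because the mapper in this signature produces exactly one output per input, the input is assumed to be already split into words. Calling `MapReduceSketch.WordCount(new[] { "Dryad", "linq", "dryad" })` groups the two spellings of "dryad" under one key.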
- Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks,
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly.
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007.
- A Machine-Learning toolkit in DryadLINQ, presentation slides in
PowerPoint.