DryadLINQ

Overview

The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for ordinary programmers. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).

Dryad provides reliable, distributed computing on thousands of servers for large-scale data-parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, relying on the entire .NET library and using Visual Studio.

DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications that run on large PC clusters.

[Figure: DryadLINQ system architecture]

DryadLINQ translates LINQ programs into distributed Dryad computations:

  • C# and LINQ data objects become distributed partitioned files.
  • LINQ queries become distributed Dryad jobs.
  • C# methods become code running on the vertices of a Dryad job.
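
To make this mapping concrete, here is a minimal sketch of the kind of program DryadLINQ takes as input. It uses only standard LINQ over an in-memory array, so it runs on a single machine; no DryadLINQ-specific API is shown. The `LogEntry` type, the `LogQueries` class, the `IsError` helper, and the sample data are all hypothetical and chosen for illustration. Under the translation described above, the collection would correspond to a distributed partitioned file, the query to a Dryad job, and `IsError` to code running on the job's vertices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type, for illustration only.
public sealed class LogEntry
{
    public string Host { get; set; } = "";
    public string Level { get; set; } = "";
}

public static class LogQueries
{
    // An ordinary C# method called from inside the query.
    public static bool IsError(LogEntry e) => e.Level == "ERROR";

    // Declarative query: count error entries per host, ordered by host name.
    public static List<KeyValuePair<string, int>> ErrorsPerHost(IEnumerable<LogEntry> entries)
    {
        var query =
            from e in entries
            where IsError(e)
            group e by e.Host into g
            orderby g.Key
            select new KeyValuePair<string, int>(g.Key, g.Count());
        return query.ToList();
    }

    public static void Main()
    {
        // In-memory sample data standing in for a large data set.
        var entries = new[]
        {
            new LogEntry { Host = "web1", Level = "ERROR" },
            new LogEntry { Host = "web2", Level = "INFO" },
            new LogEntry { Host = "web1", Level = "ERROR" },
            new LogEntry { Host = "web3", Level = "ERROR" },
        };

        foreach (var pair in ErrorsPerHost(entries))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

The query says what to compute (filter, group, count) and nothing about how the work is split up, which is what leaves a system like DryadLINQ free to choose a parallel execution plan.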

DryadLINQ has the following features:

  • Declarative programming: computations are expressed in a high-level language similar to SQL.
  • Automatic parallelization: from sequential declarative code, the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. To exploit multi-core parallelism on each machine, DryadLINQ relies on the PLINQ parallelization framework.
  • Integration with Visual Studio: DryadLINQ programmers can use the comprehensive Visual Studio tool set: IntelliSense, code refactoring, integrated debugging, build, and source code management.
  • Integration with .NET: all .NET libraries and languages, including Visual Basic and dynamic languages, are available.
  • Type safety: distributed computations are statically type-checked.
  • Automatic serialization: data transport mechanisms automatically handle all .NET object types.
  • Job graph optimizations
    • static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
    • dynamic: run-time query plan optimizations automatically adapt the plan, taking into account the statistics of the data set being processed.
  • Conciseness: the following method, whose body is a single line of code, is a complete implementation of the Map-Reduce computation framework in DryadLINQ:
    public static IQueryable<TResult>
    MapReduce<TSource, TMap, TKey, TResult>(this IQueryable<TSource> source,
                                            Expression<Func<TSource, TMap>> mapper,
                                            Expression<Func<TMap, TKey>> keySelector,
                                            Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
    {
        return source.Select(mapper).GroupBy(keySelector, reducer);
    }
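
As a usage illustration, the sketch below applies a method of the same shape to a word count. It is an assumption-laden example rather than DryadLINQ code: the input is an in-memory array wrapped with the standard `AsQueryable()`, whereas a real DryadLINQ program would start from a distributed data set through an API not shown on this page. The `MapReduceDemo` class and the `CountWords` helper are hypothetical names; the `MapReduce` method is repeated so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class MapReduceDemo
{
    // Same shape as the MapReduce method above, repeated so this sketch is self-contained.
    public static IQueryable<TResult> MapReduce<TSource, TMap, TKey, TResult>(
        this IQueryable<TSource> source,
        Expression<Func<TSource, TMap>> mapper,
        Expression<Func<TMap, TKey>> keySelector,
        Expression<Func<TKey, IEnumerable<TMap>, TResult>> reducer)
    {
        return source.Select(mapper).GroupBy(keySelector, reducer);
    }

    // Hypothetical helper: case-insensitive word count.
    // map: normalize each word; key: the word itself; reduce: count each group.
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        return words.AsQueryable()
            .MapReduce(
                w => w.ToLowerInvariant(),
                w => w,
                (key, values) => new KeyValuePair<string, int>(key, values.Count()))
            .ToDictionary(p => p.Key, p => p.Value);
    }

    public static void Main()
    {
        var words = new[] { "Dryad", "LINQ", "dryad", "linq", "Dryad" };

        // Sort by key only to make the printed order deterministic.
        foreach (var pair in CountWords(words).OrderBy(p => p.Key, StringComparer.Ordinal))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
        // Prints:
        // dryad: 3
        // linq: 2
    }
}
```

Because the mapper, key selector, and reducer are passed as expression trees rather than compiled delegates, a query provider can inspect them before execution; here the standard in-memory provider simply compiles and runs them locally.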
    
Associated Groups

  • Distributed Systems - Silicon Valley
  • Web Search and Data Mining - Silicon Valley


©2008 Microsoft Corporation. All rights reserved.