DryadLINQ is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications that run on large PC clusters.
The goal of DryadLINQ is to make distributed computing on large compute clusters simple enough for every programmer. DryadLINQ combines two important pieces of Microsoft technology: the Dryad distributed execution engine and the .NET Language Integrated Query (LINQ).
Dryad provides reliable, distributed computing on thousands of servers for large-scale data-parallel applications. LINQ enables developers to write and debug their applications in a SQL-like query language, using the entire .NET library and Visual Studio.
DryadLINQ translates LINQ programs into distributed Dryad computations:
- C# and LINQ data objects become distributed partitioned files.
- LINQ queries become distributed Dryad jobs.
- C# methods become code running on the vertices of a Dryad job.
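To make this mapping concrete, here is a small C# query of the kind DryadLINQ translates. It is a sketch only. It runs locally with standard LINQ to Objects on an in-memory array, using C# top-level statements. The input strings and the `Normalize` helper are hypothetical. The DryadLINQ-specific calls for reading partitioned input and submitting a job are left out because they are not described here. Under DryadLINQ, the input collection would be a distributed partitioned file, the query would become a Dryad job, and `Normalize` would run on the job's vertices.

```csharp
using System;
using System.Linq;

// Hypothetical user-defined C# method. Under DryadLINQ, methods like this
// would run on the vertices of the Dryad job; here they run in-process.
static string Normalize(string word) => word.Trim().ToLowerInvariant();

// Hypothetical in-memory input standing in for a distributed partitioned file.
string[] lines = { "Dryad runs jobs", "LINQ queries", "dryad VERTICES" };

// The LINQ query: split lines into words, normalize them, keep "dryad".
// DryadLINQ would compile an equivalent query into a Dryad job; this
// version executes locally with LINQ to Objects.
int dryadMentions = lines
    .SelectMany(line => line.Split(' '))
    .Select(word => Normalize(word))
    .Where(word => word == "dryad")
    .Count();

Console.WriteLine(dryadMentions); // prints 2
```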
DryadLINQ has the following features:
- Declarative programming: computations are expressed in a high-level language containing a superset of the best features of SQL, functional programming, and .NET.
- Automatic parallelization: from sequential declarative code the DryadLINQ compiler generates highly parallel query plans spanning large computer clusters. DryadLINQ also exploits multi-core parallelism on each machine.
- Integration with Visual Studio: DryadLINQ programmers can use Visual Studio's comprehensive tool set, including IntelliSense, code refactoring, integrated debugging, build, and source code management.
- Integration with .NET: all .NET libraries are available, and programs can also use Visual Basic and .NET dynamic languages.
- Type safety: distributed computations are statically type-checked.
- Automatic serialization: data transport mechanisms automatically handle all .NET object types.
- Job graph optimizations:
  - static: a rich set of term-rewriting query optimization rules is applied to the query plan, optimizing locality and improving performance.
  - dynamic: run-time query plan optimizations automatically adapt the plan to the statistics of the data set being processed.
- Conciseness: the following method is a complete implementation of the Map-Reduce computation framework in DryadLINQ:

      public static IQueryable<R> MapReduce<S,M,K,R>(this IQueryable<S> source,
          Expression<Func<S,IEnumerable<M>>> mapper,
          Expression<Func<M,K>> keySelector,
          Expression<Func<K,IEnumerable<M>,R>> reducer) {
          return source.SelectMany(mapper).GroupBy(keySelector, reducer);
      }
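The following usage sketch is not taken from the DryadLINQ documentation. It writes the classic word count with the same `SelectMany`/`GroupBy` pipeline that forms the body of `MapReduce`. The input strings and lambda names are hypothetical. The query runs locally on an in-memory `IQueryable` obtained with `AsQueryable()`, not on a Dryad cluster. To call `MapReduce` itself as an extension method, you would also need to declare it in a static class, with `using` directives for `System`, `System.Collections.Generic`, `System.Linq`, and `System.Linq.Expressions`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical in-memory input; AsQueryable() runs the query locally
// instead of on a cluster.
IQueryable<string> source = new[] { "a b a", "b a c" }.AsQueryable();

// mapper: one input line -> many words (the map phase).
Expression<Func<string, IEnumerable<string>>> mapper =
    line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// keySelector: group each word with identical words.
Expression<Func<string, string>> keySelector = word => word;

// reducer: (key, all words with that key) -> (word, count).
Expression<Func<string, IEnumerable<string>, KeyValuePair<string, int>>> reducer =
    (word, words) => new KeyValuePair<string, int>(word, words.Count());

// Same pipeline as MapReduce: SelectMany for mapping, GroupBy for grouping and reducing.
IQueryable<KeyValuePair<string, int>> counts =
    source.SelectMany(mapper).GroupBy(keySelector, reducer);

foreach (var entry in counts.OrderBy(c => c.Key))
    Console.WriteLine($"{entry.Key}: {entry.Value}");
// prints:
// a: 3
// b: 2
// c: 1
```

The functions are passed as expression trees (`Expression<Func<...>>`) rather than compiled delegates. This lets an `IQueryable` provider inspect the whole query, which is what allows DryadLINQ to translate LINQ queries into Dryad jobs instead of simply executing them.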
A commercial implementation of Dryad and DryadLINQ was released in beta form in 2011 under the name LINQ to HPC: http://msdn.microsoft.com/en-us/library/hh378101.aspx.
People:
- Yuan Yu
- Michael Isard
- Dennis Fetterly
- Mihai Budiu
- Frank McSherry
- Jon Currey
- Qifa Ke
- Úlfar Erlingsson (alumnus)
- Pradeep Kumar Gunda (alumnus)
- Kannan Achan (alumnus)
Publications:
- Optimus: A Dynamic Rewriting Framework for Data-Parallel Execution Plans
Qifa Ke, Michael Isard, and Yuan Yu
European Conference on Computer Systems (EuroSys), ACM, April 2013
- Fay: Extensible Distributed Tracing from Kernels to Clusters
Úlfar Erlingsson, Marcus Peinado, Simon Peter, and Mihai Budiu
ACM Symposium on Operating Systems Principles (SOSP), Cascais, Portugal, October 23-26, 2011
- Parallelizing the Training of the Kinect Body Parts Labeling Algorithm
Mihai Budiu, Jamie Shotton, Derek G. Murray, and Mark Finocchio
Big Learning: Algorithms, Systems and Tools for Learning at Scale, Sierra Nevada, Spain, December 16-17, 2011
- Large-Scale Machine Learning using DryadLINQ
Frank McSherry, Yuan Yu, Mihai Budiu, Michael Isard, and Dennis Fetterly
Chapter in Scaling Up Machine Learning, Cambridge University Press, December 2011
- TidyFS: A Simple and Small Distributed File System
Dennis Fetterly, Maya Haridasan, Michael Isard, and Swaminathan Sundararaman
USENIX Annual Technical Conference (USENIX'11), USENIX, June 15, 2011
- Monitoring and Debugging DryadLINQ Applications with Daphne
Vilas Jagannath, Zuoning Yin, and Mihai Budiu
International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), Anchorage, AK, May 20, 2011
- DryadOpt: Branch-and-Bound on Distributed Data-Parallel Execution Engines
Mihai Budiu, Daniel Delling, and Renato Werneck
IEEE International Parallel and Distributed Processing Symposium (IPDPS), Anchorage, AK, May 16-20, 2011
- Steno: Automatic Optimization of Declarative Queries
Derek G. Murray, Michael Isard, and Yuan Yu
Programming Language Design and Implementation (PLDI), San Jose, CA, June 2011
- Nectar: Automatic Management of Data and Computation in Datacenters
Pradeep Kumar Gunda, Lenin Ravindranath, Chandramohan A. Thekkath, Yuan Yu, and Li Zhuang
9th Symposium on Operating Systems Design and Implementation (OSDI), October 2010
- A Data-Parallel Toolkit for Information Retrieval
Dennis Fetterly and Frank McSherry
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), July 19, 2010
- Quincy: Fair Scheduling for Distributed Computing Clusters
Michael Isard, Vijayan Prabhakaran, Jon Currey, Udi Wieder, Kunal Talwar, and Andrew Goldberg
22nd ACM Symposium on Operating Systems Principles (SOSP), October 11, 2009
- Privacy Integrated Queries
Frank McSherry
ACM SIGMOD International Conference on Management of Data (SIGMOD), June 2009
- Distributed Aggregation for Data-Parallel Computing: Interfaces and Implementations
Yuan Yu, Pradeep Kumar Gunda, and Michael Isard
ACM Symposium on Operating Systems Principles (SOSP), October 2009
- Distributed Data-Parallel Computing Using a High-Level Programming Language
Michael Isard and Yuan Yu
International Conference on Management of Data (SIGMOD), July 2009
- DryadInc: Reusing Work in Large-Scale Computations
Lucian Popa, Mihai Budiu, Yuan Yu, and Michael Isard
Workshop on Hot Topics in Cloud Computing (HotCloud), San Diego, CA, June 15, 2009
- Hunting for Problems with Artemis
Gabriela F. Creţu-Ciocârlie, Mihai Budiu, and Moises Goldszmidt
USENIX Workshop on the Analysis of System Logs (WASL), San Diego, CA, December 7, 2008
- DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey
Symposium on Operating Systems Design and Implementation (OSDI), San Diego, CA, December 8-10, 2008
- Some Sample Programs Written in DryadLINQ
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Úlfar Erlingsson, Pradeep Kumar Gunda, Jon Currey, Frank McSherry, and Kannan Achan
Microsoft Research Technical Report, MSR-TR-2008-74, May 2008, 37 pages
- Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly
European Conference on Computer Systems (EuroSys), Lisbon, Portugal, March 21-23, 2007
Presentations:
- Distributed Data-Parallel Computing Using a High-Level Programming Language
Presentation by Yuan Yu at SIGMOD, July 2009
- DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing
Presentation by Yuan Yu at OSDI, December 2008
- Cluster Computing with DryadLINQ
Presentation by Mihai Budiu at the Palo Alto Research Center CSL Colloquium, Palo Alto, CA, May 8, 2008
- A Machine-Learning Toolkit in DryadLINQ
Presentation slides in PowerPoint by Mihai Budiu and Kannan Achan