GrayWulf: Scalable Software Architecture for Data Intensive Computing

Yogesh Simmhan, Maria Nieto-Santisteban, Roger Barga, Tamas Budavari, Laszlo Dobos, Nolan Li, Michael Shipway, Alexander S. Szalay, Ani Thakar, Jan Vandenberg, Alainna Wonders, Sue Werner, Richard Wilton, Dan Fay, Michael Thomassy, Catharine van Ingen, Jim Heasley, and Conrad Holmberg

Abstract

Big data presents new challenges to both cluster infrastructure software and parallel application design. We present GrayWulf, a set of software services and design principles for data intensive computing with petabyte data sets. These services are intended for deployment on a cluster of commodity servers similar to the well-known Beowulf clusters. We use the Pan-STARRS system, currently under development, as an example of the architecture and principles in action.

Details

Publication type: Inproceedings
Published in: Hawaii International Conference on System Sciences (HICSS)
Publisher: IEEE Computer Society

Previous versions

Yogesh Simmhan, Roger Barga, Catharine van Ingen, Maria Nieto-Santisteban, Laszlo Dobos, Nolan Li, Michael Shipway, Alexander S. Szalay, Sue Werner, and Jim Heasley. GrayWulf: Scalable Software Architecture for Data Intensive Computing, Microsoft Research, 15 September 2008.
