Yogesh Simmhan, Roger Barga, Catharine van Ingen, Maria Nieto-Santisteban, Laszlo Dobos, Nolan Li, Michael Shipway, Alexander S. Szalay, Sue Werner, and Jim Heasley
15 September 2008
Big data presents new challenges to both cluster infrastructure software and parallel application design. We present GrayWulf, a set of software services and design principles for data-intensive computing with petabyte-scale data sets. These services are intended for deployment on a cluster of commodity servers, similar to the well-known Beowulf clusters. We use the Pan-STARRS system, currently under development, as an example of the architecture and principles in action.
Publisher: Microsoft Research
© 2009 Microsoft Corporation. All rights reserved.