GrayWulf: Scalable Software Architecture for Data Intensive Computing

Big data presents new challenges to both cluster infrastructure software and parallel application design. We present a set of software services and design principles for data intensive computing with petabyte data sets, named GrayWulf. These services are intended for deployment on a cluster of commodity servers similar to the well-known Beowulf clusters. We use the Pan-STARRS system currently under development as an example of the architecture and principles in action.

Publisher  Microsoft Research
© 2009 Microsoft Corporation. All rights reserved.

Details

Type  TechReport
Number  MSR-TR-2008-186