GrayWulf: Scalable Clustered Architecture for Data Intensive Computing

Alexander S. Szalay, Gordon Bell, Jan Vandenberg, Alainna Wonders, Randal Burns, Dan Fay, Jim Heasley, Tony Hey, Maria Nieto-Santisteban, Ani Thakar, Catharine van Ingen, and Richard Wilton


Data-intensive computing presents a significant challenge for traditional supercomputing architectures that maximize FLOPS, since CPU speed has surpassed the I/O capabilities of HPC systems and Beowulf clusters. We present the architecture of GrayWulf†, a three-tier commodity-component cluster designed for a range of data-intensive computations operating on petascale data sets. The design goal is a system balanced in terms of I/O performance and memory size, according to Amdahl's laws. The hardware currently installed at JHU exceeds one petabyte of storage and provides 0.5 bytes/sec of I/O and 1 byte of memory for each CPU cycle. GrayWulf achieves almost an order of magnitude better balance than existing systems. This paper covers the architecture and reference applications; the software design is presented in a companion paper.

† The GrayWulf name pays tribute to Jim Gray, who was actively involved in the design principles.
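The balance figures in the abstract follow from Amdahl's rules of thumb: divide a system's sustained I/O rate and memory size by its aggregate CPU speed. As a minimal sketch, the two ratios can be computed as below; the example node (core count, clock, I/O rate, RAM) is a hypothetical configuration for illustration, not a measured GrayWulf node.

```python
# Illustrative sketch of the Amdahl balance ratios cited in the abstract.
# The example node below is hypothetical; only the 0.5 and 1.0 targets
# come from the text.

def amdahl_io_number(io_bytes_per_sec: float, cpu_cycles_per_sec: float) -> float:
    """Bytes of I/O per second, per CPU cycle per second.
    The GrayWulf target cited in the abstract is 0.5."""
    return io_bytes_per_sec / cpu_cycles_per_sec

def amdahl_memory_ratio(memory_bytes: float, cpu_cycles_per_sec: float) -> float:
    """Bytes of memory per CPU cycle per second; the cited target is 1."""
    return memory_bytes / cpu_cycles_per_sec

# Hypothetical node: 8 cores at 2.66 GHz, 16 GB RAM, ~1 GB/s sequential I/O.
cycles = 8 * 2.66e9
print(amdahl_io_number(1.0e9, cycles))        # ~0.047: far below the 0.5 target
print(amdahl_memory_ratio(16 * 2**30, cycles))
```

Typical commodity nodes score well under these targets, which is the gap (almost an order of magnitude) that the GrayWulf design closes by adding I/O bandwidth and memory rather than CPU speed.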


Publication type: TechReport

Newer versions

Maria Nieto-Santisteban, Yogesh Simmhan, Roger Barga, Laszlo Dobos, Jim Heasley, Conrad Holmberg, Nolan Li, Michael Shipway, Alexander S. Szalay, Catharine van Ingen, and Sue Werner. Pan-STARRS: Learning to Ride the Data Tsunami, December 2008.
