High Performance Computing: Crays, Clusters, and Centers. What Next?

After 50 years of building high-performance scientific computers, two major architectures exist: (1) clusters of “Cray-style” vector supercomputers; (2) clusters of scalar uni- and multi-processors. Clusters are in transition from (a) massively parallel computers and clusters running proprietary software to (b) proprietary clusters running standard software, and (c) do-it-yourself Beowulf clusters built from commodity hardware and software. In 2001, only five years after its introduction, Beowulf has mobilized a community around a standard architecture and tools. Beowulf’s economics and sociology are poised to kill off the other two architectural lines, and will likely affect traditional supercomputer centers as well. Peer-to-peer and Grid communities provide significant advantages for embarrassingly parallel problems and for sharing vast numbers of files. The Computational Grid can federate systems into supercomputers far beyond the power of any current computing center. The centers will become super-data and super-application centers. While these trends make high-performance computing much less expensive and much more accessible, there is a dark side. Clusters perform poorly on applications that require large shared memory. Although there is vibrant computer architecture activity on microprocessors and on high-end cellular architectures, we appear to be entering an era of supercomputing monoculture. Investing in next-generation software and hardware supercomputer architecture is essential to improve the efficiency and efficacy of systems.

This paper has been submitted for publication to the Communications of the ACM. Copyright may be transferred without further notice and the publisher may then post the accepted version. A version of this article appears at http://research.microsoft.com/pubs/