CamCubeOS: A Key-based Network Stack for 3D Torus Cluster Topologies

The 22nd ACM International Symposium on High Performance Parallel and Distributed Computing (HPDC'13)

Published by ACM Press

Cluster fabric interconnects based on 3D torus topologies are increasingly deployed in data center clusters. In prior work, we demonstrated that exploiting these topologies, by letting applications implement custom routing protocols and perform on-path operations, can increase performance and simplify development. However, these benefits cannot be achieved with mainstream point-to-point networking stacks such as TCP/IP or MPI, which hide the underlying topology and do not allow in-network operations. In this paper we describe CamCubeOS, a novel key-based communication stack designed from scratch for 3D torus fabric interconnects. Because many of the applications used in clusters are key-based, we designed CamCubeOS to support key-based operations natively. We select a virtual topology that exactly matches the underlying physical topology and use the keyspace to expose physical locality, thus avoiding the overhead typically incurred by overlay-based approaches. We report on our experience building several applications on top of CamCubeOS, and we evaluate their performance and feasibility using a prototype and large-scale simulations.
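To make the idea of a keyspace that exposes physical locality more concrete, the following is a minimal sketch, not the CamCubeOS API. It assumes a cubic torus with `side` nodes per axis and a simple row-major mapping from integer keys to (x, y, z) coordinates. The names `key_to_coord`, `torus_distance`, and `next_hop` are hypothetical and chosen only for illustration.

```python
from typing import Tuple

Coord = Tuple[int, int, int]


def key_to_coord(key: int, side: int) -> Coord:
    """Map an integer key in [0, side**3) to an (x, y, z) torus coordinate.

    Assumption: a row-major layout; the real keyspace layout may differ.
    """
    if not 0 <= key < side ** 3:
        raise ValueError("key out of range")
    x, rest = divmod(key, side * side)
    y, z = divmod(rest, side)
    return (x, y, z)


def torus_distance(a: Coord, b: Coord, side: int) -> int:
    """Hop count between two nodes, using wrap-around links on each axis."""
    return sum(min(abs(p - q), side - abs(p - q)) for p, q in zip(a, b))


def next_hop(src: Coord, dst: Coord, side: int) -> Coord:
    """Take one greedy step toward dst, resolving axes in x, y, z order."""
    for axis in range(3):
        delta = (dst[axis] - src[axis]) % side
        if delta == 0:
            continue
        # Go in the positive direction unless wrapping backwards is shorter.
        step = 1 if delta <= side - delta else -1
        nxt = list(src)
        nxt[axis] = (src[axis] + step) % side
        return (nxt[0], nxt[1], nxt[2])
    return src  # already at the destination


if __name__ == "__main__":
    side = 3
    src = key_to_coord(0, side)
    dst = key_to_coord(26, side)
    print(src, dst, torus_distance(src, dst, side))
```

Because a key maps directly to a coordinate in this sketch, a node can estimate how far away a key's home node is, and choose a neighbor that moves closer to it, without consulting any overlay routing state.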