Project DARP: Dense Arrays of Resource Pools

A recent trend in data center hardware design is to leverage resource integration to reduce the Total Cost of Ownership (TCO). Over the past few years, several hardware vendors have released platforms called Fabric Compute Systems targeted at commodity data centers (e.g., the AMD SeaMicro platform, Boston Viridis, HP Moonshot). These platforms typically use Systems on Chip (SoCs) that embed I/O controllers (often including a network switch) along with the CPU on the same silicon die (e.g., Calxeda). Fabric Compute Systems form compact (up to rack-scale) clusters of SoCs connected to each other via tracks on a Printed Circuit Board (PCB). They achieve an order of magnitude higher computational density than commodity clusters and offer more cost-effective networking by removing dedicated switches and cables. The purpose of this project is to explore the design space of Fabric Compute Systems to make them efficient rack-scale building blocks for future data centers.

What are the prospects? The goal is to achieve resource disaggregation at the rack level, pooling resources and offering each workload the right amount of dedicated resources (see Intel Rack Scale Architecture). Together with a reduced energy footprint and integrated power management at the SoC level, this organization has the potential to dramatically improve data center resource efficiency.

Addressed challenges:

Which network topology is best suited to a given workload? Fabric Compute Systems have non-traditional in-rack networks where each SoC uses its embedded switch for traffic forwarding. The network topology has a significant impact on the performance of the system. In that context, an exciting prospect is the ability to derive a topology from a workload to achieve near-optimal network utilization. We are also considering a wide variety of existing topologies (e.g., 3D-torus, random, and small-world networks) to determine their ability to serve widespread data center workloads.
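To make the comparison concrete, the sketch below scores two candidate in-rack fabric topologies by average hop count between SoC pairs, a common proxy for network latency. This is an illustrative pure-Python example, not the project's actual evaluation methodology; the topology sizes and the shortcut count are arbitrary choices of ours.

```python
# Compare candidate in-rack fabric topologies by average hop count
# (a latency proxy). Illustrative only; not the project's methodology.
from collections import deque
import random

def avg_path_length(adj):
    """Mean shortest-path hop count over all node pairs (BFS per node)."""
    total = pairs = 0
    for src in range(len(adj)):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(d for node, d in dist.items() if node != src)
        pairs += len(dist) - 1
    return total / pairs

def torus_3d(k):
    """k x k x k 3D torus: each SoC links to 6 neighbours, with wrap-around."""
    def idx(x, y, z):
        return (x % k) * k * k + (y % k) * k + (z % k)
    adj = [set() for _ in range(k ** 3)]
    for x in range(k):
        for y in range(k):
            for z in range(k):
                u = idx(x, y, z)
                for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
                    v = idx(x + dx, y + dy, z + dz)
                    adj[u].add(v)
                    adj[v].add(u)
    return adj

def small_world(n, shortcuts, seed=0):
    """Ring of n SoCs plus random long-range shortcut links."""
    rng = random.Random(seed)
    adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
    for _ in range(shortcuts):
        u, v = rng.sample(range(n), 2)
        adj[u].add(v)
        adj[v].add(u)
    return adj

print("3D torus (64 SoCs):  ", round(avg_path_length(torus_3d(4)), 2))
print("small world (64 SoCs):", round(avg_path_length(small_world(64, 64)), 2))
```

The same harness extends naturally to other candidate graphs, which is what makes topology/workload co-design tractable to explore in simulation.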

What is the most efficient placement for a shared resource within a complex topology? We primarily target storage as a key resource at the rack level. The goal is to quantify how much storage bandwidth is optimal for a given workload and how to distribute storage within the network topology. Resource placement is considered together with the routing mechanisms that enable each server to access the remote resource.
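One simple way to frame this placement question is as a k-median problem: choose k SoCs to host storage so that the demand-weighted average hop count from each compute SoC to its nearest storage SoC is minimized. The greedy heuristic below is a hedged sketch of that idea; the demand model and the separation from routing are simplifications of ours, not the project's design.

```python
# Sketch: greedily place k storage SoCs in an arbitrary fabric topology,
# minimising the demand-weighted average hop count to the nearest storage
# SoC (a greedy k-median heuristic). Illustrative only.
from collections import deque

def bfs_dists(adj, src):
    """Hop count from src to every SoC in the fabric."""
    dist = [None] * len(adj)
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def place_storage(adj, k, demand):
    """demand[i]: relative storage traffic generated by SoC i."""
    n = len(adj)
    d = [bfs_dists(adj, s) for s in range(n)]   # all-pairs hop counts
    chosen, nearest = [], [float("inf")] * n
    for _ in range(k):
        # Pick the SoC that most reduces total weighted distance to storage.
        def cost(c):
            return sum(demand[i] * min(nearest[i], d[c][i]) for i in range(n))
        c = min(range(n), key=cost)
        chosen.append(c)
        nearest = [min(nearest[i], d[c][i]) for i in range(n)]
    return chosen

# Example: 8-SoC ring, uniform demand, 2 storage nodes.
ring = [{(i - 1) % 8, (i + 1) % 8} for i in range(8)]
print(place_storage(ring, 2, [1] * 8))
```

On the symmetric ring the heuristic picks two storage nodes roughly opposite each other; on irregular topologies the demand vector steers storage toward the hot spots, which is exactly the workload sensitivity the question above is about.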

How should workloads be partitioned and scheduled across multiple fabric-compute racks? Such hardware architectures introduce new challenges compared to traditional data centers. First, we need to explicitly consider the in-rack network as a resource when making job placement decisions, because network topologies can vary significantly from rack to rack. Second, the amount of ingress/egress bandwidth per SoC is reduced compared to traditional servers, so cross-rack communication should be avoided where possible.
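The two constraints above can be captured even by a very simple scheduler model: pack a job's tasks into a single rack when possible (spending no scarce uplink bandwidth), and only split across racks when no single rack fits, charging uplink capacity on every rack the job spans. The sketch below is a hypothetical model of ours; the `Rack`/`Job` fields and the greedy policy are illustrative assumptions, far simpler than a real scheduler.

```python
# Sketch: greedy placement of a job's tasks across fabric-compute racks.
# Prefers packing into one rack (cross-rack bandwidth is scarce); otherwise
# splits, charging uplink bandwidth per rack spanned. Hypothetical model.
from dataclasses import dataclass

@dataclass
class Rack:
    name: str
    free_slots: int          # idle SoCs available
    uplink_gbps: float       # remaining ingress/egress bandwidth

@dataclass
class Job:
    name: str
    tasks: int
    cross_rack_gbps: float   # uplink traffic if the job spans rack boundaries

def place(job, racks):
    """Return {rack name: tasks assigned}, or None if the job cannot fit."""
    # 1. Prefer the tightest single rack that holds the whole job: no uplink used.
    for r in sorted(racks, key=lambda r: r.free_slots):
        if r.free_slots >= job.tasks:
            r.free_slots -= job.tasks
            return {r.name: job.tasks}
    # 2. Otherwise split across the fewest racks, starting with the emptiest.
    plan, remaining = {}, job.tasks
    for r in sorted(racks, key=lambda r: -r.free_slots):
        if remaining == 0:
            break
        take = min(r.free_slots, remaining)
        if take and r.uplink_gbps >= job.cross_rack_gbps:
            plan[r.name] = take
            remaining -= take
    if remaining:
        return None              # not enough slots or uplink capacity
    for r in racks:
        if r.name in plan:
            r.free_slots -= plan[r.name]
            r.uplink_gbps -= job.cross_rack_gbps
    return plan

racks = [Rack("A", 6, 10.0), Rack("B", 5, 10.0)]
print(place(Job("analytics", 4, 2.0), racks))  # packs into rack B
print(place(Job("batch", 7, 2.0), racks))      # must split across A and B
```

Even this toy policy exhibits the trade-off the paragraph describes: once per-rack topology and uplink bandwidth are first-class resources, placement quality depends on rack-level state, not just on free SoC counts.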

Contributors:

Sergey Legtchenko, Ant Rowstron, Paolo Costa, Dushyanth Narayanan
