Duet: Cloud Scale Load Balancing with Hardware and Software

Rohan Gandhi; Hongqiang Liu; Y. Charlie Hu; Guohan Lu; Jitu Padhye; Lihua Yuan; Ming Zhang

SIGCOMM 2014 | August 2014

Published by ACM - Association for Computing Machinery

Load balancing is a foundational function of datacenter infrastructures and is critical to the performance of online services hosted in datacenters. As the demand for cloud services grows, expensive and hard-to-scale dedicated hardware load balancers are being replaced with software load balancers that scale using a distributed data plane running on commodity servers. Software load balancers offer low cost, high availability, and high flexibility, but suffer from high latency and low capacity per load balancer, making them less than ideal for applications that demand high throughput, low latency, or both. In this paper, we present Duet, which offers all the benefits of a software load balancer, along with low latency and high availability, at next to no cost. We do this by exploiting a hitherto overlooked resource in data center networks: the switches themselves. We show how to embed load balancing functionality into existing hardware switches, thereby achieving organic scalability at no extra cost. For flexibility and high availability, Duet seamlessly integrates the switch-based load balancer with a small deployment of software load balancers. We enumerate and solve several architectural and algorithmic challenges involved in building such a hybrid load balancer. We evaluate Duet using a prototype implementation, as well as extensive simulations driven by traces from our production data centers. Our evaluation shows that Duet provides 10x more capacity than a software load balancer at a fraction of the cost, while reducing latency by a factor of 10 or more, and that it adapts quickly to network dynamics, including failures.