The NetWiser project spans several areas of data center networking: designing new scalable network architectures, understanding network failures, and developing techniques for improving the availability of services hosted in the cloud.
- Service impact of intra-dc and inter-dc network failures:
A field study on understanding how failures at the intra-dc level (Top-of-Rack switches, Aggregation switches and Access Routers) and at the inter-dc level (long-haul WAN links) impact availability of online services, and deriving best practices to improve service availability. [SoCC 2013, SIGMETRICS 2013 (Extended Abstract)].
- Middlebox reliability analysis:
Characterizing the reliability of datacenter middleboxes such as load balancers, firewalls, intrusion detection and prevention systems, and VPNs, and deriving implications for improving middlebox reliability. [IMC 2013].
- NetSieve root-cause inference:
Automated problem inference from network trouble tickets to uncover the 'big picture' of network problems, and development of best practices for their fast and accurate resolution.
- Understanding network failures in data centers:
Characterizing failures of network devices in data centers by analyzing failure incidents and correlating them with network traffic, estimating the impact of failures, and deriving implications for designing future network architectures. [SIGCOMM 2011].
- Designing scalable and flexible data center network architectures:
VL2 is a data center network architecture designed for hundreds of thousands of servers and built from commodity switches. It provides high bisection bandwidth between all communicating server pairs, agility in mapping any service to any server, and graceful performance degradation under failures. [SIGCOMM 2009, CACM 2011].
When the Network Crumbles: An Empirical Study of Cloud Network Failures and their Impact on Services
Rahul Potharaju and Navendu Jain
To appear in ACM Symposium on Cloud Computing (SoCC '13), Santa Clara, CA.
Demystifying the Dark Side of the Middle: A Field Study of Middlebox Failures in Datacenters
Rahul Potharaju and Navendu Jain
To appear in Internet Measurement Conference (IMC '13), Barcelona, Spain.
Juggling the Jigsaw: Towards Automated Problem Inference from Network Trouble Tickets
Rahul Potharaju, Navendu Jain and Cristina Nita-Rotaru
Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI '13).
Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications.
Phillipa Gill, Navendu Jain, and Nachi Nagappan.
Proceedings of the ACM Special Interest Group on Data Communication (SIGCOMM '11), Toronto, Canada, August 2011.
VL2: A Scalable and Flexible Data Center Network.
Albert Greenberg, James Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, Dave Maltz, Praveen Patel, and Sudipta Sengupta.
Communications of the ACM (CACM '11), Research highlights.
(A conference version of this work appeared in the ACM Special Interest Group on Data Communication (SIGCOMM '09), Barcelona, Spain, August 2009.)
- Interns: Rahul Potharaju (Purdue University), Rui Miao (University of Southern California), Phillipa Gill (University of Toronto).
- Albert Greenberg
- Nachiappan Nagappan
E-mail: navendu [AT] microsoft.com