NetWiser

The NetWiser project spans several areas of data center networking: designing new scalable network architectures, understanding network failures, and developing techniques for improving the availability of services hosted in the cloud.
  1. Service impact of intra-dc and inter-dc network failures:
    A field study of how failures at the intra-dc level (Top-of-Rack switches, Aggregation switches and Access Routers) and at the inter-dc level (long-haul WAN links) impact the availability of online services, and of best practices derived to improve service availability. [SoCC 2013, SIGMETRICS 2013 (Extended Abstract)].
  2. Middlebox reliability analysis:
    Characterizing the reliability of datacenter middleboxes such as load balancers, firewalls, intrusion detection and prevention systems, and VPNs, and analyzing the implications for improving middlebox reliability. [IMC 2013].
  3. NetSieve root-cause inference:
    Automated problem inference from network trouble tickets to uncover the 'big picture' of network problems, and developing best practices for their fast and accurate resolution. [NSDI 2013].
  4. Understanding network failures in data centers:
    The aim is to characterize failures of network devices in data centers by analyzing failure incidents and correlating them with network traffic, estimating the impact of failures, and deriving implications for designing future network architectures. [SIGCOMM 2011].
  5. Designing scalable and flexible data center network architectures:
    VL2 is a new data center network architecture, built from commodity switches and designed to scale to hundreds of thousands of servers, that provides high bisection bandwidth between all communicating server pairs, the agility to map any service to any server, and graceful performance degradation under failures [SIGCOMM 2009, CACM 2011].

Publications

  • When the Network Crumbles: An Empirical Study of Cloud Network Failures and their Impact on Services
    Rahul Potharaju and Navendu Jain
    To appear in ACM Symposium on Cloud Computing (SoCC '13), Santa Clara, CA.

  • Demystifying the Dark Side of the Middle: A Field Study of Middlebox Failures in Datacenters
    Rahul Potharaju and Navendu Jain
    To appear in Internet Measurement Conference (IMC '13), Barcelona, Spain.

  • Juggling the Jigsaw: Towards Automated Problem Inference from Network Trouble Tickets
    Rahul Potharaju, Navendu Jain and Cristina Nita-Rotaru
    Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI '13).

  • Understanding Network Failures in Data Centers: Measurement, Analysis, and Implications
    Phillipa Gill, Navendu Jain, and Nachi Nagappan
    Proceedings of the ACM Special Interest Group on Data Communications (SIGCOMM '11), Toronto, Canada, August 2011.

  • VL2: A Scalable and Flexible Data Center Network
    Albert Greenberg, James Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, Dave Maltz, Parveen Patel, and Sudipta Sengupta
    Communications of the ACM (CACM '11), Research highlights.
    (A conference version of this work appeared in the ACM Special Interest Group on Data Communications (SIGCOMM '09), Barcelona, Spain, August 2009.)

People

  • Interns: Rahul Potharaju (Purdue University), Rui Miao (University of Southern California), Phillipa Gill (University of Toronto).
  • Albert Greenberg
  • Nachiappan Nagappan

Contact

E-mail: navendu [AT] microsoft.com