NetHealth

To detect, infer, diagnose, and recover from faults in enterprise wired and wireless networks.

NetHealth is a network management research program in which end hosts cooperatively detect, diagnose, and recover from network faults. Unlike existing products, we take an end-host-centric approach to gathering, aggregating, and analyzing data at all layers of the networking stack to determine the root cause of problems. NetHealth includes several ongoing projects in the wireless and wired space.

People
Victor Bahl

Ranveer Chandra

Srikanth Kandula

Dave Maltz

Jitu Padhye

Ming Zhang

Overview

Networks are being deployed extensively in large corporations, small offices, and homes. However, a significant number of “pain points” remain for end-users and network administrators. To resolve complaints quickly and efficiently, network administrators need tools that can assist them in detecting, isolating, diagnosing, and correcting faults. Such tools should also detect network security breaches, including those caused unintentionally by employees. The NetHealth project is about detecting, inferring, diagnosing, and recovering from user-perceived performance problems in enterprise networks.

Existing products do a reasonable job of presenting statistical data from the network. However, they do not do a comprehensive job of gathering and analyzing the data to establish the root cause of the problem. Furthermore, on the wireless side, most products gather data from the Access Points (APs) only and neglect the client-side view of the network. Some products that monitor the network from the client's perspective require hardware sensors, which can be expensive to deploy and maintain. Also, current solutions do not provide any support for disconnected clients even though these are the ones that need the most help. On the wired side, a number of researchers have come up with solutions for diagnosing problems over WANs; however, most of those approaches are not integrated to perform end-to-end inference and diagnostics.

Under the NetHealth umbrella, we are building algorithms and tools that

    • allow generalist operators to diagnose end-to-end performance as “seen” by users
    • produce near real-time and historical-analysis reports of end-to-end performance problems with networked services and components
    • prioritize and raise alerts based on an analysis of how performance glitches and problems affect users
    • automatically resolve the problem or offer meaningful resolution strategies
    • provide detailed analysis of wireless failures for mobile devices
    • provide snapshots of the “health” of network elements and services
    • complement existing detailed network diagnosis technologies

In contrast to traditional network-based and bolt-on approaches, NetHealth leverages clients and servers. NetHealth agents on the end systems are positioned to harvest available application data and infer application-level dependencies directly, rather than reverse-engineering this information from the network or from the summarized logs and alerts of computing and network elements and their associated management systems. As a result, the NetHealth approach is well-suited for effective problem location and resolution, and for bringing together the intelligence needed to support meaningful resilience and self-healing, self-* capabilities.
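To make the idea of inferring application-level dependencies on an end host more concrete, the following is a minimal sketch in Python. It is not the method used by NetHealth or any of its sub-projects; it only illustrates one simple heuristic, temporal co-occurrence, under stated assumptions. The function name `infer_dependencies`, the `(timestamp, service)` event format, the one-second window, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def infer_dependencies(events, window=1.0, min_ratio=0.5):
    """Guess which services tend to be contacted shortly before others.

    events: list of (timestamp_seconds, service_name) pairs observed on one
    end host (hypothetical format, assumed for this sketch).
    Returns {service: {earlier_service: ratio}} where ratio is the fraction
    of accesses to `service` that had `earlier_service` contacted within
    the preceding `window` seconds.
    """
    events = sorted(events)
    access_count = defaultdict(int)
    preceded_by = defaultdict(lambda: defaultdict(int))

    for i, (t, svc) in enumerate(events):
        access_count[svc] += 1
        # Collect the distinct other services seen within the window before t.
        seen = set()
        j = i - 1
        while j >= 0 and t - events[j][0] <= window:
            other = events[j][1]
            if other != svc:
                seen.add(other)
            j -= 1
        for other in seen:
            preceded_by[svc][other] += 1

    deps = {}
    for svc, counts in preceded_by.items():
        likely = {}
        for other, count in counts.items():
            ratio = count / access_count[svc]
            if ratio >= min_ratio:
                likely[other] = ratio
        if likely:
            deps[svc] = likely
    return deps


# Illustrative trace: every "web" access follows a "dns" lookup closely.
trace = [
    (0.0, "dns"), (0.1, "web"),
    (10.0, "dns"), (10.2, "web"),
    (20.0, "mail"),
    (30.0, "dns"), (30.1, "web"),
]
print(infer_dependencies(trace))
```

In this made-up trace, each access to "web" is preceded by a "dns" lookup within the window, so the sketch reports "web" as likely depending on "dns". A real agent would need to handle noise, concurrent applications, and caching effects, which this heuristic ignores.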

Sub-Projects

  • Sherlock - Enterprise network management via analysis of network dependencies
  • Orion - Dependency extraction in enterprise networks
  • DAIR - Enterprise wireless LAN management via Dense Array of Inexpensive Radios
  • ELDA (SureMail) - Notification system when email losses are detected
  • NetProfiler - Cooperative Network Monitoring & Diagnosis

Brainstorming Events

All talks, videos, and presentation decks are available on the event's web site.

Publications

    2010

    2009

    2008

    2007

    2006

    2005

    2004

    2003

    Press