Venkata N. Padmanabhan, Sriram Ramabhadran, and Jitendra Padhye
We present a client-based characterization of end-to-end Internet faults. Unlike prior studies of Internet faults, which have focused on probing routers with tools such as traceroute and/or listening in on routing protocol messages, we consider a novel approach in which clients passively observe the end-to-end transactions they are involved in. Observations from multiple clients are combined to arrive at a more complete picture of the extent and the likely cause of faults. We characterize real faults observed by a heterogeneous collection of 134 client hosts as they repeatedly downloaded content from a diverse set of 80 web sites over a period of one month. The failure rate of these transactions varies widely (e.g., a 100% failure rate for certain client-server pairs). About 30% of transaction failures are due to DNS problems, and most of the rest are due to the client's inability to establish a TCP connection to the server. By correlating failure observations across clients and servers, we also find that client-side problems account for the overwhelming majority of DNS lookup failures, whereas server-side problems are the dominant cause of TCP connection failures. We believe that our findings suggest the promise of diagnosing end-to-end Internet faults by leveraging the collective experience of a diverse set of end-hosts to overcome the opacity of the network. We briefly discuss the key challenges in realizing such a system.
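To make the idea of correlating failure observations across clients and servers concrete, the following is a minimal illustrative sketch, not the procedure used in this work. It assumes each observation is a hypothetical (client, server, failed) tuple, and it uses an arbitrary threshold of 0.5 on per-client and per-server failure rates to decide whether a failure looks client-side or server-side. The function names, the threshold, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def failure_rates(observations):
    """Compute per-client and per-server failure rates.

    observations: iterable of (client, server, failed) tuples (hypothetical format).
    """
    client_total, client_failed = defaultdict(int), defaultdict(int)
    server_total, server_failed = defaultdict(int), defaultdict(int)
    for client, server, failed in observations:
        client_total[client] += 1
        server_total[server] += 1
        if failed:
            client_failed[client] += 1
            server_failed[server] += 1
    client_rate = {c: client_failed[c] / client_total[c] for c in client_total}
    server_rate = {s: server_failed[s] / server_total[s] for s in server_total}
    return client_rate, server_rate


def attribute_failures(observations, threshold=0.5):
    """Label each failed transaction by the endpoint that fails broadly.

    The threshold is an assumed, illustrative cutoff: a client (or server)
    whose failure rate across all of its transactions meets the threshold
    is treated as the likely source of the problem.
    """
    observations = list(observations)
    client_rate, server_rate = failure_rates(observations)
    blame = {"client-side": 0, "server-side": 0, "both": 0, "unattributed": 0}
    for client, server, failed in observations:
        if not failed:
            continue
        client_bad = client_rate[client] >= threshold
        server_bad = server_rate[server] >= threshold
        if client_bad and server_bad:
            blame["both"] += 1
        elif client_bad:
            blame["client-side"] += 1
        elif server_bad:
            blame["server-side"] += 1
        else:
            blame["unattributed"] += 1
    return blame


# Made-up sample: client c1 fails everywhere, server s3 fails for everyone.
sample = [
    ("c1", "s1", True), ("c1", "s2", True), ("c1", "s3", True),
    ("c2", "s1", False), ("c2", "s2", False), ("c2", "s3", True),
    ("c3", "s1", False), ("c3", "s2", False), ("c3", "s3", True),
]
result = attribute_failures(sample)
```

In this made-up sample, a failure seen by a client that fails against most servers is counted as client-side, and a failure at a server that fails for most clients is counted as server-side. A single client could not draw this distinction from its own observations alone, which is why combining the views of many end-hosts is useful.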