Sampling Biases in Network Path Measurements and What to Do About It
- Srikanth Kandula
- Ratul Mahajan
Internet Measurement Conference
Published by Association for Computing Machinery, Inc.
We show that currently prevalent practices for network path measurements can produce inaccurate inferences because of sampling biases. The inferred mean path latency can be more than a factor of two off the true mean. We present the Broom toolkit that has three methods to correct for this bias. Broom places no burden on the measurement process itself and can be applied post hoc to any measured data set. Our evaluation finds that two of the methods are particularly effective. One of them estimates missing path samples by embedding the nodes in a low-dimensional coordinate space. For realistic sampling rates, the quality of its estimates for path latency approximates ideal, unbiased sampling. The other method is based on a view of network paths as being composed of source-specific, destination-specific, and shared components. It reduces bias for a wide range of path properties, such as latency, hop count and capacity. Applying Broom to data from a real measurement study leads to substantial changes in the resulting inferences. For some networks, the post-correction estimate is 30% higher than the original.
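The bias described in the abstract, and the idea of treating a path as source-specific plus destination-specific parts, can be illustrated with a toy sketch. This is not Broom's implementation: it assumes a purely additive model (latency of a path is roughly a per-source term plus a per-destination term), leaves out the shared component the abstract mentions, and fits the terms by simple alternating averaging. All names (`fit_additive`, `corrected_mean`, the node labels, the latency values) are hypothetical and chosen only for illustration.

```python
def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def fit_additive(samples, iters=200):
    """Fit latency(s, d) ~ src[s] + dst[d] to the measured paths.

    samples maps (source, destination) -> measured latency.
    Alternating averaging is used here as a simple least-squares fit;
    this is an assumption of the sketch, not the paper's method.
    """
    src = {s: 0.0 for s, _ in samples}
    dst = {d: 0.0 for _, d in samples}
    for _ in range(iters):
        for s in src:
            src[s] = avg([v - dst[d] for (s2, d), v in samples.items() if s2 == s])
        for d in dst:
            dst[d] = avg([v - src[s] for (s, d2), v in samples.items() if d2 == d])
    return src, dst


def corrected_mean(samples):
    """Mean latency over all source-destination pairs.

    Measured paths keep their measured value; unmeasured paths are
    filled in with the fitted src[s] + dst[d] estimate.
    """
    src, dst = fit_additive(samples)
    values = [samples.get((s, d), src[s] + dst[d]) for s in src for d in dst]
    return avg(values)


# Hypothetical ground truth (ms): each path is exactly source part + destination part.
true_src = {"A": 10.0, "B": 20.0, "C": 60.0}
true_dst = {"X": 5.0, "Y": 15.0, "Z": 40.0}
true_mean = avg([true_src[s] + true_dst[d] for s in true_src for d in true_dst])

# Biased sampling: the well-connected source A is measured to every destination,
# while the distant source C is measured only once.
measured_pairs = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y"), ("C", "X")]
samples = {(s, d): true_src[s] + true_dst[d] for s, d in measured_pairs}

naive_mean = avg(list(samples.values()))
print("true mean:     ", true_mean)
print("naive mean:    ", naive_mean)
print("corrected mean:", corrected_mean(samples))
```

In this toy data the naive mean over the six measured paths is about 35.8 ms while the true mean over all nine paths is 50 ms. Because the data is exactly additive and every node appears in at least one measured path linked to the others, the fit recovers the true mean; real measurements would not fit this cleanly, and the correction would only reduce the bias rather than remove it.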
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library: http://www.acm.org/dl/.