Sampling Biases in Network Path Measurements and What to Do About It

Internet Measurement Conference

Published by Association for Computing Machinery, Inc.

We show that currently prevalent practices for network path measurements can produce inaccurate inferences because of sampling biases. The inferred mean path latency can be more than a factor of two off the true mean. We present the Broom toolkit that has three methods to correct for this bias. Broom places no burden on the measurement process itself and can be applied post hoc to any measured data set. Our evaluation finds that two of the methods are particularly effective. One of them estimates missing path samples by embedding the nodes in a low-dimensional coordinate space. For realistic sampling rates, the quality of its estimates for path latency approximates ideal, unbiased sampling. The other method is based on a view of network paths as being composed of source-specific, destination-specific, and shared components. It reduces bias for a wide range of path properties, such as latency, hop count and capacity. Applying Broom to data from a real measurement study leads to substantial changes in the resulting inferences. For some networks, the post-correction estimate is 30% higher than the original.
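
To make the component-based idea concrete, here is a minimal sketch. It is not Broom's implementation. The abstract says only that paths are viewed as composed of source-specific, destination-specific, and shared components; the additive form `value ≈ shared + src[s] + dst[d]`, the alternating-averages fit, the function names, and the toy latencies are all assumptions made for illustration. The sketch fits the components on the measured pairs, fills in the unmeasured pairs, and compares the naive mean of the samples with the mean over all pairs.

```python
from collections import defaultdict
from itertools import product


def fit_additive(samples, iters=500):
    """Fit value[s, d] ~ shared + src[s] + dst[d] using observed pairs only.

    Hypothetical helper: alternates between averaging residuals per source
    and per destination (a simple least-squares backfitting loop).
    """
    shared = sum(samples.values()) / len(samples)
    src = defaultdict(float)
    dst = defaultdict(float)
    for _ in range(iters):
        # Update each source term from the residuals of its observed paths.
        acc = defaultdict(list)
        for (s, d), y in samples.items():
            acc[s].append(y - shared - dst[d])
        for s, resid in acc.items():
            src[s] = sum(resid) / len(resid)
        # Update each destination term the same way.
        acc = defaultdict(list)
        for (s, d), y in samples.items():
            acc[d].append(y - shared - src[s])
        for d, resid in acc.items():
            dst[d] = sum(resid) / len(resid)
    return shared, dict(src), dict(dst)


def corrected_mean_value(samples):
    """Mean over all source-destination pairs, estimating unmeasured ones."""
    shared, src, dst = fit_additive(samples)
    total, count = 0.0, 0
    for s, d in product(src, dst):
        # Use the measurement when it exists, the model estimate otherwise.
        total += samples.get((s, d), shared + src[s] + dst[d])
        count += 1
    return total / count


# Toy latencies in ms (made-up data). Two of the nine paths are unmeasured:
# (s2, d1) and the slowest one, (s3, d3).
samples = {
    ("s1", "d1"): 11.0, ("s1", "d2"): 13.0, ("s1", "d3"): 17.0,
    ("s2", "d2"): 18.0, ("s2", "d3"): 22.0,
    ("s3", "d1"): 31.0, ("s3", "d2"): 33.0,
}

naive_mean = sum(samples.values()) / len(samples)
corrected_mean = corrected_mean_value(samples)
print(f"naive mean: {naive_mean:.2f} ms, corrected mean: {corrected_mean:.2f} ms")
```

Because the slowest path is among the missing samples, the naive mean understates the mean over all nine paths. The toy data are exactly additive, so the fitted components recover the missing values; real measurements would leave residual error, and the sketch says nothing about how Broom itself handles that.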