Streaming Algorithms via Precision Sampling

Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak

2011

## Abstract

A technique introduced by Indyk and Woodruff (STOC 2005) has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector x = (x_1, x_2, …, x_n), which is useful for the following applications:

- Estimating the F_k-moment of x, for k > 2.
- Estimating the ℓ_p-norm of x, for p ∈ [1, 2], with small update time.
- Estimating cascaded norms ℓ_p(ℓ_q), for all p, q > 0.
- ℓ_1-sampling, where the goal is to produce an element i with probability (approximately) |x_i| / ‖x‖_1. It extends to similarly defined ℓ_p-sampling, for p ∈ [1, 2].

For all these applications the algorithm is essentially the same: scale the vector *x* entry-wise by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting vector. Our sketch is a linear function of *x*, thereby allowing general updates to the vector *x*.
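The scale-then-find-heavy-hitters idea can be illustrated with a toy ℓ_1-sampling sketch. This is an illustration, not the paper's algorithm: it draws one uniform precision u_i per coordinate, scales |x_i| by 1/u_i, and takes an exact max in place of a heavy-hitter sketch, so the sampling probabilities only approximately track |x_i| / ‖x‖_1 (matching the "approximately" in the abstract).

```python
import random
from collections import Counter

def l1_sample(x, rng):
    """One draw of the scale-then-max idea: scale each entry by a random
    factor 1/u_i and return the index of the largest scaled value.
    (A streaming algorithm would use a heavy-hitter sketch instead of an
    exact max; this toy version keeps the whole vector in memory.)"""
    best_i, best_val = -1, -1.0
    for i, xi in enumerate(x):
        u = rng.random() or 1e-12   # u_i uniform in (0, 1]; guard against 0.0
        val = abs(xi) / u
        if val > best_val:
            best_i, best_val = i, val
    return best_i

rng = random.Random(42)
x = [3.0, 1.0, 0.0, 2.0]
counts = Counter(l1_sample(x, rng) for _ in range(20000))
# The empirical frequencies follow the ordering of |x_i|, and the zero
# coordinate is never sampled, as l1-sampling requires.
```

Larger entries win the "race" more often because |x_i| / u_i exceeds a threshold t exactly when u_i < |x_i| / t, an event whose probability is proportional to |x_i|.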

Precision Sampling itself addresses the problem of estimating a sum ∑_{i=1}^{n} a_i from weak estimates of each real a_i ∈ [0, 1]. More precisely, the estimator first chooses a desired precision u_i ∈ (0, 1] for each i ∈ [n], and then it receives an estimate of every a_i within additive u_i. Its goal is to provide a good approximation to ∑_i a_i while keeping tabs on the "approximation cost" ∑_i (1/u_i). Here we refine previous work (Andoni, Krauthgamer, and Onak, FOCS 2010), which shows that as long as ∑_i a_i = Ω(1), a good multiplicative approximation can be achieved using total precision of only O(n log n).
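The identity underlying this setup, Pr[u_i < a_i] = a_i for u_i uniform, can be checked with a short sketch. This is only a toy unbiasedness demonstration under the assumption of exact knowledge of each a_i; the paper's refined estimator additionally handles the additive-u_i estimation error and caps the precisions so that ∑_i (1/u_i) stays bounded.

```python
import random

def precision_sampling_estimate(a, rng):
    """One unbiased estimate of sum(a): draw a precision u_i uniformly at
    random per coordinate and count how many satisfy u_i < a_i.
    Since Pr[u_i < a_i] = a_i, the count has expectation sum(a).
    Note that deciding u_i < a_i only requires knowing a_i up to roughly
    u_i, which is why weak per-coordinate estimates suffice."""
    return sum(1 for ai in a if rng.random() < ai)

rng = random.Random(0)
a = [0.2, 0.5, 0.9, 0.1, 0.7]   # true sum = 2.4
trials = 20000
avg = sum(precision_sampling_estimate(a, rng) for _ in range(trials)) / trials
# avg concentrates around sum(a) = 2.4 as the number of trials grows.
```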

## Details