Optimal Moment Estimation in Data Streams

We close the problem of understanding the space complexity of pth moment estimation in data streams for 0 p = 2 by giving the first optimal upper and lower bounds. In this problem, a high-dimensional vector receives a long sequence of coordinate-wise updates, and we must output a low relative-error approximation of the pth moment at the end of the stream while keeping a memory footprint exponentially smaller than the vector length. This problem has applications in network traffic monitoring, databases, and simply computing distance measures across massive datasets.

We obtain the upper bound by showing an invariance principle for sums of boundedly independent p-stable random variables, which may be of independent interest. Our main ingredient in this proof is the introduction of an approximation theoretic tool we dub “FT-mollification”, which has since found other applications in agnostic learning, pseudorandomness, and approximation theory. We obtain the lower bound by showing a direct sum property for the one way communication complexity of the GapHamming problem.

Joint work with Daniel M. Kane (Harvard) and David P. Woodruff (IBM Almaden)

Speaker Details

Jelani Nelson is a doctoral student at MIT, in the Theory of Computation group at CSAIL. He received his S.B. (2005) and M.Eng. (2006) also from MIT. His research interests include algorithms and applied probability in general, though mostly he has focused specifically on algorithms for massive data (sketching, streaming, and external-memory algorithms) and pseudorandomness.

Date:
Speakers:
Jelani Nelson
Affiliation:
MIT
    • Portrait of Jeff Running

      Jeff Running