Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Finding duplicates in a data stream

Parikshit Gopalan and Jaikumar Radhakrishnan


Given a data stream of length n over an alphabet [m] where n > m, we consider the problem of finding a duplicate in a single pass. We give a randomized algorithm for this problem that uses O((log m)3) space. This answers a question of Muthukrishnan and Tarui, who asked if this problem could be solved using sub-linear space and one pass over the input. Our main tool is an Isolation Lemma that reduces this problem to the task of detecting and identifying a Dictatorial variable in a Boolean halfspace.


Publication typeInproceedings
Published inSODA'09
PublisherThe British Computer Society
> Publications > Finding duplicates in a data stream