Parikshit Gopalan and Jaikumar Radhakrishnan
Given a data stream of length $n$ over an alphabet $[m]$ where $n > m$, we consider the problem of finding a duplicate in a single pass. We give a randomized algorithm for this problem that uses $O((\log m)^3)$ space. This answers a question of Muthukrishnan and Tarui, who asked if this problem could be solved using sub-linear space and one pass over the input. Our main tool is an Isolation Lemma that reduces this problem to the task of detecting and identifying a Dictatorial variable in a Boolean halfspace.
Publisher The British Computer Society