Complementary Hashing for Approximate Nearest Neighbor Search 
Hao Xu^{1}, Jingdong Wang^{2}, Zhu Li^{3}, Gang Zeng^{4}, Shipeng Li^{2}, Nenghai Yu^{1} 
^{1}University of Science and Technology of China ^{2}Microsoft Research Asia
^{3}Huawei Technology ^{4}Peking University 


Abstract
Recently, hashing based Approximate Nearest Neighbor
(ANN) techniques have been attracting lots of attention in
computer vision. The datadependent hashing methods,
e.g., Spectral Hashing, expects better performance than
the datablind counterparts, e.g., Locality Sensitive Hashing (LSH). However, most datadependent hashing methods
only employ a single hash table. When higher recall is desired, they have to retrieve exponentially growing number of
hash buckets around the bucket containing the query, which
may drag down the precision rapidly. In this paper, we propose a socalled complementary hashing approach, which
is able to balance the precision and recall in a more effective way. The key idea is to employ multiple complementary
hash tables, which are learned sequentially in a boosting
manner, so that, given a query, its true nearest neighbors
missed from the active bucket of one hash table are more
likely to be found in the active bucket of the next hash table.
Compared with LSH that also can exploit multiple hash tables, our approach is more effective to find true NNs, thanks
to the complementarity property of the hash tables from our
approach. Experimental results on large scale ANN search
show that the proposed method significantly improves the
performance and outperforms the stateoftheart.


Illustration of the differences using (a) single hash table (e.g., spectral hashing), (b) multiple random hash tables (LSH), and (c)
multiple learned hash tables (our approach). Suppose the true neighbors of the query (the red bullet) distribute in the shaded area. To
cover the shaded region, (a) requires a big Hamming ball, (b) needs many small balls, while fewer small balls are enough shown in (c). We
illustrate the inverse projection of the Hamming ball as the ball in the data space centered at the star, and the balls with R= 0corresponds
to the active hash buckets hit by the query in different hash tables. 
