Complementary Hashing for Approximate Nearest Neighbor Search |
| Hao Xu1, Jingdong Wang2, Zhu Li3, Gang Zeng4, Shipeng Li2, Nenghai Yu1 |
1University of Science and Technology of China 2Microsoft Research Asia
3Huawei Technology 4Peking University |
| |
| |
Abstract
Recently, hashing based Approximate Nearest Neighbor
(ANN) techniques have been attracting lots of attention in
computer vision. The data-dependent hashing methods,
e.g., Spectral Hashing, expects better performance than
the data-blind counterparts, e.g., Locality Sensitive Hashing (LSH). However, most data-dependent hashing methods
only employ a single hash table. When higher recall is desired, they have to retrieve exponentially growing number of
hash buckets around the bucket containing the query, which
may drag down the precision rapidly. In this paper, we propose a so-called complementary hashing approach, which
is able to balance the precision and recall in a more effective way. The key idea is to employ multiple complementary
hash tables, which are learned sequentially in a boosting
manner, so that, given a query, its true nearest neighbors
missed from the active bucket of one hash table are more
likely to be found in the active bucket of the next hash table.
Compared with LSH that also can exploit multiple hash tables, our approach is more effective to find true NNs, thanks
to the complementarity property of the hash tables from our
approach. Experimental results on large scale ANN search
show that the proposed method significantly improves the
performance and outperforms the state-of-the-art.
|

|
| Illustration of the differences using (a) single hash table (e.g., spectral hashing), (b) multiple random hash tables (LSH), and (c)
multiple learned hash tables (our approach). Suppose the true neighbors of the query (the red bullet) distribute in the shaded area. To
cover the shaded region, (a) requires a big Hamming ball, (b) needs many small balls, while fewer small balls are enough shown in (c). We
illustrate the inverse projection of the Hamming ball as the ball in the data space centered at the star, and the balls with R= 0corresponds
to the active hash buckets hit by the query in different hash tables. |
Download
| PDF |
|
Results
| Performance Comparison 1 |
 |
| Comparison of the performance using Hamming ranking on the 20K LabelMe dataset. (a), (b) and (c) are the performances for
the hash codes of 16, 24 and 32 bits respectively. |
|
| |
| Performance Comparison 2 |
 |
| Comparison of the performance using Hamming ranking on the 1M SIFT dataset. (a), (b) and (c) are the performances for the
hash codes of 16, 24 and 32 bits respectively. |
|
| |
| Performance Comparison 3 |
 |
| Comparison of LSH and CH in the
cases that different numbers of hash tables
are employed. |
|
| |
|
|
| |
| ©Copyright
Jingdong Wang 2012 |
 |
|