Complementary Hashing for Approximate Nearest Neighbor Search

Hao Xu1, Jingdong Wang2, Zhu Li3, Gang Zeng4, Shipeng Li2, Nenghai Yu1
1University of Science and Technology of China  2Microsoft Research Asia
3Huawei Technology  4Peking University
 
 

Abstract

Recently, hashing based Approximate Nearest Neighbor (ANN) techniques have been attracting lots of attention in computer vision. The data-dependent hashing methods, e.g., Spectral Hashing, expects better performance than the data-blind counterparts, e.g., Locality Sensitive Hashing (LSH). However, most data-dependent hashing methods only employ a single hash table. When higher recall is desired, they have to retrieve exponentially growing number of hash buckets around the bucket containing the query, which may drag down the precision rapidly. In this paper, we propose a so-called complementary hashing approach, which is able to balance the precision and recall in a more effective way. The key idea is to employ multiple complementary hash tables, which are learned sequentially in a boosting manner, so that, given a query, its true nearest neighbors missed from the active bucket of one hash table are more likely to be found in the active bucket of the next hash table. Compared with LSH that also can exploit multiple hash tables, our approach is more effective to find true NNs, thanks to the complementarity property of the hash tables from our approach. Experimental results on large scale ANN search show that the proposed method significantly improves the performance and outperforms the state-of-the-art.

HashIll
Illustration of the differences using (a) single hash table (e.g., spectral hashing), (b) multiple random hash tables (LSH), and (c) multiple learned hash tables (our approach). Suppose the true neighbors of the query (the red bullet) distribute in the shaded area. To cover the shaded region, (a) requires a big Hamming ball, (b) needs many small balls, while fewer small balls are enough shown in (c). We illustrate the inverse projection of the Hamming ball as the ball in the data space centered at the star, and the balls with R= 0corresponds to the active hash buckets hit by the query in different hash tables.

Download

| PDF |

Results

Performance Comparison 1
Comparison1
Comparison of the performance using Hamming ranking on the 20K LabelMe dataset. (a), (b) and (c) are the performances for the hash codes of 16, 24 and 32 bits respectively.
 
Performance Comparison 2
Comparison2
Comparison of the performance using Hamming ranking on the 1M SIFT dataset. (a), (b) and (c) are the performances for the hash codes of 16, 24 and 32 bits respectively.
 
Performance Comparison 3
Comparison3
Comparison of LSH and CH in the cases that different numbers of hash tables are employed.
 
 
©Copyright Jingdong Wang 2012
HTML Hit Counters