Query-driven iterated neighborhood graph search
for large scale indexing

Jingdong Wang, Shipeng Li
Microsoft Research Asia
 
 

Abstract

In this paper, we address the approximate nearest neighbor (ANN) search problem over large scale visual descriptors. We investigate a simple but very effective approach, neighborhood graph search, which constructs a neighborhood graph to index the data points and conducts a local search, expanding neighborhoods with a best-first manner, for ANN search. Our empirical analysis shows that neighborhood expansion is very efficient, with O(1) cost, for a new NN candidate location, and has high chances to locate true NNs and hence it usually performs well. However, it often gets sub-optimal solutions since local search only checks the neighborhood of the current solution, or conducts exhaustive and continuous neighborhood expansions to find better solutions, which deteriorates the query efficiency.
In this paper, we propose a query-driven iterated neighborhood graph search approach to improve the performance. We follow the iterated local search (ILS) strategy, widely-used in combinatorial optimization, to find a solution beyond a local optimum. We handle the key challenge in making neighborhood graph search adapt to ILS, Perturbation, which generates a new pivot to restart a local search. To this end, we present a criterion to check if the local search over a neighborhood graph arrives at the local solution. Moreover, we exploit the query and search history to design the perturbation scheme, resulting in a more effective search. The major benefit is avoiding unnecessary neighborhood expansions and hence more efficiently finding true NNs. Experimental results on large scale SIFT matching, similar image search, and shape retrieval with non-metric distance measures, show that our approach performs much better than previous state-of-the-art ANN search approaches.

GraphSearchAlgorithm

Download

| PDF |

Results

Performance Comparison 1
Comparison1
Shape search performance. The horizontal axis corresponds to the number of accessed models, and the vertical axis corresponds to the accuracy.
 
Performance Comparison 2
Comparison2
First column shows the performance comparison over (a) the Caltech 101 data set, (b) recognition bench-mark images; and the rest two columns show the performance comparison of searching for the most similar image in (a) and (c) and top 20 similar images in (b) and (d) over the tiny images.
 
 
©Copyright Jingdong Wang 2012
HTML Hit Counters