Wenbin Tang, Rui Cai, Zhiwei Li, and Lei Zhang
28 November 2011
In this paper, we study the problem of visual object retrieval by introducing a dictionary of contextual synonyms to narrow down the semantic gap in visual word quantization. The basic idea is to expand a visual word in the query image with its synonyms to boost the retrieval recall. Unlike the existing work such as soft-quantization, which only focuses on the Euclidean (l2) distance in descriptor space, we utilize the visual words which are more likely to describe visual objects with the same semantic meaning by identifying the words with similar contextual distributions (i.e. contextual synonyms). We describe the contextual distribution of a visual word using the statistics of both co-occurrence and spatial information averaged over all the image patches having this visual word, and propose an efficient system implementation to construct the contextual synonym dictionary for a large visual vocabulary. The whole construction process is unsupervised and the synonym dictionary can be naturally integrated into a standard bag-of-feature image retrieval system. Experimental results on several benchmark datasets are quite promising. The contextual synonym dictionary-based expansion consistently outperforms the l2 distance-based soft-quantization, and advances the state-of-the-art performance remarkably.
In Proceeding of the 19th ACM International Conference on Multimedia (MM 2011)
Publisher Association for Computing Machinery, Inc.
Copyright © 2011 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.