Zhong Wu, Qifa Ke, Jian Sun, and Heung-Yeung Shum
State-of-the-art content-based image retrieval systems
have been significantly advanced by the introduction of
SIFT features and the bag-of-words image representation.
Converting an image into a bag-of-words, however, involves
three non-trivial steps: feature detection, feature description,
and feature quantization. A significant amount of information
is lost at each of these steps, and the resulting
visual words are often not discriminative enough for large-scale
image retrieval applications. In this paper, we propose
a novel multi-sample multi-tree approach to computing
the visual word codebook. By encoding more information
about the original image features, our approach generates a
much more discriminative visual word codebook that is also
efficient in terms of both computation and space consumption,
without losing the original repeatability of the visual
features. We evaluate our approach using both a ground-truth
data set and a real-world large-scale image database.
Our results show that a significant improvement in both precision
and recall can be achieved by using the codebook
derived from our approach.
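For readers unfamiliar with the baseline pipeline the abstract refers to, the sketch below illustrates the conventional feature-quantization step: clustering descriptors into a visual-word codebook and mapping each image to a bag-of-words histogram. It is a minimal illustration under assumed data, library choices, and parameters (random stand-in descriptors, scikit-learn k-means, a 256-word codebook), not the multi-sample multi-tree method proposed in this paper.

```python
# Illustrative sketch of the standard bag-of-words quantization step;
# all data and parameters below are assumptions for demonstration only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for 128-D SIFT descriptors extracted from a training corpus.
train_descriptors = rng.random((5000, 128)).astype(np.float32)

# Build a small visual-word codebook by clustering the descriptors.
codebook = KMeans(n_clusters=256, n_init=4, random_state=0).fit(train_descriptors)

def bag_of_words(descriptors: np.ndarray, codebook: KMeans) -> np.ndarray:
    """Quantize one image's descriptors to visual words; return a normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

# Example: represent one "image" (a random descriptor set) as a BoW vector.
image_descriptors = rng.random((300, 128)).astype(np.float32)
print(bag_of_words(image_descriptors, codebook).shape)  # (256,)
```

The quantization above is exactly the lossy step the abstract points to: each 128-D descriptor is collapsed to a single cluster index, which motivates codebooks that retain more of the original feature information.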
Published in: The 12th International Conference on Computer Vision (ICCV)