Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, and shipeng li
28 October 2008
This paper presents Flickr distance, which is a novel measurement of the relationship between semantic concepts (objects, scenes) in visual domain. For each concept, a collection of images are obtained from Flickr, based on which the improved latent topic based visual language model is built to capture the visual characteristic of this concept. Then Flickr distance between different concepts is measured by the square root of Jensen-Shannon (JS) divergence between the corresponding visual language models. Comparing with WordNet, Flickr distance is able to handle far more concepts existing on the Web, and it can scale up with the increase of concept vocabularies. Comparing with Google distance, which is generated in textual domain, Flickr distance is more precise for visual domain concepts, as it captures the visual relationship between the concepts instead of their co-occurrence in text search results. Besides, unlike Google distance, Flickr distance satistices triangular inequality, which makes it a more reasonable distance metric. Both subjective user study and objective evaluation show that Flickr distance is more coherent to human perception than Google distance. We also design several application scenarios, such as concept clustering and image annotation, to demonstrate the effectiveness of this proposed distance in image related applications.
|Published in||Proceeding of ACM Multimedia 2008|
|Publisher||Association for Computing Machinery, Inc.|
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.