
Background, Motivation and Basic Idea

A key obstacle in computer vision research is the semantic gap between low-level visual features and high-level semantic concepts. Traditional image auto-annotation approaches attempt to map visual features directly to textual keywords. However, because these two types of features are heterogeneous, the intrinsic mapping function between them is largely unknown.

Arista adopts a search-to-annotation strategy. Its basic idea is that visually similar images tend to share similar semantics. Leveraging a large-scale, partly annotated image database, it annotates an image by first searching for a number of visually similar images within a content-based image retrieval framework, and then mining relevant terms/phrases from their surrounding texts. In this way, Arista sidesteps the semantic gap to a certain extent by measuring similarities in homogeneous feature spaces (i.e. image against image and text against text).
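The two-step search-then-mine idea can be sketched as follows. This is an illustrative toy, not the actual Arista implementation; the database layout and the `distance`/`annotate` helpers are hypothetical placeholders.

```python
from collections import Counter

def distance(a, b):
    # Euclidean distance between two visual feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def annotate(query_features, database, k=50, top_n=5):
    """Annotate an image by (1) retrieving its k visual nearest
    neighbors and (2) mining frequent terms from their surrounding text."""
    # Step 1: content-based search in the homogeneous visual space.
    neighbors = sorted(
        database,
        key=lambda img: distance(query_features, img["features"]))[:k]
    # Step 2: mine salient terms from the neighbors' surrounding text.
    term_counts = Counter()
    for img in neighbors:
        term_counts.update(img["text"].lower().split())
    return [term for term, _ in term_counts.most_common(top_n)]
```

In practice the mining step weighs terms far more carefully (e.g. by saliency rather than raw frequency), but the sketch shows why both similarity measurements stay within a single modality.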




Arista aims to be a practical image annotation engine that can handle any image of a popular concept (i.e. a concept for which at least N images can be found). To achieve this goal, the following challenges must be addressed:

- How to bridge the semantic gap

- How to select annotation vocabulary and find more meaningful keywords

- How to define a rejection scheme

- How to enable real-time image search on a web-scale image dataset


Research Roadmap

Since 2006, we have conducted a series of research efforts on Arista to address the challenges above. The timeline below illustrates our attempts:

- towards the feasibility of Arista (See "Query by Image+Keyword", "Query by Image Only"). In [Wang, CVPR'06], we first proposed Arista as a non-parametric image auto-annotation approach that annotates an image in a two-step fashion: search, then mining. We implemented a prototype system over 2.4 million images, called AnnoSearch, which requires an additional keyword to speed up the image search process and reduce the semantic gap. Soon afterwards, we developed a second system that requires only an image query and is capable of searching 2.4 million images in real time [Li, MM'06].

- towards a better text mining approach, including annotation refinement, annotation vocabulary selection, and annotation rejection (See "Improve Mining" & [Lu, CVPR'07]). Since a dedicated approach can further polish annotation results, [Wang, MM'06] proposed an image annotation refinement method based on random walk with restarts. The same problem was further investigated in [Wang, CVPR'07] and [Wang, CIVR'08], where Markov processes were proposed and further improved annotation performance.

- towards bridging the semantic gap (See "Improve Search"). This line of research investigates ways of improving image search results so that semantically similar images can be retrieved, which directly improves annotation precision. [Lu, CVPR'07] took a divide-and-conquer approach, developing a lexicon of high-level image concepts with small semantic gaps that helps researchers focus their data collection, annotation, and modeling efforts. [Wang, SIGIR'08], on the other hand, studied better distance measures on images by mutually reinforcing textual and visual features.

- towards the value of a real web-scale dataset (See "Scale-up Database to 2B Images"). Non-parametric image annotation approaches such as Arista have attracted many research efforts, and the value of a large-scale, partially labeled image dataset for annotation is now widely recognized. Collaborating with the Bing Multimedia Search team, we enabled real-time image annotation on 2 billion images [Wang, CVPR'10], the first time a real web-scale image dataset was used for image annotation. This work attempted to answer the following questions: 1) what coverage of image concepts the search-to-annotation technique achieves; 2) what annotation performance is attainable; and 3) at what scale further dataset expansion stops improving annotation precision. However, due to system issues, this work was based on near-duplicate detection techniques, which means it only partially answered the questions above and leaves room for future work.
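To make the refinement step above concrete, a minimal random-walk-with-restarts re-ranking in the spirit of [Wang, MM'06] can be sketched as follows. The word-similarity matrix, restart probability, and function names here are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def refine_annotations(initial_scores, word_similarity,
                       restart=0.3, iters=100, tol=1e-9):
    """Re-rank candidate annotation words via random walk with restarts.

    initial_scores  : confidence of each candidate from the mining step
    word_similarity : pairwise semantic relatedness between candidates
    """
    v = initial_scores / initial_scores.sum()           # restart distribution
    # Column-normalize the similarity matrix into a transition matrix.
    P = word_similarity / word_similarity.sum(axis=0, keepdims=True)
    r = v.copy()
    for _ in range(iters):
        r_next = (1 - restart) * P @ r + restart * v    # one walk step
        if np.abs(r_next - r).sum() < tol:              # converged
            break
        r = r_next
    return r_next
```

The restart term anchors the walk to the original mining confidences, while the transition matrix lets strongly related words reinforce each other, so coherent candidates rise in the final ranking.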



· [Wang, CVPR'10] Xin-Jing Wang, Lei Zhang, Ming Liu, Yi Li, Wei-Ying Ma. ARISTA - Image Search to Annotation on Billions of Web Photos, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2010.

· [Lu, T-MM'09] Yijuan Lu, Jiemin Liu, Qi Tian, and Lei Zhang, LCSS: Lexica of High-Level Concepts with Small Semantic Gaps, IEEE Transactions on Multimedia, 2009.

· [Wang, T-PAMI'08] Xin-Jing Wang, Lei Zhang, Xirong Li, Wei-Ying Ma, Annotating Images by Mining Image Search Results, IEEE Trans. Pattern Analysis and Machine Intelligence Special Issue (TPAMI), 2008.

· [Wang, SIGIR'08] Changhu Wang, Lei Zhang, Hong-Jiang Zhang. Learning to Reduce the Semantic Gap in Web Image Retrieval and Annotation, in Proc. of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), Singapore, July 2008.

· [Wang, CIVR'08] Changhu Wang, Lei Zhang, Hong-Jiang Zhang. Scalable Markov Model-Based Image Annotation, in Proc. of ACM International Conference on Image and Video Retrieval (CIVR), Niagara Falls, Canada, July 2008.

· [Wang, MMSJ'08] Changhu Wang, Feng Jing, Lei Zhang, Hong-Jiang Zhang. Scalable Search-based Image Annotation, in Multimedia Systems, June 14, 2008. ISSN: 0942-4962 (Print) 1432-1882 (Online).

· [Lu, CVPR'07] Yijuan Lu, Lei Zhang, Qi Tian, Wei-Ying Ma, What Are the High-Level Concepts with Small Semantic Gaps? in Proc. International Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, USA, June, 2007

· [Li, MM'07] Xirong Li, Xin-Jing Wang, Changhu Wang, Lei Zhang, SBIA: Search-based Image Annotation by Leveraging Web-Scale Images (Demo), in Proc. ACM Multimedia, Augsburg, Germany, September, 2007.

· [Wang, CVPR'07] Changhu Wang, Feng Jing, Lei Zhang, and Hong-Jiang Zhang, Content-Based Image Annotation Refinement, in Proc. International Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, USA, June, 2007.

· [Wang, MM'06] Changhu Wang, Feng Jing, Lei Zhang, Hong-Jiang Zhang. Image Annotation Refinement using Random Walk with Restarts, in Proc. of ACM International Conference on Multimedia (ACM MM), Santa Barbara, USA, October 2006.

· [Li, MM'06] Xirong Li, Le Chen, Lei Zhang, Fuzong Lin, Wei-Ying Ma, Image Annotation by Large-Scale Content-based Image Retrieval, in Proc. ACM Multimedia, Santa Barbara, USA, October, 2006.

· [Wang, MIR'06] Changhu Wang, Feng Jing, Lei Zhang, Hong-Jiang Zhang, Scalable Search-Based Image Annotation of Personal Images, in Proc. ACM SIGMM International Workshop on Multimedia Information Retrieval (MIR), 2006

· [Wang, CVPR'06] Xin-Jing Wang, Lei Zhang, Feng Jing, Wei-Ying Ma, AnnoSearch: Image Auto-Annotation by Search, in Proc. International Conference on Computer Vision and Pattern Recognition (CVPR), New York, USA, June, 2006

· [Wang, WWW'06] Xin-Jing Wang, Lei Zhang, Feng Jing, Wei-Ying Ma, Image Annotation Using Search and Mining Technologies, in Proc. The 15th International World Wide Web Conference (WWW), Edinburgh, Scotland, May, 2006 (Best Poster Award)

· [Zhang, MM'06] Lei Zhang, Le Chen, Feng Jing, Defeng Deng, Wei-Ying Ma, EnjoyPhoto: A Vertical Image Search Engine for Enjoying High-Quality Photos, in Proc. ACM Multimedia, Santa Barbara, USA, October, 2006