Xian-Sheng Hua, Linjun Yang, Ming Ye, Kuansan Wang, Yong Rui, and Jin Li
In this paper, we introduce a new large-scale click-based image dataset, Clickture, to advance research and real-world applications in the area of multimedia understanding and search. The dataset consists of 32.9 million distinct images, each associated with one or more labels drawn from 18.7 million distinct textual queries. The correspondence between an image and a label (query) is determined by whether the image was searched for and clicked by users under that query in a commercial search engine. Unlike existing human-labeled datasets, Clickture covers most of the search interests of real-life users, in terms of both images and semantic labels. Moreover, because click data accumulates rapidly every day, it is feasible to extend Clickture with more images, labels, and correspondences. Clickture can be regarded as a “bridge” connecting image visual content with semantic textual descriptions as well as users’ search intents. We argue that exploiting this huge and “unlimited” click data is a promising direction for tackling the challenges of bridging the semantic gap and the intent gap in multimedia search and related applications. In this paper, we introduce the uniqueness and importance of Clickture, its construction methodology, and its data properties, as well as exemplary research built on it.
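The image–query correspondences described above can be pictured as (query, image, click count) triads, with each image accumulating the set of queries under which it was clicked. The following is a minimal illustrative sketch of that structure; the field names, sample data, and aggregation function are assumptions for exposition, not the dataset's actual schema.

```python
# Hypothetical Clickture-style click records: (query text, image key, click count).
# All names and values here are illustrative, not taken from the real dataset.
from collections import defaultdict

triads = [
    ("golden gate bridge", "img_001", 42),
    ("golden gate bridge at night", "img_001", 7),
    ("eiffel tower", "img_002", 13),
]

def labels_per_image(triads):
    """Group the queries (labels) by image, keeping click counts as weights."""
    labels = defaultdict(dict)
    for query, image_id, clicks in triads:
        labels[image_id][query] = clicks
    return dict(labels)

# Each image ends up with a click-weighted set of textual labels.
print(labels_per_image(triads)["img_001"])
# → {'golden gate bridge': 42, 'golden gate bridge at night': 7}
```

In this view, click counts act as a confidence weight on each image–label pair, which is what makes the click log usable as weak supervision at scale.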