Ting Yao, Tao Mei, Chong-Wah Ngo, and Shipeng Li
The problem of tagging has mostly been considered from the perspectives of machine learning and data-driven philosophy. A fundamental issue underlying the success of these approaches is visual similarity, which techniques ranging from nearest-neighbor search to manifold learning use to identify instances similar to a given example for tag completion. The need to search millions of visual examples in a high-dimensional feature space, however, makes the task computationally expensive. Moreover, the results can lack robustness when the underlying data, such as online videos, are rich in semantics and similarity is difficult to learn from low-level features. This paper explores user search behavior as captured in click-through data, which is abundant and freely accessible to search engines, to learn relationships between videos and apply these relationships to annotate online videos economically. We demonstrate that a simple approach based on co-click statistics yields promising results compared with feature-based similarity measurement.
Furthermore, given the long-tail effect, in which a few videos account for most clicks, a new method based on polynomial semantic indexing is proposed to learn a latent space that alleviates the sparsity of click-through data. The proposed approaches are then applied to three major tasks in tagging: tag assignment, ranking, and enrichment. On a bipartite graph constructed from click-through data with over 15 million queries and 20 million video URL clicks, we show that annotation can be performed for free, with competitive performance and minimal computing resources, representing a new and promising paradigm for video tagging alongside machine learning and data-driven methodologies.
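The co-click idea can be illustrated on a toy query-video bipartite graph. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes each video is represented by its click counts over queries, that co-click similarity is the cosine of these vectors, and that tags are ranked by similarity-weighted voting from co-clicked neighbors. All function names (`build_video_vectors`, `coclick_similarity`, `rank_tags`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def build_video_vectors(clicks):
    """Map each video to a sparse {query: click_count} vector.

    Assumption: clicks is an iterable of (query, video, count) edges
    of the query-video bipartite graph.
    """
    vectors = defaultdict(dict)
    for query, video, count in clicks:
        vectors[video][query] = vectors[video].get(query, 0) + count
    return dict(vectors)


def coclick_similarity(a, b):
    """Cosine similarity between two sparse query-click vectors (assumed measure)."""
    dot = sum(a[q] * b[q] for q in set(a) & set(b))
    if dot == 0:
        return 0.0  # no shared queries, so the videos are never co-clicked
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def rank_tags(target, vectors, tags):
    """Rank candidate tags for `target` by similarity-weighted votes
    from co-clicked neighbors (hypothetical propagation scheme)."""
    target_vec = vectors.get(target, {})
    scores = defaultdict(float)
    for video, vec in vectors.items():
        if video == target:
            continue
        sim = coclick_similarity(target_vec, vec)
        if sim <= 0:
            continue
        for tag in tags.get(video, ()):
            scores[tag] += sim
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with clicks `[("cat funny", "v1", 3), ("cat funny", "v2", 2), ("kitten", "v1", 1), ("kitten", "v2", 1), ("soccer goal", "v3", 5)]` and existing tags `{"v2": ["cat", "pets"], "v3": ["soccer"]}`, the untagged video `v1` receives `cat` and `pets` from its co-clicked neighbor `v2`, while `soccer` is never proposed because `v3` shares no queries with `v1`. No visual features are computed at any point, which is the source of the low cost; the sparsity of such vectors for long-tail videos is what a learned latent space is meant to address.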