Jingjing Liu, Wei Lai, Xian-Sheng Hua, Yalou Huang, and Shipeng Li
This paper addresses the problem of multimodal fusion in video search. First, we employ an object-sensitive approach to query analysis to improve the baseline result of text-based video search. Then, we propose a PageRank-like graph-based approach to re-ranking the text-based search results. To better exploit the underlying relationships between video shots, the proposed re-ranking scheme simultaneously leverages textual relevancy, semantic concept relevancy, and low-level-feature-based visual similarity. In this PageRank-like scheme, we construct a set of graphs with the video shots as vertices and the conceptual and visual similarities between video shots as “hyperlinks.” A modified topic-sensitive PageRank algorithm is then applied to these graphs to propagate the relevance scores through all related video shots. Experimental results verify the effectiveness of the graph-based propagation approach combined with the object-sensitive query analysis approach, which brings significant improvement over the text-based video search baseline. Our experimental analysis also indicates that the proposed re-ranking method is highly generic and independent of query classes, training data, and human interference.
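To illustrate the propagation idea described in the abstract, here is a minimal sketch of a standard personalized (topic-sensitive) PageRank iteration over a single shot-similarity graph. It is not the paper's modified algorithm, which operates on a set of conceptual and visual graphs. The function name `propagate_scores`, the damping value of 0.85, the fixed iteration count, and the toy similarity matrix are all assumptions made for illustration.

```python
def propagate_scores(similarity, text_scores, damping=0.85, iterations=50):
    """Propagate text relevance scores over a shot-similarity graph.

    similarity[i][j] is a non-negative edge weight between shots i and j
    (hypothetical input; the paper builds separate concept and visual graphs).
    text_scores[i] is the text-based relevance of shot i and serves as the
    teleport (personalization) vector.
    """
    n = len(text_scores)
    total = sum(text_scores)
    # Normalize the text scores into a probability distribution.
    prior = [s / total for s in text_scores] if total > 0 else [1.0 / n] * n

    # Row-normalize edge weights into transition probabilities.
    # A shot with no outgoing weight falls back to the prior.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        transition.append([w / row_sum for w in row] if row_sum > 0 else list(prior))

    scores = list(prior)
    for _ in range(iterations):
        scores = [
            (1 - damping) * prior[j]
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Toy example: shot 1 has no text match but is strongly similar to shot 0,
# so it should end up ranked above the weakly connected shot 2.
similarity = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
text_scores = [1.0, 0.0, 0.0]
reranked = propagate_scores(similarity, text_scores)
print(sorted(range(len(reranked)), key=lambda i: -reranked[i]))
```

Using the text scores as the teleport vector keeps the ranking anchored to the text-based baseline, while the damping factor controls how much relevance flows to similar shots that the text search missed.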
Publisher: ACM Multimedia 2007