Tao Mei, Lin-Xie Tang, Jinhui Tang, and Xian-Sheng Hua
The ever increasing volume of video content on the Web has created profound challenges for developing efficient indexing and search techniques to manage video data. Conventional techniques such as video compression and summarization strive for the two commonly conflicting goals of low storage and high visual and semantic fidelity. With the goal of balancing both video compression and summarization, this paper presents a novel approach, called "Near-Lossless Semantic Summarization" (NLSS), to summarize a video stream with the least high-level semantic information loss by using an extremely small piece of metadata. The summary consists of compressed image and audio streams, as well as the metadata for temporal structure and motion information. Although at a very low compression rate (around 1/40 of H.264 baseline, where traditional compression techniques can hardly preserve an acceptable visual fidelity), the proposed NLSS still can be applied to many video-oriented tasks, such as visualization, indexing and browsing, duplicate detection, concept detection, and so on. We evaluate the NLSS on TRECVID and other video collections, and demonstrate that it is a powerful tool for significantly reducing storage consumption, while keeping high-level semantic fidelity.
In ACM Trans. Multimedia Computing Communications and Applications