The Multimedia Search and Mining (MSM) group works on a wide range of multimedia research and projects, spanning understanding, analysis, search, data mining, and applications. Our current research problems include image understanding, video analytics, large-scale visual (image and video) indexing and search, and 3D reconstruction.
- Food Recognition
We study the problem of food image recognition using deep learning techniques. Our goal is to develop a robust service that recognizes thousands of popular Asian and Western foods. Several prototypes have been developed to support diverse applications. We are also developing a prototype called Im2Calories, which automatically calculates the calories in a dish image and conducts nutrition analysis for it.
- Image Captioning
We study the problem of image captioning, i.e., automatically describing an image with a sentence. This is a challenging problem because, unlike other computer vision tasks such as image classification and object detection, image captioning requires not only understanding the image but also knowledge of natural language. We formulate this problem as a multimodal translation task and develop novel algorithms to solve it.
- Network Morphism
We propose a novel learning scheme called network morphism. It morphs a parent network into a child network, allowing fast knowledge transfer. The child network achieves the performance of the parent network immediately, and its performance is expected to continue improving as training proceeds. The proposed scheme supports network morphism in an expanding mode for arbitrary non-linear neurons, including depth, width, kernel size, and subnet morphing operations.
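To make the idea of a function-preserving expansion concrete, here is a minimal sketch in plain Python. It is not the group's algorithm: it assumes a small fully connected network with ReLU activations and no biases, and the helper names (`forward`, `deepen`, `widen`) are hypothetical. Depth morphing is illustrated by inserting an identity layer after a ReLU, which works here only because ReLU is idempotent. Width morphing is illustrated by duplicating a hidden unit and splitting its outgoing weights.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; each row holds the incoming weights of one unit.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(layers, x):
    """Apply each weight matrix, with ReLU after every layer except the last."""
    for i, W in enumerate(layers):
        x = matvec(W, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def deepen(layers, index):
    """Depth morphing (assumed form): insert an identity layer after hidden layer `index`."""
    n = len(layers[index])  # number of units produced by that layer
    identity = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    return layers[:index + 1] + [identity] + layers[index + 1:]

def widen(layers, index, unit):
    """Width morphing (assumed form): duplicate one hidden unit of layer `index`."""
    W_in = [row[:] for row in layers[index]]
    W_out = [row[:] for row in layers[index + 1]]
    W_in.append(W_in[unit][:])      # the new unit computes the same activation
    for row in W_out:
        row[unit] *= 0.5            # split the outgoing weight between the
        row.append(row[unit])       # original unit and its copy
    return layers[:index] + [W_in, W_out] + layers[index + 2:]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
parent = [random_matrix(4, 3, rng), random_matrix(2, 4, rng)]  # 3 -> 4 -> 2
child = widen(deepen(parent, 0), 0, 1)                         # 3 -> 5 -> 4 -> 2

x = [0.5, -1.0, 2.0]
max_gap = max(abs(a - b) for a, b in zip(forward(parent, x), forward(child, x)))
print("largest output difference between parent and child:", max_gap)
```

The child has more layers and more units than the parent, yet the printed difference should be zero up to floating-point rounding, so the child starts from the parent's outputs and the added parameters can then be trained further.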
- Photo Story
To be completed.
- Video and Language
To be completed.
- 3D Object Reconstruction and Recognition
We study the problem of 3D object reconstruction and recognition. For reconstruction, we aim to develop algorithms and systems that lower the barrier to 3D reconstruction for common users, so that we can collect a world-class 3D object repository through crowdsourcing. For recognition, we aim to handle large-scale tasks (e.g., identifying thousands of objects) with real-time performance.
- MindFinder: Finding Images by Sketching
Sketch-based image search is a well-known and difficult problem; little progress has been made in the past decade toward a large-scale, practical sketch-based search engine. We have revisited this problem and developed a scalable solution to sketch-based image search. The MindFinder system indexes more than two million web images to enable efficient sketch-based image retrieval, and we expect it to support many creative applications that advance the state of the art.
- Mobile Video Search
Mobile video is quickly becoming a mass consumer phenomenon. More and more people use their smartphones to search and browse video content while on the move. This project aims to develop an innovative instant mobile video search system with which users can discover videos by simply pointing their phones at a screen and capturing just a few seconds of what they are watching.
- Multimedia Advertising
The ever-increasing multimedia content on the Internet has become the primary source for more effective online advertising. Conventional advertising systems treat multimedia content the same as general text and do not consider automatically monetizing the rich content of images and videos. This research direction leverages content analysis and understanding to enable more effective and efficient advertising on multimedia content, whether on the Internet or on mobile devices.
- Picto: A Large-Scale Visual Indexing and Recognition System
In this project, we focus on developing algorithms for large-scale image indexing and recognition. Our research covers low-level image features, mid-level image representations, and indexing and ranking algorithms.
- Video Collage
A Video Collage is a synthesized image that enables users to quickly browse video content. Given a video, Video Collage selects the most representative images from the video, extracts salient regions of interest (ROIs) from these images, and seamlessly arranges the ROIs on a given canvas. Video Collage can be used in Windows Vista Explorer, Live Search Video, and MSN Soapbox.
- Bo Wu, Tao Mei, Wen-Huang Cheng, and Yongdong Zhang, Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-scale Temporal Decomposition, AAAI - Association for the Advancement of Artificial Intelligence, February 2016.
- Ting Yao, Tao Mei, and Chong-Wah Ngo, Learning Query and Image Similarities with Ranking Canonical Correlation Analysis, IEEE International Conference on Computer Vision, December 2015.
- Jianlong Fu, Yue Wu, Tao Mei, Jinqiao Wang, Hanqing Lu, and Yong Rui, Relaxing from Vocabulary: Robust Weakly-Supervised Deep Learning for Vocabulary-Free Image Tagging, IEEE International Conference on Computer Vision, December 2015.
- Xuyong Yang, Tao Mei, Ying-Qing Xu, Yong Rui, and Shipeng Li, Automatic Generation of Visual-Textual Presentation Layout, in ACM Transactions on Multimedia Computing Communications and Applications, ACM – Association for Computing Machinery, November 2015.
- Yalong Bai, Kuiyuan Yang, Wei Yu, Chang Xu, Wei-Ying Ma, and Tiejun Zhao, Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network, in ACM Multimedia, October 2015.
- Tianjun Xiao, Yichong Xu, Kuiyuan Yang, Jiaxing Zhang, Yuxin Peng, and Zheng Zhang, The Application of Two-level Attention Models in Deep Convolutional Neural Network for Fine-grained Image Classification, in CVPR, June 2015.
- Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui, Jointly Modeling Embedding and Translation to Bridge Video and Language, no. MSR-TR-2015-92, June 2015.
- Yalong Bai, Wei Yu, Tianjun Xiao, Chang Xu, Kuiyuan Yang, Wei-Ying Ma, and Tiejun Zhao, Bag-of-Words Based Deep Neural Network for Image Retrieval, in ACM Multimedia, ACM – Association for Computing Machinery, November 2014.
- Yingwei Pan, Ting Yao, Tao Mei, Houqiang Li, Chong-Wah Ngo, and Yong Rui, Click-through-based Cross-view Learning for Image Search, ACM – Association for Computing Machinery, July 2014.
- Tao Mei, Yong Rui, Shipeng Li, and Qi Tian, Multimedia Search Reranking: A Literature Survey, in ACM Computing Surveys, 2014.
- Wenyuan Yin, Tao Mei, and Chang Wen Chen, Automatic Generation of Social Media Snippets for Mobile Browsing, ACM Multimedia, October 2013.
- Ting Yao, Tao Mei, Chong-Wah Ngo, and Shipeng Li, Annotation for Free: Video Tagging by Mining User Search Behavior, ACM Multimedia, October 2013.
- Wu Liu, Tao Mei, Yongdong Zhang, Jintao Li, and Shipeng Li, Listen, Look, and Gotcha: Instant Video Search with Mobile Phones by Layered Audio-Video Indexing, ACM Multimedia, October 2013.
- Ting Yao, Yuan Liu, Chong-Wah Ngo, and Tao Mei, Unified Entity Search in Social Media Community, in International World-Wide Web Conference (WWW), May 2013.
- Heng Liu, Tao Mei, Jiebo Luo, Houqiang Li, and Shipeng Li, Finding Perfect Rendezvous On the Go: Accurate Mobile Visual Localization and Its Applications to Routing, in ACM Multimedia, November 2012.