The goal of the Web Search and Mining research area of Microsoft Research Asia is to define the next-generation Web by leveraging data mining, machine learning, knowledge discovery, and media analysis techniques for information analysis, organization, retrieval, and visualization. We expect that the next-generation Web will be an organic combination of the traditional Web (W), social networks (S), and mobile/sensor networks (M).
Exploring New Search Paradigms
Over the past decade, search engines have been developed with the goal of better organizing Web information. Search has a long document-centric tradition, in which "searching for information" is equated with "searching for documents". We aim to go beyond this document-centric tradition, explore new search paradigms, and bring search back to "searching for information".
Interactive Knowledge Mining and Crowdsourcing
We are exploring a new paradigm for web-scale entity search and knowledge mining, which extracts and integrates web information about various types of real-world entities. We rank these entities by their relevance and popularity in answering user queries. We are also building an interactive knowledge mining platform that lets users effectively interact with, and contribute to, our automated entity extraction and disambiguation systems.
Machine Learning for Web Search
Web search can be viewed as an intelligent system built from huge amounts of content data and user behavior data using machine learning techniques. Every major task in web search, including crawling, indexing, query understanding, document understanding, query-document matching, ranking, and search result presentation, requires intelligent decisions, and data-driven machine learning techniques are the most effective way to make them. In the Web Search and Mining group, we aim to develop both fundamental and advanced machine learning technologies to improve every aspect of the web search system. In particular, we focus on machine learning technologies that improve the relevance of results for tail queries and the diversity of results for head queries.
Managing Data from the Physical World
By accumulating and aggregating physical-world information from many users and many mobile devices over long periods, collective social intelligence can be derived. In this center, we are working on a range of technologies to manage physical-world information and to build intelligence from it. We aim to link data from people, services, and sensors through a unified knowledge model and to provide the resulting intelligence as a service in the cloud.
Multimedia Search and Visual Information Mining
We focus on pattern analysis and extraction for content understanding and data mining of multimedia data. We are working on research problems in search-based image annotation, large-scale visual indexing and recognition, sketch-based image search, and object recognition with 3D structures. The results are intended to enable new, advanced services that bring intelligence and insight to image understanding and visual information retrieval.