We aim to enable synergetic collaboration among people, and between people and computers, to enlighten people and enrich their lives.
Overview
To achieve our mission, we develop scalable automatic content analysis methods and quality metrics to analyze large volumes of online text, such as blogs, community-based question answering, forum discussions, news, reviews, Twitter, and Wikipedia, and to harvest explicit and implicit knowledge from these media. To ensure the quality of the harvested knowledge, we automatically construct per-topic global and local expert rankings through statistical analysis of the people who created the online content. The results are used not only to rate harvested knowledge but also to form an active expert network to which users can connect. To leverage the collective intelligence of the crowd, we design smart applications that simplify users’ tasks and that learn and improve from their interactions with users.
By analyzing web content, identifying the experts and enthusiasts behind it, and providing smart interaction among content, users, and machines, we would like to automatically identify user intents and activities and to provide in-context, activity-optimized access to applications that just work.
Our current research areas are listed below.
Social Question Answering & Summarization
The popularity of social question answering (SQA) services such as Microsoft Answers and MSDN Forums, Yahoo! Answers, Baidu Zhidao, and Naver Knowledge-In has demonstrated the value of this approach. However, existing SQA services are purely human-based, do not provide aggregated answers, and do not aggregate across services. Furthermore, related questions and their answers can already be found elsewhere on the web. We would like to create a one-stop social question answering service that aggregates all services and all forms of existing questions and their answers through automatic question detection, question clustering, answer extraction, and answer summarization.
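The first two stages named above, question detection and question clustering, can be illustrated with a toy sketch. This is not the group's method: the keyword heuristic, the Jaccard similarity measure, the greedy clustering strategy, and all names (`QUESTION_WORDS`, `is_question`, `cluster_questions`, the `threshold` value) are assumptions chosen only to make the idea concrete.

```python
import re

# Hypothetical list of question openers; a real detector would be learned from data.
QUESTION_WORDS = {"how", "what", "why", "when", "where", "who", "which", "can", "does", "is"}


def is_question(sentence):
    """Heuristic detection: ends with '?' or starts with a question word."""
    s = sentence.strip().lower()
    if not s:
        return False
    first_word = re.split(r"\W+", s, maxsplit=1)[0]
    return s.endswith("?") or first_word in QUESTION_WORDS


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_questions(questions, threshold=0.5):
    """Greedy single-pass clustering: a question joins the first cluster
    whose seed (first member) is similar enough, else starts a new cluster."""
    clusters = []
    for q in questions:
        q_tokens = tokens(q)
        for cluster in clusters:
            if jaccard(q_tokens, tokens(cluster[0])) >= threshold:
                cluster.append(q)
                break
        else:
            clusters.append([q])
    return clusters


posts = [
    "How do I reset my password?",
    "Thanks, that worked.",
    "how to reset my password",
    "Why is my laptop fan so loud?",
]
questions = [p for p in posts if is_question(p)]
clusters = cluster_questions(questions)
```

In this sketch the two password questions fall into one cluster and the laptop question into another. Answer extraction and summarization would then operate per cluster; they are omitted here.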
Sentiment Analysis
User-generated content (UGC) such as blogs, forums, reviews, and Twitter has become a great resource for observing user sentiment toward events, products, people, policies, and so on. To leverage this valuable content and provide insightful feedback to users, manufacturers, and other interested parties, we are developing automatic methods based on machine learning techniques, as well as semantic and discourse analysis, to mine user sentiment from various kinds of UGC. Work from our collaboration with the Machine Learning group at MSRA has been adopted in Microsoft’s Commerce Search.
Expert and Social Search
Information retrieval and web search mostly focus on finding relevant documents or web sites to satisfy user needs. Less attention has been paid to assisting users in identifying relevant experts or trusted people who can offer solutions in a timely and human manner. Our goal is to automatically create a global expert and friend recommendation social network, not only to facilitate web-scale expert and social search but also to leverage the results to rate online content.
User Intent/Activity Recognition and Prediction
The holy grail of information access is to understand what users want and to present them with just that. Researchers have started to make inroads into this area by building automatic classifiers to recognize user intents. However, most classification schemes are still coarse-grained and operate at the single-intent level, for example, “job intent” or “product intent”. To advance the state of the art in user intent recognition, we use web-scale query logs, question logs, and click-through logs to automatically induce an inventory of user intents and aggregate them into user activities. The ultimate aspiration is to move beyond single-intent recognition to the prediction of sequences of multiple user intents.
Inarticulate User Assistance
Users of systems and services can be inarticulate for many reasons. They may be unaware of their information need, or unable to express it in the form of queries; they may have a poor input device; they may be physically challenged or their hands may be occupied. We would like to help such users solve their problems through minimal human-computer interaction, based on physical and digital user contexts.
Information Access Evaluation
Information access (IA) includes techniques such as information retrieval, question answering, text categorization, summarization, and opinion analysis. To optimize IA systems economically and efficiently for end users, our project aims to design and construct reliable test data and evaluation metrics. Our goal is to ensure user satisfaction while minimizing the need for human-in-the-loop evaluations.
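As a concrete illustration of what an evaluation metric over test data looks like, the sketch below computes normalized discounted cumulative gain (nDCG), a widely used ranking metric. The text above does not name any specific metric, so the choice of nDCG, the function names `dcg` and `ndcg`, and the graded relevance labels are assumptions made only for this example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each graded relevance label is
    discounted by log2(rank + 1), with ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the system ranking divided by the DCG of the ideal
    (descending-relevance) ordering; defined as 0.0 when nothing is relevant."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels (0 = irrelevant, 3 = highly relevant)
# for the top results returned by two systems on the same query.
system_a = [3, 2, 1, 0]   # already in ideal order
system_b = [0, 1, 2, 3]   # best result ranked last

score_a = ndcg(system_a)
score_b = ndcg(system_b)
```

Once relevance labels exist for a test collection, a metric of this kind lets systems be compared repeatedly without further human judgments, which is one way to reduce human-in-the-loop evaluation.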



