Machine Learning

Based in Beijing, China, the Machine Learning Group at Microsoft Research Asia focuses on machine learning research, knowledge discovery from large-scale data, and innovative systems powered by machine intelligence. With broad research efforts in areas such as statistical learning, pattern recognition, text mining, optimization, information retrieval, and recommendation, we are currently exploring practical technologies to enable large-scale knowledge acquisition, to model user intention, and to optimize the ecosystem of users, rich clients, and various online services. The Machine Learning Group is managed by senior researcher Zheng Chen.

Lei Ji

Ning Liu

Bingzheng Wei

Current Projects and Research Areas

Kable - Knowledge Table

Kable aims to extract structured knowledge from semi-structured and unstructured Web sites. It represents the extracted knowledge as a table in which each row stands for a domain entity and each column stands for an attribute, so that the knowledge can be easily used by Web applications such as search task simplification and attribute-based filtering of search results. Currently, Kable focuses on several domains, including “Movie”, “Company”, “Hotel”, “Book”, and “Mobile Application”. Kable has three major ongoing sub-projects:

      • Kable – APEX. APEX stands for Auto Production of EXtractors. Kable APEX aims to automatically discover domain-specific sites and extract structured knowledge from semi-structured and free-text Web content. APEX also models the extracted structured knowledge to support Web applications such as knowledge-based Q&A and entity page index.
      • Kable.Com. “Com” stands for the Company domain. We study knowledge extraction and modeling in depth in this specific domain. We not only extract and model general entity knowledge, but also propose learning solutions for extracting and modeling knowledge unique to entities in this domain, to enable information navigation.
      • Kable Revenue. REVENUE stands for REleVancE aNd User Experience. We aim to leverage Web knowledge to improve search and ads relevance. At the same time, we develop novel knowledge-based online user experiences.
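
The table representation described above (one row per entity, one column per attribute) can be illustrated with a minimal sketch. It is not Kable's implementation: the table contents, the attribute names, and the helper `filter_by_attributes` are hypothetical placeholders chosen only to show how attribute-based filtering might work on such a table.

```python
from typing import Any, Dict, List

# Hypothetical sample table for the "Hotel" domain. Entity names and
# attribute values are placeholders, not data extracted by Kable.
# Each dict is one row (a domain entity); each key is one column (an attribute).
hotel_table: List[Dict[str, Any]] = [
    {"name": "Example Hotel A", "city": "Beijing", "stars": 5},
    {"name": "Example Hotel B", "city": "Beijing", "stars": 3},
    {"name": "Example Hotel C", "city": "Shanghai", "stars": 4},
]


def filter_by_attributes(table: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the rows whose attributes equal every given criterion."""
    return [
        row
        for row in table
        if all(row.get(attr) == value for attr, value in criteria.items())
    ]


# Attribute-based filtering: keep only the entities located in Beijing.
beijing_hotels = filter_by_attributes(hotel_table, city="Beijing")
print([row["name"] for row in beijing_hotels])
```

Because every entity shares the same columns, a filter is just an equality check per attribute, which is what makes the table form convenient for result filtering.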

Collaborative Modeling for Recommendation

Existing recommendation systems mainly work within a single domain, for example recommending movies or music to users. In reality, users use different services and interact with various types of objects. In this project, we focus on collaborative modeling research that brings semantics into the recommendation engine. The research problems of interest include:

      • Unified learning framework to incorporate explicit concepts and implicit topics
      • Modeling structured knowledge and unstructured/heterogeneous data sources
      • Understanding interrelated entities and services across domains
      • Learning with constraints to optimize both the fit to observations and model generalization

Cross-domain recommendation and paid search are the applications we use to test our research.
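
The trade-off between fitting observations and generalizing, named in the last research problem above, can be sketched with a generic technique. The example assumes L2-regularized matrix factorization trained by stochastic gradient descent, which the project description does not name; the function names, hyperparameters, and toy ratings are all hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit user and item factors by SGD on squared error plus an L2 penalty.

    `ratings` is a list of (user_index, item_index, rating) triples.
    The `reg` term shrinks the factors, trading fit on the observed
    ratings for better generalization to unobserved ones.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step: error term fits the observation,
                # reg term pulls the factor toward zero.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def squared_error(ratings, P, Q):
    """Sum of squared prediction errors over the observed ratings."""
    return sum(
        (r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings
    )


# Toy, made-up observations: (user, item, rating).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = train_mf(toy_ratings, n_users=3, n_items=3)
print(squared_error(toy_ratings, P, Q))
```

Raising `reg` makes the model fit the observed ratings less tightly; setting it to zero removes the constraint entirely.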

Context Aware Intent Engine


The goal of this project is to simplify user task completion by delivering a user-centric experience. We aim to build an intelligent system (an intent engine) with the following capabilities: 1) understand users’ intent; 2) connect users with relevant services or applications to complete tasks; 3) guide the interactions between user and system with machine intelligence. Our ongoing efforts address the following research problems:


      • Collection, processing and mining of user context data, e.g., user state, activity, physical surroundings, cyber context and even social graph.
      • Machine learning technologies to understand user inputs from various types of client devices, e.g., text, voice, image, and gesture.
      • User modeling research, to understand user preference, interest, demographic information and history data.
      • Dialog model research to facilitate interactions between user and system.

We conduct our research on different devices (smartphone, slate, PC) and with various applications, including but not limited to information retrieval, recommendation, and personal assistant systems.


Archived Projects and Research Areas

Large Scale Machine Learning Platform

The goal of this project is to provide a set of machine learning algorithms that meet the requirements of research work and applications, typically those with very large-scale data or features, or those applicable across multiple markets or domains. The platform provides, among others: classification, clustering, time series analysis, SVD, kernel distance functions, and statistical analysis.

Behavioral Targeting

Behavioral Targeting (BT) attempts to deliver the most relevant advertisements to the most interested audiences, and it plays an increasingly important role in the online advertising market. Behavioral targeting research faces a set of challenges: user representation and modeling, user segmentation, and targeted ad delivery. We have multiple sub-projects for behavioral targeting research. We started with the "Self Service Behavioral Targeting" project. The most recently released product to come from our BT research is "Intent based Behavioral Targeting". Our ongoing project, Ad Selection, is a collaboration with the display ads team.

Categorized Search

Categorized Search is one solution for organizing search results by bringing categorization concepts into search products. Our focus is to scale up the whole solution, including identifying popular galleries, mapping queries to galleries, creating intent profiles for galleries, and associating search result pages with intent profiles. We have implemented a tool for organizing queries and user search intents, which is a must-have for implementing the above search experience. We use various kinds of data sources, including search logs contributed by search engine users, Web pages provided by website editors, knowledge bases such as Wikipedia, and Web directories organized by volunteers. Both processes are very effective and require little human interaction, while the step of mapping result pages to intent profiles is fully automatic. We will also share our thoughts on how to use our large-scale machine learning toolkit to help scale up the solution, as well as our ideas on how to evaluate a Categorized Search system.
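
The query-to-gallery mapping step can be illustrated with a deliberately simple sketch. It assumes each gallery's intent profile is a set of keywords and scores a query by Jaccard similarity of token sets; the source does not describe the actual matching method, and the gallery names, keywords, threshold, and function names below are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical intent profiles: gallery name -> keywords describing the intent.
INTENT_PROFILES: Dict[str, Set[str]] = {
    "movies": {"movie", "film", "trailer", "cast", "showtimes"},
    "hotels": {"hotel", "booking", "room", "rates", "stay"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_query_to_gallery(query: str, threshold: float = 0.1) -> Optional[str]:
    """Return the best-matching gallery, or None if no profile reaches the threshold."""
    tokens = set(query.lower().split())
    best_gallery, best_score = None, 0.0
    for gallery, keywords in INTENT_PROFILES.items():
        score = jaccard(tokens, keywords)
        if score > best_score:
            best_gallery, best_score = gallery, score
    return best_gallery if best_score >= threshold else None


print(map_query_to_gallery("cheap hotel room rates"))
print(map_query_to_gallery("weather tomorrow"))
```

A production system would presumably learn profiles from search logs and other data sources rather than hand-written keyword sets; the sketch only shows the shape of the mapping.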

Opinion Search

Grassroots users play important roles in today’s Web. They communicate intensively through various channels such as online communities, blogs, and instant messengers. These users also contribute content to the Web, for example opinion data, which contains the knowledge of grassroots users, is large in scale, and is updated very frequently. To organize and utilize these data well, we collect, store, and organize user opinion data. By analyzing and mining opinion data, we try to understand the opinions expressed by grassroots users as well as their requirements, which helps other Web users make purchase decisions and guides manufacturers in improving their products and services. Unlike previous research that focuses on the analysis of social networks, this project focuses on analyzing textual opinion data.