Research Projects

 

Semantic Matching (Search Relevance)

2006 – present, Microsoft Research Asia.

Major Collaborators: Gu Xu, Jun Xu, Jingfang Xu, Yunhua Hu, Daxin Jiang, Wei-Ying Ma, Jiafeng Guo, Wei Wu, Quan Wang, Ziqi Wang.

Goal: To enhance search relevance at Bing and SharePoint Search, particularly for tail queries.

Achievements: Methods for query understanding, document understanding, and query-document matching have been developed. Examples include query understanding methods [Guo et al., SIGIR 2007; Guo et al., SIGIR 2008; Wang et al., ACL 2011], document understanding methods [Wang et al., SIGIR 2011], and query-document matching methods [Wu et al., JMLR 2011; Wu et al., MSR-TR 2011].

Search Log Mining

2007 – 2011, Microsoft Research Asia.

Goal: To develop an advanced search log mining platform and tools for Bing.

Major Collaborators: Daxin Jiang, Yunhua Hu, Wei-Ying Ma, Huanhuan Cao, Biao Xiang, Jian Pei.

Achievements: The search log mining system LOGAL (log object gallery) is being used in Bing. Applications based on search log mining have been developed, such as context-aware search [Cao et al., KDD 2008; Cao et al., WWW 2009; Xiang et al., SIGIR 2010].

Enterprise Search

2003 – present, Microsoft Research Asia.

Major Collaborators: Yunbo Cao, Yunhua Hu, Jun Xu, Jin Jiang, Xin Zou, Guangping Gao, Xiaoyuan Cui, Congrui Ji, Xiaolin Quan, Like Liu, Ambrosio Blanco, Pairu Chen, Dmitriy Meyerzon, Victor Poznanski, Ping Lin, Shenghua Bao, Shenjie Li, Jingjing Liu, Min Zhao.

Goal: To develop advanced technologies for SharePoint Search.

Achievements: We have developed technologies for enterprise search using information extraction and text mining [Li et al., CIKM 2005]. Two prototype systems (named InfoDesk and Acing) have been deployed within Microsoft. Related technologies have been transferred to Office 2007, Office 2010, and the next release of Office. Methods for document metadata extraction have been developed [Hu et al., JCDL 2005; Hu et al., SIGIR 2005]. A method for definition search has been proposed [Xu et al., WWW 2005]. A method for expert search using a two-stage language model has been developed; it was one of the best performing methods at TREC [Cao et al., TREC 2005].

Learning to Rank

2006 – 2010, Microsoft Research Asia.

Major Collaborators: Tie-Yan Liu, Jun Xu, Tao Qin, Yunbo Cao, Yunhua Hu, Zhe Cao, Xiubo Geng, Yanyan Lan, Fen Xia, Ming Li, Xin Jiang, Wei Chen, Rong Jin, Zhi-Hua Zhou.

Goal: To investigate the fundamental issues of learning to rank.

Achievements: We have developed several popular learning to rank algorithms, including IR SVM [Cao et al., SIGIR 2006], AdaRank [Xu & Li, SIGIR 2007], ListNet [Cao et al., ICML 2007], and ListMLE [Xia et al., ICML 2008]. A benchmark data set for learning to rank, called LETOR, has also been released and is widely used in the research community [Liu et al., SIGIR Workshop 2007]. A monograph on learning to rank has been published [Li 2011].

Search Importance Ranking

2006 – 2010, Microsoft Research Asia.

Major Collaborators: Tie-Yan Liu, Bin Gao, Yuting Liu, Ziming Ma, Shuyuan He.

Goal: To investigate advanced algorithms for importance ranking in search.

Achievements: We have developed several algorithms for web page importance ranking, including BrowseRank [Liu et al., SIGIR 2008].

Text Mining Tools

2003 – 2005, Microsoft Research Asia.

Major Collaborators: Yunbo Cao, Ye Zhang, Jin Jiang, Zhaohui Tang, Jie Tang, Olivier Ribet, Raman Chandrasekar.

Achievements: We developed the text mining tools in SQL Server 2005. We also developed the text mining tool TextMiner, which is used within Microsoft. A method for email data cleaning has also been proposed [Tang et al., KDD 2005].

English Reading Assistance

2001 – 2003, Microsoft Research Asia.

Collaborators: Yunbo Cao, Cong Li, Ming Zhou.

Goal: To develop technologies that help non-native speakers read English.

Achievements: We developed a prototype system for English reading assistance [Li et al., IEEE IS 2003] and proposed a method for word sense disambiguation called Bilingual Bootstrapping [Li & Li, CL 2004].

Text Mining for Survey Data Analysis

1997 – 2001, NEC Research Laboratories.

Collaborator: Kenji Yamanishi.

Goal: To develop text mining technologies for questionnaire data analysis.

Achievements: A product called TopicScope was developed and released by NEC [Yamanishi & Li, KDD 2001]. A method for learning decision lists has been developed [Li & Yamanishi, IPM 2002].

Lexical Semantic Knowledge Acquisition

1992 – 1998, NEC Research Laboratories.

Collaborator: Naoki Abe.

Goal: To develop machine learning techniques for automatically acquiring lexical semantic knowledge.

Achievements: We have developed several methods for lexical semantic knowledge acquisition, including case frame learning and thesaurus learning using the MDL principle [Li & Abe, CL 1998; Li, NLE 2002]. The work was also summarized in my PhD thesis at the University of Tokyo, under the supervision of Prof. Jun'ichi Tsujii [Li, PhD thesis 1998].