
Jianfeng Gao

I am a researcher in the Natural Language Processing Group at Microsoft Research. From 2005 to 2006, I was a software developer in the Natural Interactive Services Division at Microsoft. From 1999 to 2005, I was a researcher in the Natural Language Computing Group at Microsoft Research Asia. I live with my family in Kirkland, WA.

Contact information

Microsoft Corporation

One Microsoft Way

Redmond, WA 98052, U.S.A.

E-mail: jfgao@microsoft.com

http://research.microsoft.com/~jfgao/

Research interests

Web search and mining

Information retrieval

Natural language processing

Statistical machine learning

Downloads

  • The Microsoft Research ESL Assistant is a web service that provides correction suggestions for typical ESL (English as a Second Language) errors. Such errors include, for example, the choice of determiners (the/a) and the choice of prepositions. The web service also provides word choice suggestions from a thesaurus. To help the user decide whether to accept a suggestion, the service displays "before and after" web search results so that the user can see real-life examples of the usage of both their original input and the suggested correction. An Outlook plugin that connects to the web service and copies text from an email into the web service UI is also available. For a detailed description of the system, see our paper.

  • MSRLM (download here), the Microsoft Research Language Modeling toolkit, is a scalable language modeling toolkit. It implements an efficient method for building large language models from billions of words or more. We use these language models for first-pass decoding in statistical machine translation.

  • The "Orthant-Wise Limited-memory Quasi-Newton" algorithm (OWL-QN) is a new method for optimizing an L1-regularized loss that is very efficient, even on problems with millions of parameters. Source code for OWL-QN, including a standalone trainer for L1-regularized least-squares or logistic regression, is available for download. Refer to (Andrew and Gao, 2007) for a description of the algorithm, and to (Gao et al., 2007) for its application in several NLP tasks and a comparison with other state-of-the-art parameter estimators.

  • The Microsoft Research IME Corpus provides a test data set for the task of Japanese character conversion for text input. For more about the corpus, see our technical report.

  • S-MSRSeg is a simplified version of the Chinese word segmenter and named entity recognizer described in (Gao et al., 2005).

Recent publications

2009

  • Jianfeng Gao, Wei Yuan, Xiao Li, Kefeng Deng and Jian-Yun Nie. 2009. Smoothing clickthrough data for web search ranking. In SIGIR. [PDF]

  • Jianfeng Gao, Qiang Wu, Chris Burges, Krysta Svore, Yi Su, Nazan Khan, Shalin Shah and Hongyan Zhou. 2009. Model adaptation via model interpolation and boosting for web search ranking. In EMNLP. [PDF]

  • Hisami Suzuki, Xiao Li and Jianfeng Gao. 2009. Discovery of term variation in Japanese web search queries. In EMNLP. [PDF]

  • Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen and Robert Moore. 2009. Improving monolingual hypothesis alignment for machine translation system combination. To appear in ACM Trans on Asian Language Information Processing. [draft version]

  • Qiang Wu, Christopher J. C. Burges, Krysta M. Svore and Jianfeng Gao. 2009. Adapting boosting for information retrieval. To appear in Information Retrieval. [PDF] (The original publication is available at www.springerlink.com)

 

2008

  • Jianfeng Gao and Mark Johnson. 2008. A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. In EMNLP. [PDF]

  • Xiaodong He, Mei Yang, Jianfeng Gao, Patrick Nguyen and Robert Moore. 2008. Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In EMNLP. [PDF]

  • Jia Xu, Jianfeng Gao, Kristina Toutanova and Hermann Ney. 2008. Bayesian semi-supervised Chinese word segmentation for statistical machine translation. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING), Manchester, UK. [PDF]

  • Guihong Cao, Jian-Yun Nie, Jianfeng Gao and Stephen Robertson. 2008. Selecting good expansion terms for pseudo-relevance feedback. In SIGIR. [PDF]

  • Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko and Lucy Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. In IJCNLP. [PDF] (We have developed a web service for the contextual speller; click here to try it.)

  • Xing Yi, Jianfeng Gao and William B. Dolan. 2008. A web-based English proofing system for English as a second language users. In IJCNLP. [PDF]

 

2007

  • Patrick Nguyen, Jianfeng Gao, and Milind Mahajan. 2007. MSRLM: a scalable language modeling toolkit. Microsoft Research Technical Report, MSR-TR-2007-144. [PDF] (The toolkit is used in the MSR statistical machine translation system for NIST evaluation, and is available for download.)

  • Jianfeng Gao, Galen Andrew, Mark Johnson and Kristina Toutanova. 2007. A comparative study of parameter estimation methods for statistical natural language processing. In ACL. [PDF]

  • Galen Andrew and Jianfeng Gao. 2007. Scalable training of L1-regularized log-linear models. In ICML. [PDF] (The source code is available for download.)

  • Ken Church, Ted Hart and Jianfeng Gao. 2007. Compressing trigram language models with Golomb coding. In EMNLP-CoNLL. [PDF]

  • Guihong Cao, Jianfeng Gao and Jian-Yun Nie. 2007. A system to mine large-scale bilingual dictionaries from monolingual web pages. In MT Summit XI. [PDF]

  • Guihong Cao, Jianfeng Gao, Jian-Yun Nie and Jing Bai. 2007. From query translation to cross-language query expansion with Markov chain models. In CIKM. [PDF]

  • Jianfeng Gao and Hisami Suzuki. 2007. Foundations of statistical natural language processing: a case study of text input system. Tutorial at MSR Weihai Summer School. [slides] (A sample set of the IME corpus used in the examples in the tutorial is available for download. For a detailed description of the IME corpus, see our technical report.)

 

2006

  • Jianfeng Gao and Jian-Yun Nie. 2006. Study of Statistical Models for Query Translation: Finding a Good Unit of Translation. In SIGIR. [PDF]

  • Jianfeng Gao, Jian-Yun Nie, Ming Zhou. 2006. Statistical Query Translation Models for Cross Language Information Retrieval. ACM Trans on Asian Language Information Processing, 5(4): 323-359. [draft version]

  • Jianfeng Gao, Hisami Suzuki, Wei Yuan. 2006. An Empirical Study on Language Model Adaptation. ACM Trans on Asian Language Information Processing, 5(3): 207-227. [draft version]

  • Jianfeng Gao, Hisami Suzuki, Bin Yu. 2006. Approximation Lasso Methods for Language Modeling. In COLING-ACL. [PDF]

  • Lei Shi, Cheng Nie, Ming Zhou, Jianfeng Gao. 2006. A DOM Tree Alignment Model for Mining Parallel Data from the Web. In COLING-ACL. [PDF]

  • Zhengyu Zhou, Jianfeng Gao, Frank K Soong, Helen Meng. 2006. A Comparative Study of Discriminative Methods for Reranking LVCSR N-best Hypotheses in Domain Adaptation and Generalization. In ICASSP. [PS]

  • Chin-Yew Lin, Guihong Cao, Jianfeng Gao, Jian-Yun Nie. 2006. An Information-Theoretic Approach to Automatic Evaluation of Summaries. In HLT-NAACL. [PDF]

  • Yi Zhang, Ke Wu, Jianfeng Gao, Philip Vines. 2006. Automatic Acquisition of Chinese-English Parallel Corpus from the Web. In ECIR. [PDF]

 

2005

  • Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang. 2005. Chinese word segmentation and named entity recognition: a pragmatic approach. Computational Linguistics, 31(4). [draft version] (A simplified version of the system described in the paper, called S-MSRSeg, is available for download.)
  • Jianfeng Gao, Hao Yu, Wei Yuan and Peng Xu. 2005. Minimum sample risk methods for Language modeling. In HLT/EMNLP. [PDF]
  • Hisami Suzuki and Jianfeng Gao. 2005. A comparative study on language model adaptation using new evaluation metrics. In HLT/EMNLP. [PDF]
  • Jianfeng Gao, Haoliang Qin, Xinsong Xia and Jian-Yun Nie. 2005. Linear discriminative model for information retrieval. In SIGIR. [PDF]
  • Xiaojun Wan, Jianfeng Gao, Mu Li and Binggong Ding. 2005. Person resolution in person search results: WebHawk. In CIKM. [PDF]
  • Wei Yuan, Jianfeng Gao and Hisami Suzuki. 2005. An empirical study on language model adaptation using a metric of domain similarity. In IJCNLP. [PDF]

 

2004

  • Jianfeng Gao and Chin-Yew Lin. 2004. Introduction to the special issue on statistical language modeling. ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, June 2004, pp 87-93. [PDF]
  • Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu and Guihong Cao. 2004. Dependence language model for information retrieval. In SIGIR. [PDF]
  • Jianfeng Gao, Andi Wu, Mu Li, Chang-Ning Huang, Hongqiao Li, Xinsong Xia and Haowei Qin. 2004. Adaptive Chinese word segmentation. In ACL. [PDF]
  • Jianfeng Gao and Hisami Suzuki. 2004. Capturing long distance dependency for language modeling: an empirical study. In IJCNLP. [PDF]
  • Hongqiao Li, Chang-Ning Huang, Jianfeng Gao and Xiaozhong Fan. 2004. The use of SVM for Chinese new word identification. In IJCNLP. [PDF]
  • Qiang Yang, Charles X. Ling and Jianfeng Gao. 2004. Mining web logs for actionable knowledge. Book chapter in Ning Zhong and Jiming Liu, editors, Intelligent Technologies for Information Analysis. Springer, 2004. [PDF]

 

2003

  • Jianfeng Gao, Mu Li and Chang-Ning Huang. 2003. Improved source-channel models for Chinese word segmentation. In ACL. [PDF]
  • Jianfeng Gao and Hisami Suzuki. 2003. Unsupervised learning of dependency structure for language modeling. In ACL. [PDF]

 

2002

  • Charles X Ling, Jianfeng Gao, Huajie Zhang, Weining Qian and Hongjiang Zhang. 2002. Improving Encarta Search Engine Performance by Mining User Logs. International Journal of Pattern Recognition and Artificial Intelligence. Vol. 16, No. 8. 2002. [draft version]
  • Jianfeng Gao, Jian-Yun Nie, Hongzhao He, Weijun Chen and Ming Zhou. 2002. Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependency relations. In SIGIR. [PDF]
  • Hongzhao He and Jianfeng Gao. 2002. NTCIR-3 CLIR experiments at MSRA. In NTCIR-3. [PDF]
  • Jian Sun, Jianfeng Gao, Lei Zhang, Ming Zhou and Chang-Ning Huang. 2002. Chinese named entity identification using class-based language model. In COLING. [PDF]
  • Jianfeng Gao, Hisami Suzuki and Yang Wen. 2002. Exploiting headword dependency and predictive clustering for language modeling. In EMNLP. [PDF]
  • Jianfeng Gao, Joshua Goodman, Guihong Cao and Hang Li. 2002. Exploring asymmetric clustering for statistical language modeling. In ACL. [PDF]
  • Jianfeng Gao and Min Zhang. 2002. Improving language model size reduction using better pruning criteria. In ACL. [PDF]
  • Jianfeng Gao, Joshua Goodman, Mingjing Li and Kai-Fu Lee. 2002. Toward a unified approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. [draft version]

 

2001

  • Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang, Jian-Yun Nie, Stephen Walker and Stephen Robertson. 2001. TREC-10 web track experiments at MSRCN. In TREC-10. [PDF]
  • Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun, Ming Zhou and Chang-Ning Huang. 2001. Improving query translation for CLIR using statistical Models. In SIGIR. [PDF]
  • Jianfeng Gao, Joshua Goodman and Jiangbo Miao. 2001. The use of clustering techniques for language modeling – application to Asian languages. Computational Linguistics and Chinese Language Processing, Vol. 6, No. 1, pp 27-60. [draft version]
  • Jian Zhang, Jianfeng Gao, Ming Zhou and Jiaxing Wang. 2001. Improving the effectiveness of information retrieval with clustering and fusion. Computational Linguistics and Chinese Language Processing, Vol. 6, No. 1, pp 109-125. [draft version]

 

2000

  • Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun, Yi Su, Ming Zhou and Chang-Ning Huang. 2000. TREC-9 CLIR experiments at MSRCN. In TREC-9. [PDF]
  • Jianfeng Gao and Kai-Fu Lee. 2000. Distribution-based pruning of backoff language models. In ACL. [PDF]
  • Ting Liu, Ming Zhou, Jianfeng Gao and Chang-Ning Huang. 2000. PENS: A machine-aided English writing system for Chinese users. In ACL. [PDF]
  • Jian-Yun Nie, Jianfeng Gao, Jian Zhang and Ming Zhou. 2000. On the use of words and n-grams for Chinese information retrieval. In IRAL. [PDF]
  • Jianfeng Gao, Hai-Feng Wang, Mingjing Li and Kai-Fu Lee. 2000. A unified approach to statistical language modeling for Chinese. In ICASSP. [PDF]
  • Jianfeng Gao, Mingjing Li and Kai-Fu Lee. 2000. N-gram distribution based language model adaptation. In ICSLP. [PDF]
  • Joshua Goodman and Jianfeng Gao. 2000. Language model compression by predictive clustering. In ICSLP. [PDF]

 

1999

  • Jianfeng Gao, Case and constraint: research on intelligent CAD systems. (in Chinese) PhD thesis, Shanghai Jiaotong University, 1999. (zip)

More information on ...


Last updated $Date: 2009/6/20$ by jfgao@microsoft.com