Jianfeng Gao

I am a researcher in the Natural Language Processing Group at Microsoft Research. From June 2005 to February 2006, I was a researcher and software development engineer in the Natural Interactive Services Division (NISD) at Microsoft. From April 1999 to June 2005, I was a researcher in the Natural Language Computing Group at Microsoft Research Asia.

Contact information

Microsoft Corporation

One Microsoft Way

Redmond, WA 98052, U.S.A.

E-mail: jfgao@microsoft.com

Tel: 1-425-705-1479

Fax: 1-425-936-7329

Research interests

Natural language processing

Information retrieval

Statistical machine learning

Natural user interface

Software

  • MSRLM: A scalable language modeling toolkit we used in our NIST SMT evaluation. Refer to (Nguyen, Gao and Mahajan 2007) for a description of usage.
  • OWL-QN: C++ source code of the Orthant-Wise Limited-memory Quasi-Newton algorithm, which has been used to optimize L1-regularized log-linear models. Refer to (Andrew and Gao, 2007) for a detailed description of the algorithm, and (Gao et al. 2007) for its use in NLP applications.
  • MSR IME Feature Corpus: This corpus provides datasets we used in the language model adaptation experiment described in (Gao et al. 2007).
  • MSR IME Corpus: This corpus provides a test data set for the task of Japanese character conversion for text input. See (Suzuki and Gao 2006) for a detailed description.
  • S-MSRSeg: A Chinese word segmenter and named entity recognizer, described in (Gao et al. 2005).

Recent Publications

2008

  • Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev, William B. Dolan, Dmitriy Belenko and Lucy Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. In IJCNLP. (pdf)
  • Xing Yi, Jianfeng Gao and William B. Dolan. 2008. A web-based English proofing system for English as a second language users. In IJCNLP. (pdf)

 

2007

  • Patrick Nguyen, Jianfeng Gao and Milind Mahajan. 2007. MSRLM: a scalable language modeling toolkit. MSR-TR-2007-144. (pdf) – (The toolkit can be downloaded from here).
  • Jianfeng Gao, Galen Andrew, Mark Johnson and Kristina Toutanova. 2007. A comparative study of parameter estimation methods for statistical natural language processing. In ACL. (pdf) – (The datasets used for the language model adaptation experiment can be downloaded from here. The datasets for parse reranking can be downloaded from Mark Johnson’s website at Brown University.)
  • Galen Andrew and Jianfeng Gao. 2007. Scalable training of L1-regularized log-linear models. In ICML. (pdf) – (The source code of the algorithm can be downloaded here.)
  • Ken Church, Ted Hart and Jianfeng Gao. 2007. Compressing trigram language models with Golomb coding. In EMNLP-CoNLL. (pdf)
  • Guihong Cao, Jianfeng Gao and Jian-Yun Nie. 2007. A system to mine large-scale bilingual dictionaries from monolingual web pages. In MT Summit XI. (pdf)
  • Guihong Cao, Jianfeng Gao, Jian-Yun Nie and Jing Bai. 2007. From query translation to cross-language query expansion with Markov chain models. In CIKM. (pdf)
  • Jianfeng Gao and Hisami Suzuki. Foundations of statistical natural language processing: a case study of text input system. Tutorial in Weihai MSRA-HIT NLP Summer School. (ppt)

 

2006

  • Jianfeng Gao and Jian-Yun Nie. 2006. Study of Statistical Models for Query Translation: Finding a Good Unit of Translation. In SIGIR. (pdf)
  • Jianfeng Gao, Jian-Yun Nie and Ming Zhou. 2006. Statistical Query Translation Models for Cross Language Information Retrieval. ACM Trans on Asian Language Information Processing, 5(4): 323-359. (draft version)
  • Jianfeng Gao, Hisami Suzuki and Wei Yuan. 2006. An Empirical Study on Language Model Adaptation. ACM Trans on Asian Language Information Processing, 5(3): 207-227. (draft version)
  • Jianfeng Gao, Hisami Suzuki and Bin Yu. 2006. Approximation Lasso Methods for Language Modeling. In COLING-ACL. (pdf)
  • Lei Shi, Cheng Niu, Ming Zhou and Jianfeng Gao. 2006. A DOM Tree Alignment Model for Mining Parallel Data from the Web. In COLING-ACL 2006. (pdf)
  • Zhengyu Zhou, Jianfeng Gao, Frank K Soong and Helen Meng. 2006. A Comparative Study of Discriminative Methods for Reranking LVCSR N-best Hypotheses in Domain Adaptation and Generalization. In ICASSP 2006. (ps)
  • Chin-Yew Lin, Guihong Cao, Jianfeng Gao and Jian-Yun Nie. 2006. An Information-Theoretic Approach to Automatic Evaluation of Summaries. In HLT-NAACL 2006. (pdf)
  • Yi Zhang, Ke Wu, Jianfeng Gao and Philip Vines. 2006. Automatic Acquisition of Chinese-English Parallel Corpus from the Web. In ECIR 2006. (pdf)

 

2005

  • Jianfeng Gao, Mu Li, Andi Wu and Chang-Ning Huang.  Chinese word segmentation and named entity recognition: a pragmatic approach. Computational Linguistics, 31(4). (draft version)
  • Jianfeng Gao, Hao Yu, Wei Yuan and Peng Xu. Minimum sample risk methods for language modeling. In HLT/EMNLP 2005. (pdf)
  • Hisami Suzuki and Jianfeng Gao. A comparative study on language model adaptation using new evaluation metrics. In HLT/EMNLP 2005. (pdf)
  • Jianfeng Gao, Haoliang Qi, Xinsong Xia and Jian-Yun Nie. Linear discriminative model for information retrieval. In SIGIR 2005. (pdf)
  • Xiaojun Wan, Jianfeng Gao, Mu Li and Binggong Ding. Person resolution in person search results: WebHawk. In CIKM 2005. (pdf)
  • Wei Yuan, Jianfeng Gao and Hisami Suzuki. An empirical study on language model adaptation using a metric of domain similarity. In IJCNLP 2005. (pdf)

 

2004

  • Jianfeng Gao, Mu Li, Andi Wu and Chang-Ning Huang. Chinese word segmentation and named entity recognition: a pragmatic approach. Microsoft Research Technical Report, MSR-TR-2004-123. (pdf)
  • Jianfeng Gao and Chin-Yew Lin. Introduction to the special issue on statistical language modeling. ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, June 2004, pp 87-93. (pdf)
  • Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu and Guihong Cao. 2004b. Dependence language model for information retrieval. In SIGIR-2004. Sheffield, UK, July 25-29. (pdf)
  • Jianfeng Gao, Andi Wu, Mu Li, Chang-Ning Huang, Hongqiao Li, Xinsong Xia and Haowei Qin. 2004c. Adaptive Chinese word segmentation. In ACL-2004. Barcelona, July 21-26. (pdf)
  • Jianfeng Gao and Hisami Suzuki. 2004. Capturing long distance dependency for language modeling: an empirical study. In IJCNLP-04. Sanya City, Hainan Island, China, March 22-24. (pdf)
  • Hongqiao Li, Chang-Ning Huang, Jianfeng Gao and Xiaozhong Fan. 2004. The use of SVM for Chinese new word identification. In IJCNLP-04. Sanya City, Hainan Island, China, March 22-24. (pdf)
  • Qiang Yang, Charles X. Ling and Jianfeng Gao. 2004. Mining web logs for actionable knowledge. Book chapter in Ning Zhong and Jiming Liu, editors, Intelligent Technologies for Information Analysis. Springer, 2004. (draft version)

 

2003

  • Jianfeng Gao, Mu Li and Chang-Ning Huang. 2003. Improved source-channel models for Chinese word segmentation. In ACL-2003. Sapporo, Japan, July 7-12, 2003. (pdf)
  • Jianfeng Gao and Hisami Suzuki. 2003. Unsupervised learning of dependency structure for language modeling. In ACL-2003. Sapporo, Japan, July 7-12, 2003. (pdf)

 

2002

  • Charles X Ling, Jianfeng Gao, Huajie Zhang, Weining Qian, and Hongjiang Zhang. 2002. Improving Encarta Search Engine Performance by Mining User Logs. International Journal of Pattern Recognition and Artificial Intelligence. Vol. 16, No. 8. 2002. (pdf)
  • Jianfeng Gao, Jian-Yun Nie, Hongzhao He, Weijun Chen, Ming Zhou. Resolving query translation ambiguity using a decaying co-occurrence model and syntactic dependency relations. In: Conference on Research and Development in Information Retrieval, ACM SIGIR'02, Tampere, Finland, 11-15 August 2002. (pdf)
  • Hongzhao He and Jianfeng Gao. NTCIR-3 CLIR experiments at MSRA. In: NTCIR-3. October 8-10. Tokyo, Japan. (pdf)
  • Jian Sun, Jianfeng Gao, Lei Zhang, Ming Zhou, and Changning Huang. Chinese named entity identification using class-based language model. In: COLING 2002. Taipei, Taiwan, August 24-25, 2002. (pdf)
  • Jianfeng Gao, Hisami Suzuki, Yang Wen. Exploiting headword dependency and predictive clustering for language modeling. EMNLP2002, University of Pennsylvania, Philadelphia, PA, USA. July 6-7, 2002. (pdf)
  • Jianfeng Gao, Joshua Goodman, Guihong Cao, Hang Li. Exploring asymmetric clustering for statistical language modeling. ACL2002, University of Pennsylvania, Philadelphia, PA, USA. July 6-12, 2002. (pdf)
  • Jianfeng Gao, Min Zhang. Improving language model size reduction using better pruning criteria. ACL2002, University of Pennsylvania, Philadelphia, PA, USA. July 6-12, 2002. (pdf)
  • Jianfeng Gao, Joshua Goodman, Mingjing Li, Kai-Fu Lee. Toward a unified approach to statistical language modeling for Chinese. ACM Transactions on Asian Language Information Processing, Vol. 1, No. 1, pp 3-33. 2002. (pdf)

 

2001

  • Jianfeng Gao, Guihong Cao, Hongzhao He, Min Zhang, Jian-Yun Nie, Stephen Walker, and Stephen Robertson. TREC-10 web track experiments at MSRCN. The Tenth Text Retrieval Conference (TREC-10), 2001. (pdf)
  • Jian-Yun Nie, Jianfeng Gao, Jian Zhang and Ming Zhou. On the use of words and n-grams for Chinese information retrieval. In IRAL 2000. (pdf)
  • Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun, Ming Zhou, and Changning Huang. Improving query translation for CLIR using statistical models. In: Conference on Research and Development in Information Retrieval, ACM SIGIR’01, New Orleans, Louisiana, USA. September 9-12, 2001. (pdf)
  • Jianfeng Gao, Joshua Goodman, Jiangbo Miao. The use of clustering techniques for language modeling – application to Asian languages. Computational Linguistics and Chinese Language Processing, Vol. 6, No. 1, pp 27-60. 2001. (pdf)
  • Jian Zhang, Jianfeng Gao, Ming Zhou, Jiaxing Wang. Improving the effectiveness of information retrieval with clustering and fusion. To appear in Computational Linguistics and Chinese Language Processing, Vol. 6, No. 1, pp 109-125. 2001. (pdf)
  • Charles X Ling, Jianfeng Gao, Huajie Zhang, Weining Qian, and Hongjiang Zhang. Mining generalized query patterns from user logs. HICSS-34. The 34th Hawaii International Conference on System Sciences, Hawaii, January 3-6, 2001. (pdf)

 

2000

  • Jianfeng Gao, Jian-Yun Nie, Jian Zhang, Endong Xun, Yi Su, Ming Zhou, and Changning Huang. TREC-9 CLIR experiments at MSRCN. The Ninth Text Retrieval Conference (TREC-9), 2000. (pdf)
  • Jianfeng Gao and Kai-Fu Lee. Distribution-based pruning of backoff language models. ACL-2000. The 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, October 3-6, 2000. (pdf)
  • Ting Liu, Ming Zhou, Jianfeng Gao, and Changning Huang. PENS: A machine-aided English writing system for Chinese users. ACL-2000. The 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, October 3-6, 2000. (pdf)
  • Jianfeng Gao, Hai-Feng Wang, Mingjing Li, and Kai-Fu Lee. A unified approach to statistical language modeling for Chinese. ICASSP-2000, Istanbul, Turkey, June 5 - 9, 2000. (pdf)
  • Jianfeng Gao, Mingjing Li, and Kai-Fu Lee. N-gram distribution based language model adaptation. ICSLP-2000, International Conference on Spoken Language Processing, Beijing, October 16-20, 2000. (pdf)
  • Joshua Goodman and Jianfeng Gao. Language model compression by predictive clustering. ICSLP-2000, International Conference on Spoken Language Processing, Beijing, October 16-20, 2000. (pdf)

 

1999

  • Jianfeng Gao. Case and constraint: research on intelligent CAD systems. PhD thesis, Shanghai Jiaotong University, 1999. (in Chinese) (chapter 6)

More information on ...

 Natural Language Processing Group Home Page

 Microsoft Research Home Page


Last updated 2007/09/05 by jfgao@microsoft.com