
Jianfeng Gao
I am a researcher in the Natural Language Processing Group at Microsoft
Research. From June 2005 to February 2006, I was a researcher and software
development engineer in the Natural Interactive Services Division (NISD) at
Microsoft. From April 1999 to June 2005, I was a researcher in the Natural
Language Computing Group at Microsoft Research Asia.

Contact information
Microsoft Corporation
One Microsoft Way
Redmond, WA
98052, U.S.A.
E-mail: jfgao@microsoft.com
Tel: 1-425-705-1479
Fax: 1-425-936-7329

Research interests
- Natural language processing
- Information retrieval
- Statistical machine learning
- Natural user interface

Software
- MSRLM: A scalable language modeling toolkit we used in our NIST SMT
evaluation. Refer to (Nguyen, Gao and Mahajan 2007) for a description of
usage.
- OWL-QN: C++ source code of the Orthant-Wise Limited-memory Quasi-Newton
algorithm, which has been used to optimize L1-regularized log-linear models.
Refer to (Andrew and Gao 2007) for a detailed description of the algorithm,
and (Gao et al. 2007) for its use in NLP applications.
- MSR IME Feature Corpus: This corpus provides the datasets we used in the
language model adaptation experiment described in (Gao et al. 2007).
- MSR IME Corpus: This corpus provides a test data set for the task of
Japanese character conversion for text input. See (Suzuki and Gao 2006) for
a detailed description.
- S-MSRSeg: A Chinese word segmenter and named entity recognizer, described
in (Gao et al. 2005).

Recent Publications
2008
- Michael Gamon, Jianfeng Gao, Chris Brockett, Alexandre Klementiev,
William B. Dolan, Dmitriy Belenko and Lucy Vanderwende. 2008. Using
contextual speller techniques and language modeling for ESL error
correction. In IJCNLP. (pdf)
- Xing Yi, Jianfeng Gao and William B. Dolan. 2008. A web-based English
proofing system for English as a second language users. In IJCNLP. (pdf)
2007
- Patrick Nguyen, Jianfeng Gao and
Milind Mahajan. 2007. MSRLM: a scalable language modeling toolkit.
MSR-TR-2007-144. (pdf) – (The toolkit
can be downloaded from here).
- Jianfeng Gao, Galen Andrew, Mark
Johnson and Kristina Toutanova. 2007. A comparative study of parameter
estimation methods for statistical natural language processing. In ACL.
(pdf) – (The datasets used for
the language model adaptation experiment can be downloaded from here.
The datasets for parse reranking can be downloaded from Mark Johnson’s website
in Brown University.)
- Galen Andrew and Jianfeng Gao. 2007.
Scalable training of L1-regularized
log-linear models. In ICML. (pdf) -- (source code of the algorithm
can be downloaded here.)
- Ken Church, Ted Hart and Jianfeng Gao.
2007. Compressing trigram language models with Golomb coding. In EMNLP-CoNLL.
(pdf)
- Guihong Cao, Jianfeng Gao and Jian-Yun
Nie. 2007. A system to mine large-scale bilingual dictionaries from
monolingual web pages. In MT Summit
XI. (pdf)
- Guihong Cao, Jianfeng Gao, Jian-Yun
Nie and Jing Bai. 2007. From query translation to cross-language query
expansion with Markov chain models. In CIKM.
(pdf)
- Jianfeng Gao and Hisami Suzuki.
Foundations of statistical natural language processing: a case study of
text input system. Tutorial in Weihai
MSRA-HIT NLP Summer School. (ppt)
2006
- Jianfeng Gao and Jian-Yun Nie. 2006.
Study of Statistical Models for Query Translation: Finding a Good Unit of
Translation. In SIGIR. (pdf)
- Jianfeng Gao, Jian-Yun Nie and Ming
Zhou. 2006. Statistical Query Translation Models for Cross Language
Information Retrieval. ACM Trans on Asian Language Information
Processing, 5(4): 323-359. (draft version)
- Jianfeng Gao, Hisami Suzuki and Wei
Yuan. 2006. An Empirical Study on Language Model Adaptation. ACM Trans
on Asian Language Information Processing, 5(3): 207-227. (draft version)
- Jianfeng Gao, Hisami Suzuki and Bin
Yu. 2006. Approximation Lasso Methods for Language Modeling. In COLING-ACL.
(pdf)
- Lei Shi, Cheng Nie, Ming Zhou and
Jianfeng Gao. 2006. A DOM Tree Alignment Model for Mining Parallel Data
from the Web. In COLING-ACL 2006. (pdf)
- Zhengyu Zhou, Jianfeng Gao, Frank K
Soong and Helen Meng. 2006. A Comparative Study of Discriminative Methods
for Reranking LVCSR N-best Hypotheses in Domain Adaptation and
Generalization. In ICASSP 2006. (ps)
- Chin-Yew Lin, Guihong Cao, Jianfeng
Gao and Jian-Yun Nie. 2006. An Information-Theoretic Approach to Automatic
Evaluation of Summaries. In HLT-NAACL 2006. (pdf)
- Yi Zhang, Ke Wu, Jianfeng Gao and
Philip Vines. 2006. Automatic Acquisition of Chinese-English Parallel
Corpus from the Web. In ECIR 2006. (pdf)
2005
- Jianfeng Gao, Mu Li, Andi Wu and Chang-Ning
Huang. Chinese word segmentation and named entity recognition: a
pragmatic approach. Computational Linguistics, 31(4). (draft version)
- Jianfeng Gao, Hao Yu, Wei Yuan and Peng
Xu. Minimum sample risk methods for language modeling. In HLT/EMNLP
2005. (pdf)
- Hisami Suzuki and Jianfeng Gao. A
comparative study on language model adaptation using new evaluation
metrics. In HLT/EMNLP 2005. (pdf)
- Jianfeng Gao, Haoliang Qin, Xinsong
Xia and Jian-Yun Nie. Linear discriminative model for information
retrieval. In SIGIR 2005. (pdf)
- Xiaojun Wan, Jianfeng Gao, Mu Li and
Binggong Ding. Person resolution in person search results: WebHawk. In CIKM
2005. (pdf)
- Wei Yuan, Jianfeng Gao and Hisami
Suzuki. An empirical study on language model adaptation using a metric of
domain similarity. In IJCNLP 2005. (pdf)
2004
- Jianfeng Gao, Mu Li, Andi Wu and Chang-Ning
Huang. Chinese word segmentation and named entity recognition: a
pragmatic approach. Microsoft Research Technical Report, MSR-TR-2004-123. (pdf)
- Jianfeng Gao and Chin-Yew Lin. Introduction
to the special issue on statistical language modeling. ACM
Transactions on Asian Language Information Processing, Vol.
3, No. 2, June 2004, pp 87-93. (pdf)
- Jianfeng Gao, Jian-Yun Nie, Guangyuan
Wu and Guihong Cao. 2004b. Dependence language model for information
retrieval. In SIGIR-2004. Sheffield, UK, July 25-29. (pdf)
- Jianfeng Gao, Andi Wu, Mu Li,
Chang-Ning Huang, Hongqiao Li, Xinsong Xia and Haowei Qin. 2004c. Adaptive
Chinese word segmentation. In ACL-2004. Barcelona, July 21-26. (pdf)
- Jianfeng Gao and Hisami Suzuki. 2004.
Capturing long distance dependency for language modeling: an empirical
study. In IJCNLP-04. Sanya City, Hainan Island, China, March 22-24.
(pdf)
- Hongqiao Li, Chang-Ning Huang,
Jianfeng Gao and Xiaozhong Fan, 2004. The use of SVM for Chinese new word
identification. In IJCNLP-04. Sanya City, Hainan Island, China,
March 22-24. (pdf)
- Qiang Yang, Charles X. Ling and
Jianfeng Gao. 2004. Mining web logs for actionable knowledge. Book chapter
in Ning Zhong and Jiming Liu, editors, Intelligent Technologies for
Information Analysis. Springer, 2004. (draft version)
2003
- Jianfeng Gao, Mu Li and Chang-Ning Huang.
2003. Improved source-channel models for Chinese word segmentation. In ACL-2003.
Sapporo, Japan, July 7-12, 2003. (pdf)
- Jianfeng Gao and Hisami Suzuki. 2003.
Unsupervised learning of dependency structure for language modeling. In ACL-2003.
Sapporo, Japan, July 7-12, 2003. (pdf)
2002
- Charles X Ling, Jianfeng Gao, Huajie
Zhang, Weining Qian, and Hongjiang Zhang. 2002. Improving Encarta Search
Engine Performance by Mining User Logs. International Journal of
Pattern Recognition and Artificial Intelligence. Vol. 16, No. 8. 2002.
(pdf)
- Jianfeng Gao, Jian-Yun Nie, Hongzhao
He, Weijun Chen, Ming Zhou. Resolving query translation ambiguity using a
decaying co-occurrence model and syntactic dependency relations.
In: Conference on Research and Development in Information Retrieval,
ACM SIGIR'02, Tampere, Finland, 11-15 August
2002. (pdf)
- Hongzhao He and Jianfeng Gao. NTCIR-3
CLIR experiments at MSRA. In: NTCIR-3. October 8-10. Tokyo, Japan.
(pdf)
- Jian Sun, Jianfeng Gao, Lei Zhang,
Ming Zhou, and Changning Huang. Chinese named entity identification using
class-based language model. In: COLING 2002. Taipei, Taiwan,
August 24-25, 2002. (pdf)
- Jianfeng Gao, Hisami Suzuki, Yang Wen.
Exploiting headword dependency and predictive clustering for language
modeling. EMNLP 2002, University of Pennsylvania, Philadelphia, PA, USA.
July 6-7, 2002. (pdf)
- Jianfeng Gao, Joshua Goodman, Guihong
Cao, Hang Li. Exploring asymmetric clustering for statistical language
modeling. ACL 2002, University of Pennsylvania, Philadelphia, PA, USA.
July 6-12, 2002. (pdf)
- Jianfeng Gao, Min Zhang. Improving
language model size reduction using better pruning criteria. ACL 2002,
University of Pennsylvania, Philadelphia, PA, USA.
July 6-12, 2002. (pdf)
- Jianfeng Gao, Joshua Goodman, Mingjing
Li, Kai-Fu Lee. Toward a unified approach to statistical language modeling
for Chinese. ACM Transactions on Asian Language Information
Processing, Vol. 1, No. 1, pp 3-33. 2002. (pdf)
2001
- Jianfeng Gao, Guihong Cao, Hongzhao
He, Min Zhang, Jian-Yun Nie, Stephen Walker, and Stephen Robertson.
TREC-10 web track experiments at MSRCN. The Tenth Text Retrieval
Conference (TREC-10), 2001. (pdf)
- Jian-Yun Nie, Jianfeng Gao, Jian Zhang
and Ming Zhou. On the use of words and n-grams for Chinese information
retrieval. In IRAL 2000. (pdf)
- Jianfeng Gao, Jian-Yun Nie, Jian
Zhang, Endong Xun, Ming Zhou, and Changning Huang. Improving query
translation for CLIR using statistical models. In: Conference on
Research and Development in Information Retrieval, ACM SIGIR'01, New
Orleans, Louisiana, USA. September 9-12, 2001. (pdf)
- Jianfeng Gao, Joshua Goodman, Jiangbo
Miao. The use of clustering techniques for language modeling – application
to Asian languages. Computational Linguistics and Chinese Language
Processing, Vol. 6, No. 1, pp 27-60. 2001. (pdf)
- Jian Zhang, Jianfeng Gao, Ming Zhou,
Jiaxing Wang. Improving the effectiveness of information retrieval with
clustering and fusion. To appear in Computational Linguistics and
Chinese Language Processing, Vol. 6, No. 1, pp 109-125. 2001. (pdf)
- Charles X Ling, Jianfeng Gao, Huajie
Zhang, Weining Qian, and Hongjiang Zhang. Mining generalized query
patterns from user logs. HICSS-34. The 34th Hawaii International Conference
on System Sciences, Hawaii, January 3-6, 2001. (pdf)
2000
- Jianfeng Gao, Jian-Yun Nie, Jian
Zhang, Endong Xun, Yi Su, Ming Zhou, and Changning Huang. TREC-9 CLIR
experiments at MSRCN. The Ninth Text Retrieval Conference (TREC-9),
2000. (pdf)
- Jianfeng Gao, and Kai-Fu Lee.
Distribution-based pruning of backoff language models. ACL-2000.
The 38th Annual Meeting of the Association for Computational Linguistics,
Hong Kong 3-6 October, 2000. (pdf)
- Ting Liu, Ming Zhou, Jianfeng Gao, and
Changning Huang. PENS: A machine-aided English writing system for Chinese
users. ACL-2000. The 38th Annual Meeting of the Association for
Computational Linguistics, Hong Kong, 3-6
October 2000. (pdf)
- Jianfeng Gao, Hai-Feng Wang, Mingjing
Li, and Kai-Fu Lee. A unified approach to statistical language modeling
for Chinese. ICASSP-2000, Istanbul,
Turkey,
June 5 - 9, 2000. (pdf)
- Jianfeng Gao, Mingjing Li, and Kai-Fu
Lee. N-gram distribution based language model adaptation. ICSLP-2000,
International Conference on Spoken Language Processing, Beijing, October 16-20, 2000. (pdf)
- Joshua Goodman and Jianfeng Gao.
Language model compression by predictive clustering. ICSLP-2000,
International Conference on Spoken Language Processing, Beijing, October 16-20, 2000. (pdf)
1999
- Jianfeng Gao. Case and constraint: research
on intelligent CAD systems. PhD thesis, Shanghai Jiaotong
University, 1999. (in Chinese) (chapter 6)

More information on ...

Natural Language Processing Group Home Page
Microsoft Research Home Page

Last updated 2007/09/05 by jfgao@microsoft.com