Engkoo is a cloud-based linguistic technology stack, born from over a decade of natural language processing research, that powers various Microsoft products and features.

Engkoo is developed in China, so its initial focus has been on linguistic tasks relevant to the Chinese market, such as Chinese-English dictionary lookup, machine translation, language learning, and now input and writing assistance.

Engkoo Pinyin is an Input Method Editor (IME) that uses the cloud to help people input what is on their mind: Chinese, English, or a mix of both, as well as content beyond text, such as images, videos, and maps.

At a system level, Engkoo supports a multitude of NLP and speech technologies, such as cross-language retrieval, alignment, sentence classification, statistical machine translation, text-to-speech, and phonetic search. The data set that supports this system is built primarily by mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, and the parallel content is extracted and formulated into clear term definitions and sample sentences. This approach allows us to build the world's largest lexicon linking Chinese and English while covering the most up-to-date terms as captured on the web. In addition, our data set is merged with licensed data from sources including Microsoft Office and Encarta. Finally, the resulting ranked, high-quality composite data set is analyzed by a machine-learning-based classifier, allowing users to filter sample sentences by combinable categories.
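
Two of the steps described above can be illustrated with a minimal Python sketch: spotting text that mixes Chinese and English, and filtering sample sentences by combinable categories. This is not Engkoo's implementation; the page does not describe the actual mining algorithm or classifier. The names `is_bilingual`, `SampleSentence`, and `filter_by_categories`, the category labels, and the example sentences are all hypothetical, and the character-range check is a deliberate simplification.

```python
import re
from dataclasses import dataclass, field

# Basic CJK Unified Ideographs block only (a simplification; extension blocks are ignored).
CJK = re.compile(r"[\u4e00-\u9fff]")
# Two or more ASCII letters in a row, used here as a rough stand-in for "an English word".
LATIN_WORD = re.compile(r"[A-Za-z]{2,}")


def is_bilingual(text: str) -> bool:
    """Return True if the text contains both Chinese characters and English-looking words."""
    return bool(CJK.search(text)) and bool(LATIN_WORD.search(text))


@dataclass(frozen=True)
class SampleSentence:
    """A hypothetical record for one mined sentence pair and its classifier labels."""
    english: str
    chinese: str
    categories: frozenset = field(default_factory=frozenset)


def filter_by_categories(sentences, required):
    """Keep sentences tagged with every requested category (categories combine with AND)."""
    required = frozenset(required)
    return [s for s in sentences if required <= s.categories]


# Illustrative data only; the category names are invented for this sketch.
pages = ["云计算 (cloud computing)", "Plain English only.", "只有中文。"]
candidates = [p for p in pages if is_bilingual(p)]

sentences = [
    SampleSentence("Nice to meet you.", "很高兴认识你。", frozenset({"oral", "simple"})),
    SampleSentence("The compiler emits a warning.", "编译器发出警告。", frozenset({"technical"})),
    SampleSentence("Please reboot the server.", "请重启服务器。", frozenset({"technical", "simple"})),
]
technical_and_simple = filter_by_categories(sentences, {"technical", "simple"})
```

Representing each sentence's labels as a set makes "combinable" filtering a subset test: selecting more categories narrows the result, and selecting none returns every sentence.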

Demo Videos

  1. Keynote Demo: Engkoo Pinyin Cloud-Based Multimodal IME, 21st Century Computing Conference in Tianjin, China, Oct. 2012
  2. Computer-Assisted Audiovisual Language Learning, ieeeComputerSociety, May 23, 2012
  3. Engkoo: English Learning Vertical Search from Bing and MS Research, Channel9, April 2010

Microsoft Product Impact

  1. Microsoft Research Annual Featured Technology Transfers: 2010, 2011, 2012
  2. Bing: New Dictionary Vertical Search (phone, tablet, and desktop versions; China Market), Bing Search Box Pinyin IME Auto Suggest (China Market), new client product Bing IME (China Market)
  3. MSN: Portal, 3G Mobile, and Messenger Dictionary / Translation (China Market)
  4. Office: Bing Dictionary (English) for Office 2013, Bing English Assistance in Office 2013 (built in for Chinese users, and also downloadable as a separate app)


Awards

  • Microsoft Research Asia Team Collaboration Award, 2013
  • Microsoft Research Asia Deployed Best Research Project of the Year Awards, 2009, 2011, 2012
  • Bing Dictionary (Engkoo) won the "Most Trustworthy Product of 2011" award, China Internet Industry Annual Meeting, Jan. 2012
  • Bing Dictionary (Engkoo) won the "2011 top apps" award from Baidu, ranked #5 of thousands
  • The Engkoo project won the Wall Street Journal's 2010 Asian Innovation Readers' Choice Award
  • The Text-to-Speech (TTS) system used in Engkoo was rated best in intelligibility in the international TTS contest, Blizzard Challenge 2010, in both English and Chinese
  • The Machine Translation (MT) system achieved the top Chinese-English quality ranking in the 2008 National Institute of Standards and Technology (NIST) Open MT evaluation

Selected Press

  1. Engkoo Pinyin Redefines Chinese Input, Microsoft Research Feature Story, May 13, 2013
  2. The Story of Microsoft Engkoo Pinyin (in Chinese), Sina Technology, Jan. 25, 2013
  3. New Oriental School and Microsoft Collaboration on Bing Dictionary (in Chinese), ChinaByte, Jan. 11, 2013
  4. Big Data Just Ten Years Away (in Chinese), Sept. 26, 2012
  5. Review On Cloud Input 2.0: Engkoo Pinyin IME Beta, An IME That Understands YOU (in Chinese), Chip Magazine, Sept. 2012
  6. Engkoo featured in China's CCTV-2 Global Connection piece on Microsoft Asia innovation (in Chinese), December 13, 2011
  7. Microsoft Uses Karaoke Feature on China's Bing Dictionary, PC World, January 24, 2011
  8. The Wall Street Journal's Asian Innovation Award Winners, Wall Street Journal, October 25, 2010
  9. Engkoo a 'Top Innovation' in Asia, Futures, October 20, 2010
  10. Microsoft Bing Partners With Alibaba's Taobao Search Engine, PC World, October 13, 2010
  11. Software Aids Language Learners, Microsoft Research News, September 27, 2010
  12. Microsoft Engkoo, Intelligent English to Chinese Translator, Softpedia, August 31, 2010
  13. Microsoft Engkoo Helps Teach English to Chinese Internet Users, Microsoft News Center, August 26, 2010
  14. Microsoft's Engkoo Scans the Web to Teach Itself How to Teach You Languages, Popular Science, August 4, 2010
  15. Microsoft's Experimental English-Chinese Dictionary Mines the Web for Data, Engadget, August 4, 2010
  16. Microsoft Mines Web to Hone Language Tool, Wall Street Journal, August 3, 2010
  17. Companies Compete for 2010 Prizes for Technological Advances in Asia, Wall Street Journal, June 29, 2010
  18. Replacing Google's Dictionary with Engkoo Learning Tools (English Translation), PC Pop, March 30, 2010
  19. How Cool is Engkoo? (English Translation), ZOL, January 8, 2010
  20. Microsoft Applications from China Mine the Web, Computer World, April 20, 2009
  21. Microsoft Teaches English Writing (English Translation), Sina, February 19, 2009

Related Publications

  1. M. R. Scott, X. Liu, M. Zhou. Towards a Specialized Search Engine for Language Learners. Proceedings of the IEEE, Vol. 99, No. 9, pp. 1462-1465, Sept. 2011
  2. D. Ding, X. Jiang, M. R. Scott, M. Zhou. Tulsa: Web Search for Writing Assistance. In Proceedings of the 34th Annual ACM SIGIR Conference. SIGIR 2011
  3. M. R. Scott, X. Liu, M. Zhou, Microsoft Engkoo Team. Engkoo: Mining the Web for Language Learning. ACL 2011
  4. Y. Zhang, Z. J. Yan, F. K. Soong. Cross-validation based Decision Tree Clustering for HMM-based TTS. ICASSP 2010
  5. Q. Zhang, F. K. Soong, Y. Qian, Z. Yan, J. Pan and Y. Yan. Improved Modeling for F0 Generation and V/U Decision in HMM-based TTS. ICASSP 2010
  6. Z.J. Yan, Y. Qian, F. K. Soong. Rich Context Unit Selection (RUS) Approach to High Quality TTS. ICASSP 2010
  7. L. Jiang, S. Yang, M. Zhou, X. Liu and Q. Zhu. Mining Bilingual Data from the Web with Adaptively Learnt Patterns. ACL 2009
  8. G. Jiang, C. Zhao, M. R. Scott and F. Zou. Combinable Tabs: An Interactive Method of Information Comparison using a Combinable Tabbed Document Interface. Interact 2009
  9. M. Li, N. Duan, D. Zhang, C. Li, M. Zhou. Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders. ACL 2009
  10. Y. Qian, F. K. Soong. A Multi-space Distribution (MSD) and Two-stream Tone Modeling Approach to Mandarin Speech Recognition. Speech Communication, Vol. 51, No. 12, pp. 1169-1179, 2009
  11. Y. Qian, L. Hui, F. K. Soong. A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 6, pp. 1231-1239, 2009
  12. D. Zhang, M. Li, N. Duan, C. Li, M. Zhou. Measure Word Generation for English-Chinese SMT Systems. ACL 2008
  13. C. Li, X. He, Y. Liu, N. Xi. Incremental HMM Alignment for MT System Combination. Accepted by ACL 2008
  14. W. Gao, C. Niu, M. Zhou, K. Wong: Joint Ranking for Multilingual Web Search. ECIR 2009: 114-125
  15. G. Sun, G. Cong, X. Liu, C. Lin, M. Zhou: Mining Sequential Patterns and Tree Patterns to Detect Erroneous Sentences. AAAI 2007: 925-930
  16. C. Li, M. Li, D. Zhang, M. Li, M. Zhou, Y. Guan: A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation. ACL 2007
  17. G. Sun, X. Liu, G. Cong, M. Zhou, Z. Xiong, J. Lee, C. Lin: Detecting Erroneous Sentences using Automatically Mined Sequential Patterns. ACL 2007
  18. J. Lee, M. Zhou, X. Liu: Detection of Non-Native Sentences Using Machine-Translated Training Data. HLT-NAACL (Short Papers) 2007: 93-96
  19. L. Jiang, M. Zhou, L. Chien, C. Niu: Named Entity Translation with Web Mining and Transliteration. IJCAI 2007: 1629-1634
  20. S. Zhao, M. Zhou, T. Liu: Learning Question Paraphrases for QA from Encarta Logs. IJCAI 2007: 1795-1801
  21. J. Huang, M. Zhou, D. Yang: Extracting Chatbot Knowledge from Online Discussion Forums. IJCAI 2007: 423-428
  22. W. Gao, C. Niu, J. Nie, M. Zhou, J. Hu, K. Wong, H. Hon: Cross-lingual query suggestion using query logs of different languages. SIGIR 2007: 463-470

Related Projects

World Expo 2010 Shanghai - 'Chinglish' Data Collection/Correction

Gang Chen

Matt Scott