Engkoo is a technology for exploring and learning language that now powers the Bing Dictionary product in China. It is built primarily by mining translation knowledge from billions of web pages, using the Internet to catch language in motion. Engkoo is currently built for Chinese users who are learning English; however, the technology itself is language-independent and can be extended to other languages in the future.
At a system level, Engkoo is an application platform that supports a multitude of NLP and speech technologies, such as cross-language retrieval, alignment, sentence classification, statistical machine translation, text-to-speech, and phonetic search. The data set that supports this system is built primarily by mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, and the parallel content is extracted and formulated into clear term definitions and sample sentences. This approach allows us to build the world's largest lexicon linking Chinese and English while covering the most up-to-date terms as captured by the web. In addition, our data set is intelligently merged with licensed data from sources including Microsoft Office and Encarta. Finally, the resulting vast, ranked, high-quality composite data set is analyzed by a machine-learning-based classifier, allowing users to filter sample sentences by combinable categories.
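To make the mining step more concrete, here is a minimal sketch of one simple way bilingual term pairs might be pulled from mixed Chinese-English text: looking for a Chinese term immediately followed by an English gloss in parentheses. This is an illustration only, not Engkoo's actual pipeline. The function names, the regular expression, and the restriction to the basic CJK Unified Ideographs block are all assumptions made for this example.

```python
import re

# Assumption: the basic CJK Unified Ideographs block is enough to spot Chinese text.
CJK = r"\u4e00-\u9fff"

# Hypothetical pattern: a run of Chinese characters followed by an English
# gloss in ASCII or full-width parentheses, e.g. 机器翻译(machine translation).
PAREN_PAIR = re.compile(
    rf"([{CJK}]+)\s*[（(]\s*([A-Za-z][A-Za-z \-']*?)\s*[)）]"
)


def is_bilingual(text: str) -> bool:
    """Return True if the text contains both Chinese characters and Latin letters."""
    has_chinese = re.search(rf"[{CJK}]", text) is not None
    has_english = re.search(r"[A-Za-z]", text) is not None
    return has_chinese and has_english


def extract_term_pairs(text: str) -> list[tuple[str, str]]:
    """Extract (Chinese term, English gloss) candidates from bilingual text."""
    if not is_bilingual(text):
        return []
    return [(zh, en) for zh, en in PAREN_PAIR.findall(text)]


if __name__ == "__main__":
    sample = "术语：机器翻译(machine translation)，语音合成（text-to-speech）"
    for zh, en in extract_term_pairs(sample):
        print(zh, "->", en)
```

A pattern like this only yields candidates. It takes the whole run of Chinese characters before the parenthesis as the term, so in running text it would capture too much, and a real system would still need to find term boundaries, score each pair for parallelism, and rank the results.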
Demo Video
Channel9: Engkoo: English Learning Vertical Search from Bing and MS Research
Microsoft Product Impact
- Bing: Dictionary Vertical Search (mobile and desktop versions, China Market)
- MSN: Portal, 3G Mobile, and Messenger Dictionary / Translation (China Market)
- Office: IME 2010 English Search
- Windows: Recommended Windows 7 Gadget
Awards
- Bing Dictionary (Engkoo) won the "Most Trustworthy Product of 2011" award, China Internet Industry Annual Meeting, Jan. 2012
- Bing Dictionary (Engkoo) won the "2011 top apps" award from Baidu, ranked #5 of thousands
- The Engkoo project won the Wall Street Journal's 2010 Asian Innovation Readers' Choice Award
- The Text-to-Speech (TTS) used in Engkoo was rated as the best in intelligibility in the international TTS contest, Blizzard Challenge 2010, in both English and Chinese
- The Machine Translation (MT) used in Engkoo ranked first in Chinese-English translation quality in the 2008 National Institute of Standards and Technology (NIST) Open MT evaluation series
Selected Press
- Engkoo featured in China's CCTV-2 Global Connection piece on Microsoft Asia innovation, December 13, 2011 (Chinese)
- Microsoft Uses Karaoke Feature on China's Bing Dictionary, PC World, January 24, 2011
- The Wall Street Journal's Asian Innovation Award Winners, Wall Street Journal, October 25, 2010
- Engkoo a 'Top Innovation' in Asia, Futures, October 20, 2010
- Microsoft Bing Partners With Alibaba's Taobao Search Engine, PC World, October 13, 2010
- Software Aids Language Learners, Microsoft Research News, September 27, 2010
- Microsoft Engkoo, Intelligent English to Chinese Translator, Softpedia, August 31, 2010
- Microsoft Engkoo Helps Teach English to Chinese Internet Users, Microsoft News Center, August 26, 2010
- Microsoft's Engkoo Scans the Web to Teach Itself How to Teach You Languages, Popular Science, August 4, 2010
- Microsoft's Experimental English-Chinese Dictionary Mines the Web for Data, Engadget, August 4, 2010
- Microsoft Mines Web to Hone Language Tool, Wall Street Journal, August 3, 2010
- Companies Compete for 2010 Prizes for Technological Advances in Asia, Wall Street Journal, June 29, 2010
- Replacing Google's Dictionary with Engkoo Learning Tools (English Translation), PC Pop, March 30, 2010
- How Cool is Engkoo? (English Translation), ZOL, January 8, 2010
- Microsoft Applications from China Mine the Web, Computer World, April 20, 2009
- Microsoft Teaches English Writing (English Translation), Sina, February 19, 2009
Related Publications
- M. R. Scott, X. Liu, M. Zhou. Towards a Specialized Search Engine for Language Learners. Proceedings of the IEEE, Vol. 99, No. 9, pp. 1462-1465, Sept. 2011
- D. Ding, X. Jiang, M. R. Scott, M. Zhou. Tulsa: Web Search for Writing Assistance. In Proceedings of the 34th Annual ACM SIGIR Conference. SIGIR 2011
- M. R. Scott, X. Liu, M. Zhou, Microsoft Engkoo Team. Engkoo: Mining the Web for Language Learning. ACL 2011
- Y. Zhang, Z. J. Yan, F. K. Soong. Cross-validation based Decision Tree Clustering for HMM-based TTS. ICASSP 2010
- Q. Zhang, F. K. Soong, Y. Qian, Z. Yan, J. Pan and Y. Yan. Improved Modeling for F0 Generation and V/U Decision in HMM-based TTS. ICASSP 2010
- Z.J. Yan, Y. Qian, F. K. Soong. Rich Context Unit Selection (RUS) Approach to High Quality TTS. ICASSP 2010
- L. Jiang, S. Yang, M. Zhou, X. Liu and Q. Zhu. Mining Bilingual Data from the Web with Adaptively Learnt Patterns. ACL 2009
- G. Jiang, C. Zhao, M. R. Scott and F. Zou. Combinable Tabs: An Interactive Method of Information Comparison using a Combinable Tabbed Document Interface. Interact 2009
- M. Li, N. Duan, D. Zhang, C. Li, M. Zhou. Collaborative Decoding: Partial Hypothesis Re-ranking Using Translation Consensus between Decoders. ACL 2009
- Y. Qian, F. K. Soong. A Multi-space Distribution (MSD) and Two-stream Tone Modeling Approach to Mandarin Speech Recognition. Speech Communication, Volume 51, Issue 12, Pages 1169-1179, 2009
- Y. Qian, L. Hui, F.K. Soong. A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS. IEEE Transactions on Audio, Speech, and Language Processing, VOL. 17, NO. 6, pp.1231-1239, 2009
- D. Zhang, M. Li, N. Duan, C. Li, M. Zhou. Measure Word Generation for English-Chinese SMT Systems. ACL 2008
- C. Li, X. He, Y. Liu, N. Xi. Incremental HMM Alignment for MT System Combination. Accepted by ACL 2008
- W. Gao, C. Niu, M. Zhou, K. Wong: Joint Ranking for Multilingual Web Search. ECIR 2009: 114-125
- G. Sun, G. Cong, X. Liu, C. Lin, M. Zhou: Mining Sequential Patterns and Tree Patterns to Detect Erroneous Sentences. AAAI 2007: 925-930
- C. Li, M. Li, D. Zhang, M. Li, M. Zhou, Y. Guan: A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation. ACL 2007
- G. Sun, X. Liu, G. Cong, M. Zhou, Z. Xiong, J. Lee, C. Lin: Detecting Erroneous Sentences using Automatically Mined Sequential Patterns. ACL 2007
- J. Lee, M. Zhou, X. Liu: Detection of Non-Native Sentences Using Machine-Translated Training Data. HLT-NAACL (Short Papers) 2007: 93-96
- L. Jiang, M. Zhou, L. Chien, C. Niu: Named Entity Translation with Web Mining and Transliteration. IJCAI 2007: 1629-1634
- S. Zhao, M. Zhou, T. Liu: Learning Question Paraphrases for QA from Encarta Logs. IJCAI 2007: 1795-1801
- J. Huang, M. Zhou, D. Yang: Extracting Chatbot Knowledge from Online Discussion Forums. IJCAI 2007: 423-428
- W. Gao, C. Niu, J. Nie, M. Zhou, J. Hu, K. Wong, H. Hon: Cross-lingual query suggestion using query logs of different languages. SIGIR 2007: 463-470
Related Projects
World Expo 2010 Shanghai - 'Chinglish' Data Collection/Correction



