My research interests lie in using parsed representations of natural language, including dependency trees, logical form and other abstract representations, to address problems in Machine Translation, Question Answering, summarization, NL dialog, command-and-control and language-based reasoning tasks. In the past few years this has meant mostly Machine Translation, with occasional hit-and-run forays into the other areas.
My current focus is using syntactic structure to improve the quality of Statistical Machine Translation. In particular, I have been looking at translating from languages for which we have a parser (such as English) into an arbitrary resource-poor target language, leveraging the parsed source-language dependency tree to improve translation quality over string-based MT systems.
Our Treelet Translation system, based on the above principles, is described in the publications listed below, and is available at http://www.microsofttranslator.com.
A version of this system was also used to translate Microsoft's support knowledge base (http://support.microsoft.com) into 20+ languages. Because we were able to train on domain-specific data in this case, this system produces much higher-quality translations than the free web system, and our user feedback indicates that it is just as useful for solving user problems as human translations.
Prior to joining MSR, I worked for 8 years on several Microsoft products, including Windows CE, Windows 95, MSN, Microsoft Site Server, Microsoft Commercial Internet Server, Windows 3.11 and the Microsoft At Work Fax project.
- Hany Hassan and Arul Menezes, Social Text Normalization using Contextual Graph Random Walks, in The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Association for Computational Linguistics, 4 August 2013
- Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova, Mei Yang, Bill Dolan, Mu Li, Chi-Ho Li, Dongdong Zhang, Long Jiang, and Ming Zhou, The MSR-MSRA MT System for NIST Open Machine Translation 2008 Evaluation, in The 2008 NIST Open Machine Translation Evaluation Workshop, 2008
- Xiaodong He, Jianfeng Gao, Chris Quirk, Patrick Nguyen, Arul Menezes, Robert Moore, Kristina Toutanova, Mei Yang, Bill Dolan, Mu Li, Chi-Ho Li, Dongdong Zhang, Long Jiang, Ming Zhou, George Foster, Roland Kuhn, Jing Zheng, Wen Wang, Necip Fazil Ayan, Dimitra Vergyri, Nicolas Scheffer, and Andreas Stolcke, The MSR-NRC-SRI MT System for NIST Open Machine Translation 2008 Evaluation, in The 2008 NIST Open Machine Translation Evaluation Workshop, 2008
- Chris Quirk, Raghavendra Udupa, and Arul Menezes, Generative Models of Noisy Translations with Applications to Parallel Fragment Extraction, in Proceedings of MT Summit XI, European Association for Machine Translation, September 2007
- Rion Snow, Lucy Vanderwende, and Arul Menezes, Effectively using syntax for recognizing false entailment, Association for Computational Linguistics, May 2006
- Chris Quirk and Arul Menezes, Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation, in Proceedings of HLT-NAACL 2006, ACL/SIGPARSE, May 2006
- Xiaodong He, Arul Menezes, Chris Quirk, Anthony Aue, Simon Corston-Oliver, Jianfeng Gao, and Patrick Nguyen, Microsoft Research Treelet Translation System: NIST MT Evaluation 06, National Institute of Standards and Technology, March 2006
- Chris Quirk and Arul Menezes, Dependency Treelet Translation: The convergence of statistical and example-based machine translation?, in Machine Translation, vol. 20, pp. 43–65, March 2006
- Lucy Vanderwende, Arul Menezes, and Rion Snow, Microsoft Research at RTE-2: Syntactic Contributions in the Entailment Task: an implementation, in Proceedings of the Second PASCAL Recognising Textual Entailment Challenge Workshop, 2006
- Arul Menezes and Chris Quirk, Microsoft Research Treelet Translation System: IWSLT Evaluation, in Proceedings of the International Workshop on Spoken Language Translation, October 2005
- Lucy Vanderwende, Gary Kacmarcik, Hisami Suzuki, and Arul Menezes, MindNet: an automatically-created lexical resource, in HLT/EMNLP Interactive Demonstrations Proceedings, October 2005
- Chris Quirk, Arul Menezes, and Colin Cherry, Dependency Treelet Translation: Syntactically Informed Phrasal SMT, in Proceedings of ACL, Association for Computational Linguistics, June 2005
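- Hisami Suzuki (鈴木久美), Gary Kacmarcik, Lucy Vanderwende, and Arul Menezes, Mindnet/mnex: Tools for automatic construction and analysis of semantic relations database (意味関係データベースの自動構築と解析のためのツール), in Proceedings of the 11th Annual Meeting of the Association for Natural Language Processing (言語処理学会第11回全国大会論文集), March 2005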
- Chris Quirk, Arul Menezes, and Colin Cherry, Dependency Tree Translation: Syntactically Informed Phrasal SMT, no. MSR-TR-2004-113, November 2004
- Anthony Aue, Arul Menezes, Robert Moore, Chris Quirk, and Eric Ringger, Statistical Machine Translation Using Labeled Semantic Dependency Graphs, ACL/SIGPARSE, October 2004
- Lucy Vanderwende, Michele Banko, and Arul Menezes, Event-centric summary generation, in Working notes of the Document Understanding Conference 2004, ACL, 2004
- Arul Menezes, Better contextual translation using machine learning, Springer-Verlag, October 2002
- William B. Dolan, Jessie Pinkham, Stephen D. Richardson, and Arul Menezes, Achieving commercial-quality translation with example-based methods, European Association for Machine Translation, September 2001
- William Dolan, Stephen D. Richardson, Arul Menezes, and Monica Corston-Oliver, Overcoming the customization bottleneck using example-based MT, Association for Computational Linguistics, July 2001
- Arul Menezes and Stephen D. Richardson, A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora, Association for Computational Linguistics, January 2001
- William Dolan, Stephen D. Richardson, Arul Menezes, and Monica Corston-Oliver, Overcoming the customization bottleneck using example-based MT, Workshop on Data-Driven Methods in Machine Translation, 2001