Arul Menezes

Background Information

My research interests lie in using parsed representations of natural language, including dependency trees, logical form and other abstract representations, to address problems in Machine Translation, Question Answering, summarization, NL dialog, command-and-control and language-based reasoning tasks. In the past few years this has meant mostly Machine Translation, with occasional hit-and-run forays into the other areas.

My current focus is leveraging syntactic structure to improve the quality of Statistical Machine Translation. In particular, I have been looking at translating from languages for which we have a parser (such as English) into an arbitrary resource-poor target language, using the parsed source-language dependency tree to improve translation quality over string-based MT systems.

Our Treelet Translation system, based on the above principles, is described in the publications listed below, and is available at

A version of this system was also used to translate Microsoft's support knowledge base into 20+ languages. Because in this case we were able to train on domain-specific data, this system produces much higher quality translations than the free web system, and our user feedback indicates that it is just as useful for solving user problems as human translations.

Prior to joining MSR, I worked for 8 years on several Microsoft products, including Windows CE, Windows 95, MSN, Microsoft Site Server, Microsoft Commercial Internet Server, Windows 3.11 and the Microsoft At Work Fax project.