Kalika Bali, Sankaran Baskaran, and A Kumaran
In this paper, we present a detailed evaluation of a Dependency Treelet-based Phrasal Statistical Machine Translation (SMT) system for the English-Hindi language pair. The dependency treelet-based phrasal SMT system, which adds source-language syntactic information to a standard phrasal SMT system, has been shown to perform significantly better than surface-based approaches on several well-studied European language pairs. We examine whether this observation holds for languages as diverse as English and Hindi by developing and testing such a system for this language pair for the first time. We make baseline comparisons with a standard phrasal SMT implementation, and further study how two radically different types of corpora, namely technical text and general web text, affect the performance of the dependency treelet-based phrasal system. The evaluation includes human judgment in addition to two standard automated metrics, BLEU and METEOR. We also highlight some language-specific issues that provide insight into the challenges of applying standard phrasal SMT techniques to translation between English and an Indic language such as Hindi.
In the 6th International Conference on Natural Language Processing (ICON-2008), Pune, India.