Minwoo Jeong, Kristina Toutanova, Hisami Suzuki, and Chris Quirk
1 November 2010
This paper describes successful applications of discriminative lexicon models to statistical machine translation (SMT) systems that translate into morphologically complex languages. We extend previous work on discriminatively trained lexicon models so that lexical selection decisions can draw on more contextual information, by building a single global log-linear model of translation selection. In offline experiments, we show that the expanded contextual information, including morphological and syntactic features, helps to better predict words in three target languages with complex morphology (Bulgarian, Czech and Korean). We also show that these improved lexical prediction models have a positive impact in the end-to-end SMT scenario from English into these languages.
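To make "a single global log-linear model of translation selection" concrete, here is a minimal sketch of a generic log-linear (maximum-entropy) lexical selection model. It scores each candidate target word t as p(t | source, context) = exp(w · f(t)) / Σ_t' exp(w · f(t')). The feature templates, the weights, and the transliterated Bulgarian candidates ("cherven"/"chervena"/"cherveno", the masculine/feminine/neuter forms of "red") are hypothetical illustrations. They are not the paper's actual feature set or trained model, which also uses morphological and syntactic features.

```python
import math
from typing import Dict, List


def extract_features(candidate: str, source_word: str, context: List[str]) -> List[str]:
    """Hypothetical feature templates: pair the candidate target word with the
    source word and with each word in the source-side context."""
    feats = [f"src={source_word}|tgt={candidate}"]
    feats += [f"ctx={w}|tgt={candidate}" for w in context]
    return feats


def predict(candidates: List[str], source_word: str, context: List[str],
            weights: Dict[str, float]) -> Dict[str, float]:
    """Return a normalized distribution over candidates under a log-linear model."""
    scores = {
        t: sum(weights.get(f, 0.0) for f in extract_features(t, source_word, context))
        for t in candidates
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores.values())
    exp_scores = {t: math.exp(s - m) for t, s in scores.items()}
    z = sum(exp_scores.values())
    return {t: v / z for t, v in exp_scores.items()}


# Hypothetical weights: a context feature links "car" (Bulgarian "kola",
# grammatically feminine) to the feminine adjective form "chervena".
weights = {
    "src=red|tgt=cherven": 1.0,
    "src=red|tgt=chervena": 1.0,
    "src=red|tgt=cherveno": 1.0,
    "ctx=car|tgt=chervena": 2.0,
}
probs = predict(["cherven", "chervena", "cherveno"], "red", ["the", "car"], weights)
```

Without the context feature, all three inflected forms would tie. The context feature is what lets the model prefer the form that agrees with the noun. This illustrates why richer contextual features matter for morphologically complex target languages.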
In The Ninth Conference of the Association for Machine Translation in the Americas
Publisher: Association for Computational Linguistics