A Discriminative Lexicon Model for Complex Morphology

This paper describes the successful application of discriminative lexicon models to statistical machine translation (SMT) into morphologically complex languages. We extend previous work on discriminatively trained lexicon models to include more contextual information when making lexical selection decisions, by building a single global log-linear model of translation selection. In offline experiments, we show that the expanded contextual information, including morphological and syntactic features, helps to better predict words in three target languages with complex morphology (Bulgarian, Czech and Korean). We also show that these improved lexical prediction models have a positive impact on end-to-end SMT from English into these languages.
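
To make the idea of a log-linear lexical selection model concrete, the following toy sketch scores candidate target words given a source word and some context, and normalizes the scores with a softmax. It is not the authors' implementation: the feature templates, the function names (`featurize`, `predict`, `train`), the learning rate, and the two-example Czech data set are all assumptions chosen for illustration only.

```python
import math
from collections import defaultdict


def featurize(source_word, context, candidate):
    """Hypothetical feature templates: pair the source word and each
    context attribute with the candidate target word."""
    feats = [f"src={source_word}|tgt={candidate}"]
    for key, value in sorted(context.items()):
        feats.append(f"{key}={value}|tgt={candidate}")
    return feats


def predict(weights, source_word, context, candidates):
    """Return P(candidate | source word, context) under a log-linear model."""
    scores = {
        c: sum(weights[f] for f in featurize(source_word, context, c))
        for c in candidates
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    z = sum(exp_scores.values())
    return {c: v / z for c, v in exp_scores.items()}


def train(examples, epochs=20, lr=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for source_word, context, candidates, gold in examples:
            probs = predict(weights, source_word, context, candidates)
            for c in candidates:
                # Gradient: observed minus expected count of each feature.
                grad = (1.0 if c == gold else 0.0) - probs[c]
                for f in featurize(source_word, context, c):
                    weights[f] += lr * grad
    return weights


# Toy data (assumed): English "book" maps to Czech "kniha" (nominative)
# as a subject and to "knihu" (accusative) as an object.
examples = [
    ("book", {"role": "subj"}, ["kniha", "knihu"], "kniha"),
    ("book", {"role": "obj"}, ["kniha", "knihu"], "knihu"),
]
weights = train(examples)
probs = predict(weights, "book", {"role": "obj"}, ["kniha", "knihu"])
print(max(probs, key=probs.get))
```

In this sketch a single weight vector is shared across all source words, which is one simple reading of "a single global model"; the syntactic role feature stands in for the richer morphological and syntactic context the abstract mentions.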

In: The Ninth Conference of the Association for Machine Translation in the Americas

Publisher: Association for Computational Linguistics

Details

Type: Inproceedings