Einat Minkov, Kristina Toutanova, and Hisami Suzuki
We present a novel method for predicting inflected word forms when generating morphologically rich languages in machine translation. We utilize a rich set of syntactic and morphological knowledge sources from both source and target sentences in a probabilistic model, and evaluate their contribution in generating Russian and Arabic sentences. Our results show that the proposed model substantially outperforms the commonly used baseline of a trigram target language model; in particular, the use of morphological and syntactic features leads to large gains in prediction accuracy. We also show that the proposed method is effective with a relatively small amount of data.
Published in: Proceedings of ACL
Publisher: Association for Computational Linguistics
All copyrights reserved by ACL 2007