Takako Aikawa and Achim Ruopp
This paper explores a way to learn post-editing fixes for raw MT output automatically by chaining two different types of statistical machine translation (SMT) systems in sequence. Our proposed system, which we call a chained system, consists of two SMT systems: (i) a syntax-based SMT system and (ii) a phrase-based SMT system (Koehn, 2004). We first translate the source sentences of the bitext training data into the target language with the syntax-based SMT system. This gives us a monolingual parallel corpus consisting of the raw MT outputs and their corresponding human translations. We then train a phrase-based SMT system on this monolingual parallel corpus. Our system is thus a chain of a syntax-based SMT system followed by a phrase-based SMT system. The benefit of the chained system is that it learns post-editing fixes automatically through the phrase-based SMT system (Simard et al., 2007a, 2007b). We measured the impact of the chained system relative to the initial SMT system in terms of BLEU, using typologically different language pairs. The results of our experiments strongly indicate that the second stage of the chained system can robustly compensate for the weaknesses of the initial SMT system by providing human-like fixes.
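The data flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_first_stage` is a hypothetical dictionary lookup standing in for the syntax-based SMT system, and `learn_word_fixes` is a deliberately crude stand-in for training the phrase-based post-editor (it only learns position-wise word substitutions from equal-length pairs, whereas a real phrase-based system learns aligned phrase pairs and a language model). The point it shows is the chaining itself: source text is translated, the raw output is paired with the human reference, and the second stage is trained on and applied to first-stage output.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def toy_first_stage(sentence: str) -> str:
    """Hypothetical stand-in for the syntax-based SMT system (word lookup)."""
    lexicon = {"das": "the", "haus": "house", "buch": "book",
               "ist": "is", "klein": "little"}
    return " ".join(lexicon.get(w, w) for w in sentence.lower().split())


def build_monolingual_parallel(sources: List[str], references: List[str],
                               first_stage: Translator) -> List[Tuple[str, str]]:
    """Pair each raw MT output with its human reference translation."""
    if len(sources) != len(references):
        raise ValueError("both sides of the bitext must have the same length")
    return [(first_stage(src), ref) for src, ref in zip(sources, references)]


def learn_word_fixes(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Crude stand-in for phrase-based training (assumption, not the paper's method).

    Counts position-wise word substitutions in equal-length (MT, reference)
    pairs and keeps the most frequent target word for each MT word.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for mt, ref in pairs:
        mt_toks, ref_toks = mt.split(), ref.split()
        if len(mt_toks) != len(ref_toks):
            continue  # skip pairs that need reordering or insertion
        for m, r in zip(mt_toks, ref_toks):
            counts[m][r] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def chain(first_stage: Translator, post_editor: Translator) -> Translator:
    """Compose the two stages: source -> raw MT -> post-edited MT."""
    return lambda src: post_editor(first_stage(src))


# Toy bitext (hypothetical data for illustration only).
sources = ["das haus ist klein", "das buch ist klein"]
references = ["the house is small", "the book is small"]

pairs = build_monolingual_parallel(sources, references, toy_first_stage)
fixes = learn_word_fixes(pairs)
post_editor: Translator = lambda s: " ".join(fixes.get(w, w) for w in s.split())
chained = chain(toy_first_stage, post_editor)

if __name__ == "__main__":
    print(pairs)
    print(chained("das haus ist klein"))
```

Here the first stage systematically renders "klein" as "little", the monolingual parallel data exposes that the references use "small", and the second stage learns to apply that fix. Keeping the two stages behind the same `Translator` signature is what lets any second-stage model be swapped in without touching the first.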
Publisher: Association for Machine Translation in the Americas