Impact of controlled language on translation quality and post-editing in a statistical machine translation environment

  • Takako Aikawa
  • Lee Schwartz
  • Ronit King
  • Mo Corston-Oliver
  • Carmen Lozano

Published by European Association for Machine Translation

Publication

This paper investigates the relationships among controlled language (CL), machine translation (MT) quality, and post-editing (PE). Previous research has shown that the use of CL improves the quality of MT. By extension, we assume that the use of CL will lead to greater productivity or reduced PE effort. The paper examines whether this three-way relationship among CL, MT quality, and PE holds. Beginning with a set of CL rules, we determine which types of CL rules have the greatest cross-linguistic impact on MT quality. We create two sets of English data, one that violates the CL rules and one that conforms to them. We translate both sets of sentences into four typologically different languages (Dutch, Chinese, Arabic, and French) using MSR-MT, a statistical machine translation system developed at Microsoft. We measure the degree of impact of CL rules on MT quality based on the differences in human evaluation scores and BLEU scores between the two sets of MT output. Finally, we examine whether the use of CL improves productivity in terms of reduced PE effort, using character-based edit distance.
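To make the post-editing measure concrete, here is a minimal sketch of a character-based edit distance between raw MT output and its post-edited version. The abstract does not say which variant the authors used, so this assumes plain Levenshtein distance with unit costs for insertion, deletion, and substitution. The function names and the length normalization are illustrative assumptions, not taken from the paper.

```python
def char_edit_distance(mt_output: str, post_edited: str) -> int:
    """Levenshtein distance over characters (assumed unit costs)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(mt_output) < len(post_edited):
        mt_output, post_edited = post_edited, mt_output

    # previous[j] = distance between the processed prefix of mt_output
    # and the first j characters of post_edited.
    previous = list(range(len(post_edited) + 1))
    for i, a in enumerate(mt_output, start=1):
        current = [i]
        for j, b in enumerate(post_edited, start=1):
            substitution_cost = 0 if a == b else 1
            current.append(min(
                previous[j] + 1,                      # delete a
                current[j - 1] + 1,                   # insert b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(mt_output: str, post_edited: str) -> float:
    """Edit distance divided by the longer string's length.

    This normalization is a hypothetical choice for comparing sentences
    of different lengths; the paper may normalize differently or not at all.
    """
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return char_edit_distance(mt_output, post_edited) / longest
```

Under this sketch, a lower distance between the MT output and its post-edited version stands in for less PE effort, so the CL-conforming and CL-violating sets can be compared by their average distances.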

Publication Downloads

Impact of Controlled Language on Machine-Translation Quality and Post-Editing Efforts

September 5, 2007

Results from experiments conducted by Microsoft Research’s Machine Translation Incubation Team to investigate the impact of using good English (controlled language) on post-editing productivity, as well as on the overall quality of our statistical machine-translation system.