Improving the quality of a customized SMT system using shared training data

  • Chris Wendt
  • Will Lewis

MT Summit 2009

Statistical MT systems can be customized relatively quickly and easily to a company’s specific domain by training them on the company’s own parallel data. We present a case study of a customized statistical MT system trained on an organization’s proprietary data. We show how, and by how much, the quality of this customized system can be improved by adding training data from trusted sources outside the organization, for instance data that other companies and organizations have shared in the TAUS Data Association. We describe the process, the selection criteria, and the mechanisms, and report automatic and human evaluation results for each step in the process. This enables the audience to make deliberate choices about how to compose the training data for their own SMT installation.