Xiaodong He, Amittai Axelrod, Li Deng, Alex Acero, Mei-Yuh Hwang, Alisa Nguyen, Andrew Wang, and Xiahui Huang
This paper describes the Microsoft Research (MSR) system for the evaluation campaign of the 2011 international workshop on spoken language translation. The evaluation task is to translate TED talks (www.ted.com). This task presents two unique challenges: First, the underlying topic switches sharply from talk to talk. Therefore, the translation system needs to adapt to the current topic quickly and dynamically. Second, only a very small amount of relevant parallel data (transcripts of TED talks) is available. Therefore, it is necessary to perform accurate translation model estimation with limited data. In the preparation for the evaluation, we developed two new methods to attack these problems. Specifically, we developed an unsupervised topic modeling based adaption method for machine translation models. We also developed a discriminative training method to estimate parameters in the generative components of the translation models with limited data. Experimental results show that both methods improve the translation quality. Among all the submissions, ours achieves the best BLEU score in the machine translation Chinese-to-English track (MT_CE) of the IWSLT 2011 evaluation that we participated.
Publisher Internaltional Workshop on Spoken Language Translation (IWSLT)