Statistic Machine Translation Boosted with Spurious Word Deletion

Shujie Liu, Chi-Ho Li, and Ming Zhou

Abstract

Spurious words usually have no counterpart in other languages and are therefore a persistent problem in machine translation. In this paper, we propose a novel framework, skeleton-enhanced translation, in which a conventional SMT decoder can boost itself by considering the skeleton of the source input and the translation of that skeleton. The skeleton of a sentence is the sentence with its spurious words removed. We introduce two models for identifying spurious words: a context-insensitive model, which removes all tokens of certain words, and a context-sensitive model, which makes a separate decision for each word token. We also describe two methods for improving a translation decoder using skeleton translation: skeleton-enhanced re-ranking, which re-ranks the n-best output of a conventional SMT decoder with respect to a translated skeleton, and skeleton-enhanced decoding, which re-ranks the translation hypotheses not only of the entire sentence but of any span of the sentence. Our experiments show a significant improvement (1.6 BLEU) over state-of-the-art SMT performance.
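
As a rough illustration of the simplest combination described in the abstract (context-insensitive deletion followed by skeleton-enhanced re-ranking), here is a minimal Python sketch. It is not the paper's implementation: the function names, the token-coverage similarity, the additive score combination, and the `weight` parameter are all assumptions made for this example, and the sample tokens are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Hypothesis = Tuple[List[str], float]  # (target tokens, decoder score)


def skeleton(tokens: Sequence[str], spurious: Set[str]) -> List[str]:
    """Context-insensitive deletion: drop every token whose word is in `spurious`."""
    return [t for t in tokens if t not in spurious]


def coverage(hyp: Sequence[str], skeleton_translation: Sequence[str]) -> float:
    """Fraction of distinct skeleton-translation tokens that appear in `hyp`.

    This similarity measure is an assumption of the sketch, not the paper's feature.
    """
    ref = set(skeleton_translation)
    if not ref:
        return 0.0
    return len(ref & set(hyp)) / len(ref)


def rerank(
    nbest: Sequence[Hypothesis],
    skeleton_translation: Sequence[str],
    weight: float = 1.0,
) -> List[Hypothesis]:
    """Re-rank an n-best list by decoder score plus weighted skeleton coverage."""
    return sorted(
        nbest,
        key=lambda h: h[1] + weight * coverage(h[0], skeleton_translation),
        reverse=True,
    )


# Hypothetical data: romanized source tokens and a two-entry n-best list.
source = ["ta", "ba", "shu", "fang", "xia", "le"]
source_skeleton = skeleton(source, {"ba", "le"})

nbest = [
    (["he", "put", "book"], -1.0),
    (["he", "put", "the", "book", "down"], -1.2),
]
# Stand-in for the decoder's translation of `source_skeleton`.
skeleton_translation = ["he", "put", "the", "book", "down"]
best_tokens, best_score = rerank(nbest, skeleton_translation)[0]
```

With `weight=0.0` the ordering falls back to the decoder scores alone, so the weight controls how much the translated skeleton is allowed to override the original ranking.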

Details

Publication type: Inproceedings
Publisher: MT-SUMMIT