Statistic Machine Translation Boosted with Spurious Word Deletion

Shujie Liu, Chi-Ho Li, and Ming Zhou

Abstract

Spurious words usually have no counterpart in other languages, and are therefore a headache in machine translation. In this paper, we propose a novel framework, skeleton-enhanced translation, in which a conventional SMT decoder can boost itself by considering the skeleton of the source input and the translation of that skeleton. By the skeleton of a sentence we mean the sentence with its spurious words removed. We introduce two models for identifying spurious words: one is a context-insensitive model, which removes all tokens of certain words; the other is a context-sensitive model, which makes a separate decision for each word token. We also elaborate two methods to improve a translation decoder using skeleton translation: one is skeleton-enhanced re-ranking, which re-ranks the n-best output of a conventional SMT decoder with respect to a translated skeleton; the other is skeleton-enhanced decoding, which re-ranks the translation hypotheses of not only the entire sentence but also any span of the sentence. Our experiments show significant improvement (1.6 BLEU) over the state-of-the-art SMT performance.
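
The ideas in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the word-overlap feature used for re-ranking, and the `weight` parameter are all assumptions made for illustration, and the abstract does not say how the real models or re-ranking features are defined.

```python
from typing import Callable, List, Sequence, Set, Tuple


def skeleton_context_insensitive(tokens: Sequence[str], spurious: Set[str]) -> List[str]:
    """Remove every token whose word type is in the (hypothetical) spurious set."""
    return [t for t in tokens if t not in spurious]


def skeleton_context_sensitive(
    tokens: Sequence[str],
    is_spurious: Callable[[Sequence[str], int], bool],
) -> List[str]:
    """Decide separately for each token, given the sentence and the token position."""
    return [t for i, t in enumerate(tokens) if not is_spurious(tokens, i)]


def rerank_with_skeleton(
    nbest: Sequence[Tuple[str, float]],
    skeleton_translation: str,
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (hypothesis, decoder_score) pairs, best first.

    Assumption: the extra feature is the fraction of skeleton-translation
    words that also occur in the hypothesis. This is a stand-in feature only.
    """
    skel_words = set(skeleton_translation.split())

    def combined(item: Tuple[str, float]) -> float:
        hyp, base_score = item
        hyp_words = set(hyp.split())
        overlap = len(skel_words & hyp_words) / len(skel_words) if skel_words else 0.0
        return base_score + weight * overlap

    return sorted(nbest, key=combined, reverse=True)


# Toy usage with made-up data.
sentence = ["the", "cat", "sat", "on", "the", "mat"]
insensitive = skeleton_context_insensitive(sentence, {"the"})
# A hypothetical per-token rule: drop "the" only at the start of the sentence.
sensitive = skeleton_context_sensitive(sentence, lambda toks, i: toks[i] == "the" and i == 0)
ranked = rerank_with_skeleton(
    [("a cat sits", -1.0), ("cat sat on mat", -1.2)],
    skeleton_translation="cat sat on mat",
)
```

In this sketch the context-insensitive function drops both occurrences of "the", while the context-sensitive rule drops only the first, which mirrors the distinction between removing all tokens of a word and deciding per token.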

Details

Publication type: Inproceedings
Publisher: MT-SUMMIT