Decoder Integration and Expected BLEU Training for Recurrent Neural Network Language Models

Published by the Association for Computational Linguistics

Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task-specific metric, such as BLEU in machine translation. We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion. Furthermore, we tackle the issue of directly integrating a recurrent network into first-pass decoding under an efficient approximation. Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single reference setup.
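To make the expected BLEU objective concrete, the sketch below computes the expectation of sentence-level BLEU over an n-best list, weighting each hypothesis by its softmax-normalized model score. This is a generic illustration of the expected-BLEU idea rather than the paper's exact training procedure; the function name, its interface, and the assumption that sentence-level BLEU scores are precomputed are all illustrative choices.

```python
import math


def expected_bleu(log_scores, sentence_bleus):
    """Expected sentence-BLEU of an n-best list under the model's
    softmax-normalized distribution (illustrative sketch, not the
    paper's implementation).

    log_scores     : model log-scores for each hypothesis in the n-best list
    sentence_bleus : precomputed sentence-level BLEU of each hypothesis
                     against the reference(s)
    """
    # Softmax over hypothesis scores -> probability of each translation.
    m = max(log_scores)
    exp_scores = [math.exp(s - m) for s in log_scores]
    z = sum(exp_scores)
    probs = [e / z for e in exp_scores]
    # Expectation of sentence BLEU under that distribution; expected BLEU
    # training would maximize this quantity with respect to the model weights.
    return sum(p * b for p, b in zip(probs, sentence_bleus))


if __name__ == "__main__":
    # Toy 3-hypothesis n-best list: model log-scores and their sentence BLEU.
    print(expected_bleu([-1.2, -0.7, -2.5], [0.31, 0.42, 0.18]))
```

In practice, the gradient of this expectation with respect to the model parameters rewards hypotheses whose BLEU exceeds the current expected value, which is what distinguishes the objective from per-word cross-entropy training.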