Regularized Minimum Error Rate Training

Michel Galley, Chris Quirk, Colin Cherry, and Kristina Toutanova

Abstract

Minimum Error Rate Training (MERT) remains one of the preferred methods for tuning linear parameters in machine translation systems, yet it faces significant issues. First, MERT is an unregularized learner and is therefore prone to overfitting. Second, it is commonly used on a noisy, non-convex loss function that becomes more difficult to optimize as the number of parameters increases. To address these issues, we study the addition of a regularization term to the MERT objective function. Since standard regularizers such as L2 are inapplicable to MERT due to the scale invariance of its objective function, we turn to two regularizers---L0 and a modification of L2---and present methods for efficiently integrating them during search. To improve search in large parameter spaces, we also present a new direction finding algorithm that uses the gradient of expected BLEU to orient MERT's exact line searches. Experiments with up to 3600 features show that these extensions of MERT yield results comparable to PRO, a learner often used with large feature sets.
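The scale-invariance point in the abstract can be illustrated with a small sketch (not from the paper, purely a toy example): MERT's objective depends only on which hypothesis each weight vector ranks first, so multiplying the weights by any positive constant leaves the error unchanged while the L2 norm can be shrunk at will, which is why a plain L2 penalty imposes no real constraint.

```python
import numpy as np

def one_best(weights, feature_vectors):
    """Index of the highest-scoring hypothesis under a linear model."""
    scores = feature_vectors @ weights
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 3))   # toy data: 5 hypotheses, 3 features
w = np.array([1.0, -0.5, 2.0])

for c in (1.0, 0.1, 0.001):
    scaled = c * w
    # The 1-best hypothesis (hence the corpus error) never changes...
    assert one_best(scaled, feats) == one_best(w, feats)
    # ...but the L2 penalty shrinks toward zero, so it cannot
    # meaningfully regularize a scale-invariant objective.
    print(c, float(np.sum(scaled ** 2)))
```

This is the motivation, as the abstract describes it, for turning to L0 and a modified L2 regularizer instead.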

Details

Publication type: Inproceedings
Published in: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
Address: Seattle, Washington
Publisher: Association for Computational Linguistics