Large-Context Models for Large-Scale Machine Translation

Statistical machine translation systems generate their output by stitching together fragments of example translations. Two trends are fueling rapid progress in this field: more example data, and new modeling techniques that better exploit the information in the data. In particular, today’s massive data sets allow our statistical models to capture larger linguistic contexts than ever before. In this talk, I will give a tour of the three stages of a modern system: training a model, searching for translations, and selecting one. For each stage, I will highlight innovations that have enabled us to leverage the rich patterns contained in large data sets.

The first stage of translation discovers how two languages correspond to each other. Models of correspondence have historically bottomed out in word-to-word statistics. The approach I will describe centers instead on statistics over multi-word phrases, which can capture idiomatic and non-literal translation patterns. These patterns are acquired automatically using nonparametric statistical machinery that scales up naturally with the data, introducing additional context whenever there is sufficient evidence to support it.
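
To make the phrase-level statistics concrete, here is a minimal Python sketch of alignment-consistent phrase-pair extraction, the classic heuristic that phrase-based systems use to harvest multi-word correspondences from word-aligned sentence pairs. It is illustrative only: the sentence pair and alignment are invented, and the talk's approach replaces this kind of fixed heuristic with a nonparametric model learned from data.

```python
from collections import Counter
from itertools import product

def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Enumerate phrase pairs consistent with a word alignment.

    A source span and a target span form a consistent pair when every
    alignment link touching either span lies inside both spans.
    """
    pairs = []
    for i1, i2 in product(range(len(src)), repeat=2):
        if i2 < i1 or i2 - i1 + 1 > max_len:
            continue
        # Target positions linked to this source span.
        linked = [j for (i, j) in alignment if i1 <= i <= i2]
        if not linked:
            continue
        j1, j2 = min(linked), max(linked)
        if j2 - j1 + 1 > max_len:
            continue
        # Reject spans where a link enters the target span from outside
        # the source span (the other direction holds by construction).
        if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
            continue
        pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy example: the idiomatic negation "ne ... pas" forces the extractor
# up to the multi-word pair ("ne pas", "do not"). Links are (src, tgt)
# word-index pairs; in a real system counts accumulate over a corpus.
links = {(0, 0), (0, 1), (1, 1), (2, 2), (2, 3)}
counts = Counter(extract_phrase_pairs(
    ["ne", "pas", "abandonner"], ["do", "not", "give", "up"], links))
print(counts.most_common())
```

On this toy pair the extractor recovers ("ne pas", "do not") and ("abandonner", "give up"): multi-word units that word-to-word statistics alone would miss.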

The second stage searches for translations that are scored highly by a model. As our models grow in size and complexity with the data, so does the scale of this search problem. I will present a coarse-to-fine approach to managing this complexity, which uses simpler approximate models to guide and constrain the full-scale search. This kind of multi-pass inference is proving to be a powerful general tool for deploying language processing systems at scale.
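
The mechanics of multi-pass inference are easy to sketch. The fragment below is a deliberately collapsed illustration, assuming complete candidate translations and two hypothetical scoring functions; in a real decoder the coarse model prunes partial hypotheses inside the search itself, but the principle is the same: a cheap approximate model narrows the space that the expensive model must score.

```python
def coarse_to_fine_select(candidates, coarse_score, fine_score, beam=100):
    """Two-pass inference: a cheap model prunes, an expensive model decides.

    `coarse_score` and `fine_score` are hypothetical callables standing
    in for the simple and full-scale models.
    """
    # Pass 1: rank everything with the coarse model and keep a beam.
    survivors = sorted(candidates, key=coarse_score, reverse=True)[:beam]
    # Pass 2: spend the expensive model only on the survivors.
    return max(survivors, key=fine_score)
```

The coarse model determines the risk profile: the more closely it tracks the full model's ranking, the smaller the beam can be without discarding the translation the full model would have chosen.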

The final stage selects a single output translation from a set of high-scoring candidates. The consensus framework I will introduce selects the translation that agrees most closely with the full set of strong candidates. Theoretically, this approach unifies two distinct translation problems: selecting final outputs and combining the outputs of multiple systems. Empirically, this work has set new performance records for two of the world’s most successful large-scale, highly distributed translation systems.
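
A minimum-Bayes-risk-style selection rule gives the flavor of consensus decoding, though this sketch should not be read as the definition of the talk's framework. It assumes candidates arrive with log-scores, and the similarity function is a toy stand-in for a sentence-level metric such as BLEU.

```python
import math

def unigram_overlap(a, b):
    """Toy agreement metric (a stand-in for sentence-level BLEU)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def consensus_select(candidates, similarity=unigram_overlap):
    """Pick the candidate with the highest expected agreement.

    candidates: list of (translation, log_score) pairs. Scores are
    turned into a posterior, and each translation is ranked by its
    posterior-weighted similarity to all the others.
    """
    z = max(s for _, s in candidates)
    weights = [math.exp(s - z) for _, s in candidates]
    total = sum(weights)
    posterior = [w / total for w in weights]

    def expected_agreement(hyp):
        return sum(p * similarity(hyp, other)
                   for p, (other, _) in zip(posterior, candidates))

    return max((hyp for hyp, _ in candidates), key=expected_agreement)

# Toy 4-best list: the rule picks "do not give up", the hypothesis that
# shares the most material with the rest of the strong candidates.
nbest = [("do not give up", -1.0), ("don't give up", -1.1),
         ("do not abandon", -1.2), ("never surrender", -2.5)]
print(consensus_select(nbest))
```

One way to see the unification the abstract mentions: if the candidate list pools outputs from several systems, the same selection rule performs system combination.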

Speaker Details

John DeNero is a Ph.D. candidate in the Computer Science Division at the University of California, Berkeley. He studies statistical natural language processing, working with Professor Dan Klein. He has held summer research positions at the Information Sciences Institute at the University of Southern California and with the translation group of Google Research. He plans to graduate in May 2010. John specializes in large-scale statistical machine translation, an exciting technology area at the nexus of artificial intelligence, distributed computing, and computational linguistics. His research focuses on developing clean, model-based approaches to translation that can take advantage of web-scale data sets. He has also contributed to work on parsing and unsupervised machine learning. Please visit his website to learn more: http://www.eecs.berkeley.edu/~denero

Date:
Speakers: John DeNero
Affiliation: University of California, Berkeley