Amalgam is a novel system developed in the Natural Language Processing group at Microsoft Research for sentence realization during natural language generation. Sentence realization is the process of generating (“realizing”) a fluent sentence from a semantic representation. From the outset, the goal of the Amalgam project has been to build a sentence realization system in a data-driven fashion using machine learning techniques.
Overview
We have implemented Amalgam for both German and French.
Amalgam accepts as input a logical form graph capturing the meaning of a sentence. The logical form shown here is for the German sentence "Die ODBC-Spezifikation definiert das Feld, das die Komponente bezeichnet, die die Meldung ausgegeben hat." ("The ODBC specification defines the field that identifies the component that issued the message."), taken from Microsoft technical manuals.

Amalgam constrains the search for a fluent sentence realization by following a linguistically informed approach that includes such component steps as labeling of phrasal projections, raising, ordering of elements within a constituent, and extraposition of relative clauses. For the above example, the following tree shows the transformed structure just prior to ordering.

Proceeding through these steps, Amalgam transforms the logical form into a fully articulated tree structure from which an output sentence is read.
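
The staged transformation described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Amalgam's implementation: the `Node` class, the `rank` field, the placeholder label `"XP"`, and the two toy stages are hypothetical names invented here, and the hand-assigned ranks stand in for ordering decisions that the real system derives from its own models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str                    # category label; empty if not yet assigned
    word: Optional[str] = None    # surface form, set on leaves only
    rank: int = 0                 # hypothetical ordering key among siblings
    children: List["Node"] = field(default_factory=list)


# A stage takes a tree and returns the (possibly modified) tree.
Stage = Callable[[Node], Node]


def label_projections(node: Node) -> Node:
    """Toy labeling step: give unlabeled internal nodes a placeholder label."""
    for child in node.children:
        label_projections(child)
    if node.children and not node.label:
        node.label = "XP"
    return node


def order_constituents(node: Node) -> Node:
    """Toy ordering step: sort the children of every node by their rank."""
    for child in node.children:
        order_constituents(child)
    node.children.sort(key=lambda c: c.rank)
    return node


def read_off(node: Node) -> List[str]:
    """Collect the words at the leaves, left to right."""
    if not node.children:
        return [node.word] if node.word else []
    words: List[str] = []
    for child in node.children:
        words.extend(read_off(child))
    return words


def realize(tree: Node, stages: List[Stage]) -> str:
    """Apply each stage in sequence, then read the sentence off the tree."""
    for stage in stages:
        tree = stage(tree)
    return " ".join(read_off(tree))


# Unordered toy input covering only the main clause of the example sentence.
tree = Node("", children=[
    Node("VERB", word="definiert", rank=1),
    Node("", rank=2, children=[
        Node("NOUN", word="Feld", rank=1),
        Node("DET", word="das", rank=0),
    ]),
    Node("", rank=0, children=[
        Node("NOUN", word="ODBC-Spezifikation", rank=1),
        Node("DET", word="Die", rank=0),
    ]),
])

sentence = realize(tree, [label_projections, order_constituents])
print(sentence)
```

Keeping each operation as a separate function over the same tree type mirrors the idea of component steps applied one after another; further steps such as raising or extraposition would be additional entries in the `stages` list.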

The contexts for each linguistic operation in the process are primarily machine-learned. The promise of machine-learned approaches to sentence realization is that they can easily be adapted to new domains and ideally to new languages merely by retraining.
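
To make the retraining idea concrete, here is a minimal Python sketch of a decision that is learned from examples rather than hand-coded. Everything in it is an assumption made for illustration: the `ContextModel` class, the feature tuples, and the training examples are invented toy values, and the simple frequency count is a stand-in rather than the learner Amalgam actually uses.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

Context = Tuple[Hashable, ...]


class ContextModel:
    """Toy model: pick the decision seen most often for a given context."""

    def __init__(self) -> None:
        self.counts: Dict[Context, Counter] = defaultdict(Counter)

    def train(self, examples: Iterable[Tuple[Context, bool]]) -> None:
        """Discard earlier statistics and count decisions per context."""
        self.counts.clear()
        for context, decision in examples:
            self.counts[context][decision] += 1

    def predict(self, context: Context, default: bool = False) -> bool:
        """Return the majority decision, or the default for unseen contexts."""
        seen = self.counts.get(context)
        if not seen:
            return default
        return seen.most_common(1)[0][0]


# Invented examples: (clause length, clause type) -> extrapose the clause?
domain_a = [
    (("long", "relative"), True),
    (("long", "relative"), True),
    (("short", "relative"), False),
]
domain_b = [
    (("long", "relative"), False),
    (("long", "relative"), False),
    (("long", "relative"), True),
]

model = ContextModel()

model.train(domain_a)
decision_a = model.predict(("long", "relative"))

# Retraining on data from another domain changes the decision; no code changes.
model.train(domain_b)
decision_b = model.predict(("long", "relative"))

print(decision_a, decision_b)
```

The point of the sketch is only that the operation itself stays fixed while the conditions under which it applies come from data, so adapting to a new domain means supplying new examples.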
Project Members
- Michael Gamon
- Eric Ringger
- Simon Corston-Oliver
- Robert Moore
Interns:
- 2001: Zhu Zhang
- 2003: David Rojas
Acknowledgments:
- Max Chickering
- Tom Reutter
- Karin Berghoefer
- the Microsoft Research NLP generation grammarians
Publications
- Eric Ringger, Michael Gamon, Robert C. Moore, David Rojas, Martine Smets, and Simon Corston-Oliver, Linguistically Informed Statistical Models of Constituent Structure for Ordering in Sentence Realization, International Conference on Computational Linguistics, August 2004
- Martine Smets, Michael Gamon, Jessie Pinkham, Tom Reutter, and Martine Pettanaro, High quality machine translation using a machine-learned sentence realization component, Association for Machine Translation in the Americas, September 2003
- Martine Smets, Michael Gamon, Simon Corston-Oliver, and Eric Ringger, French Amalgam: A machine-learned sentence realization system, Association pour le Traitement Automatique des Langues, June 2003
- Martine Smets, Michael Gamon, Simon Corston-Oliver, and Eric Ringger, The adaptation of a machine-learned sentence realization system to French, Association for Computational Linguistics, April 2003
- Michael Gamon, Eric Ringger, Robert Moore, Simon Corston-Oliver, and Zhu Zhang, Extraposition: A case study in German sentence realization, Association for Computational Linguistics, August 2002
- Michael Gamon, Eric Ringger, Simon Corston-Oliver, and Robert C. Moore, Machine-learned contexts for linguistic operations in German sentence realization, Association for Computational Linguistics, July 2002
- Simon Corston-Oliver, Michael Gamon, Eric Ringger, and Robert Moore, An overview of Amalgam: A machine-learned generation module, Association for Computational Linguistics, July 2002
- Michael Gamon, Eric Ringger, and Simon Corston-Oliver, Amalgam: A machine-learned generation module, no. MSR-TR-2002-57, June 2002
- Zhu Zhang, Michael Gamon, Simon Corston-Oliver, and Eric Ringger, Intra-sentence Punctuation Insertion in Natural Language Generation, no. MSR-TR-2002-58, May 2002
