Amalgam

Amalgam is a novel system developed in the Natural Language Processing group at Microsoft Research for sentence realization during natural language generation. Sentence realization is the process of generating (“realizing”) a fluent sentence from a semantic representation. From the outset, the goal of the Amalgam project has been to build a sentence realization system in a data-driven fashion using machine learning techniques.


Overview

We have implemented Amalgam for both German and French.

Amalgam accepts as input a logical form graph that captures the meaning of a sentence. The logical form shown here is for the German sentence “Die ODBC-Spezifikation definiert das Feld, das die Komponente bezeichnet, die die Meldung ausgegeben hat.” (“The ODBC specification defines the field that identifies the component that issued the message.”), taken from Microsoft technical manuals.
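To make the input concrete, here is a minimal sketch of how a logical form graph for the example sentence might be represented. Everything in it is an assumption for illustration: the `LFNode` class, the relation labels `subj`, `obj` and `relcl`, and the feature names are hypothetical and are not Amalgam's actual logical form format. The sketch also assumes that the relative pronoun in each relative clause is represented by its antecedent node, which makes the structure a graph with shared nodes rather than a tree, so any traversal needs a visited set.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# eq=False keeps identity semantics, which suits a graph with shared nodes.
@dataclass(eq=False)
class LFNode:
    """A node in a toy logical form graph (hypothetical representation)."""
    lemma: str
    features: set[str] = field(default_factory=set)
    # Labeled edges to dependents; the label names are illustrative only.
    relations: dict[str, list[LFNode]] = field(default_factory=dict)

    def add(self, label: str, child: LFNode) -> LFNode:
        """Attach `child` under relation `label` and return self for chaining."""
        self.relations.setdefault(label, []).append(child)
        return self


def count_nodes(node: LFNode, seen: set[int] | None = None) -> int:
    """Count distinct nodes; the visited set stops shared nodes from looping forever."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(
        count_nodes(child, seen)
        for children in node.relations.values()
        for child in children
    )


# Example sentence: Die ODBC-Spezifikation definiert das Feld, das die
# Komponente bezeichnet, die die Meldung ausgegeben hat.
spec = LFNode("ODBC-Spezifikation", {"Def"})
feld = LFNode("Feld", {"Def"})
komponente = LFNode("Komponente", {"Def"})
meldung = LFNode("Meldung", {"Def"})

# Assumption: each relative clause's subject is the antecedent node itself.
ausgeben = LFNode("ausgeben", {"Perf"}).add("subj", komponente).add("obj", meldung)
komponente.add("relcl", ausgeben)
bezeichnen = LFNode("bezeichnen", {"Pres"}).add("subj", feld).add("obj", komponente)
feld.add("relcl", bezeichnen)
definieren = LFNode("definieren", {"Pres"}).add("subj", spec).add("obj", feld)

print(count_nodes(definieren))  # 7 distinct nodes despite the cycles
```

Without the visited set, the cycle Feld → bezeichnen → Feld would make a naive recursive walk run until it hit Python's recursion limit.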

Amalgam constrains the search for a fluent sentence realization by following a linguistically informed approach that includes component steps such as labeling of phrasal projections, raising, ordering of elements within a constituent, and extraposition of relative clauses. For the example above, the following tree shows the transformed structure just prior to ordering.
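One way to picture these component steps is as a sequence of tree-to-tree transformations. The sketch below assumes a toy `Node` class and hypothetical labels (`NP`, `RELCL`, and so on). It runs stages in a fixed order and implements only one of them: a simplified extraposition of relative clauses that looks one level down from a clause. The `should_extrapose` callback is a placeholder for a learned decision and is not Amalgam's actual interface; in the demo it simply always answers yes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """A constituent in a toy syntax tree (hypothetical representation)."""
    label: str                     # illustrative labels such as "NP" or "RELCL"
    word: str | None = None        # surface form, set only on leaves
    children: list[Node] = field(default_factory=list)


Stage = Callable[[Node], Node]


def run_stages(tree: Node, stages: list[Stage]) -> Node:
    """Apply each stage in order, feeding its output to the next."""
    for stage in stages:
        tree = stage(tree)
    return tree


def make_extraposition(should_extrapose: Callable[[Node, Node], bool]) -> Stage:
    """Build a stage that moves relative clauses out of an NP to the end of its clause.

    Simplification: only NPs that are direct children of the given clause are
    examined, and the tree is modified in place.
    """
    def stage(clause: Node) -> Node:
        moved: list[Node] = []
        for child in clause.children:
            if child.label != "NP":
                continue
            kept: list[Node] = []
            for sub in child.children:
                if sub.label == "RELCL" and should_extrapose(child, sub):
                    moved.append(sub)
                else:
                    kept.append(sub)
            child.children = kept
        clause.children.extend(moved)  # extraposed clauses go to the right edge
        return clause
    return stage


def leaf(label: str, word: str) -> Node:
    return Node(label, word)


# Relative clause from the example before extraposition: the inner relative
# clause still sits inside the NP "die Komponente".
inner = Node("RELCL", children=[
    leaf("PRON", "die"),
    Node("NP", children=[leaf("DET", "die"), leaf("N", "Meldung")]),
    leaf("V", "ausgegeben"), leaf("AUX", "hat"),
])
outer = Node("RELCL", children=[
    leaf("PRON", "das"),
    Node("NP", children=[leaf("DET", "die"), leaf("N", "Komponente"), inner]),
    leaf("V", "bezeichnet"),
])

run_stages(outer, [make_extraposition(lambda np, relcl: True)])
print([c.label for c in outer.children])  # ['PRON', 'NP', 'V', 'RELCL']
```

After the stage runs, “die die Meldung ausgegeben hat” follows the verb “bezeichnet” instead of staying inside the noun phrase, which matches the word order of the example sentence.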

Proceeding through these steps, Amalgam transforms the logical form into a fully articulated tree structure from which an output sentence is read.
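Once the tree is fully articulated and ordered, reading off the sentence amounts to collecting its leaves from left to right. The minimal sketch below assumes a toy `Node` class with hypothetical labels and treats punctuation as ordinary leaves that attach to the preceding word. The hand-built tree encodes the example sentence after ordering and extraposition; it is illustrative, not Amalgam's actual output structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PUNCTUATION = {",", ".", ";", ":", "?", "!"}


@dataclass
class Node:
    """A constituent in a toy, fully ordered syntax tree (hypothetical)."""
    label: str
    word: str | None = None
    children: list[Node] = field(default_factory=list)


def leaf(label: str, word: str) -> Node:
    return Node(label, word)


def tree_yield(node: Node) -> list[str]:
    """Collect surface words from the leaves, left to right."""
    if not node.children:
        return [node.word] if node.word is not None else []
    words: list[str] = []
    for child in node.children:
        words.extend(tree_yield(child))
    return words


def read_sentence(tree: Node) -> str:
    """Join the yield with spaces, attaching punctuation to the preceding word."""
    text = ""
    for word in tree_yield(tree):
        if text and word not in PUNCTUATION:
            text += " "
        text += word
    return text


inner = Node("RELCL", children=[
    leaf("PRON", "die"),
    Node("NP", children=[leaf("DET", "die"), leaf("N", "Meldung")]),
    leaf("V", "ausgegeben"), leaf("AUX", "hat"),
])
outer = Node("RELCL", children=[
    leaf("PRON", "das"),
    Node("NP", children=[leaf("DET", "die"), leaf("N", "Komponente")]),
    leaf("V", "bezeichnet"), leaf("PUNC", ","), inner,
])
sentence = Node("S", children=[
    Node("NP", children=[leaf("DET", "Die"), leaf("N", "ODBC-Spezifikation")]),
    leaf("V", "definiert"),
    Node("NP", children=[leaf("DET", "das"), leaf("N", "Feld"), leaf("PUNC", ","), outer]),
    leaf("PUNC", "."),
])

print(read_sentence(sentence))
# Die ODBC-Spezifikation definiert das Feld, das die Komponente bezeichnet, die die Meldung ausgegeben hat.
```

All the linguistic work happens in the earlier transformations. Reading the sentence is a plain depth-first traversal, so every ordering or extraposition decision must already be reflected in the tree.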

The contexts in which each linguistic operation applies are primarily machine-learned. The promise of machine-learned approaches to sentence realization is that they can be adapted to new domains, and ideally to new languages, simply by retraining.
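As an illustration of what “retraining” means here, the following sketch learns a single decision, such as whether to extrapose a relative clause, from labeled examples. It picks the label most often seen with a given feature tuple. This frequency-table learner is a deliberately simple stand-in, not the model Amalgam uses. The class name, feature names, and training examples are invented for the demo. Under this setup, adapting to a new domain or language means calling `fit` on different data, with no change to the code.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from collections.abc import Hashable, Iterable

Features = tuple[Hashable, ...]


class CountClassifier:
    """Predict the label most frequently seen with a feature tuple (toy learner)."""

    def __init__(self) -> None:
        self.by_context: defaultdict[Features, Counter[str]] = defaultdict(Counter)
        self.overall: Counter[str] = Counter()

    def fit(self, examples: Iterable[tuple[Features, str]]) -> CountClassifier:
        """(Re)train from scratch: earlier counts are discarded."""
        self.by_context.clear()
        self.overall.clear()
        for features, label in examples:
            self.by_context[features][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, features: Features) -> str:
        """Use counts for this context if seen, else fall back to the overall majority."""
        counts = self.by_context.get(features) or self.overall
        if not counts:
            raise ValueError("classifier has not been trained")
        return counts.most_common(1)[0][0]


# Invented toy data: (clause type, relative-clause length) -> decision.
data_a = [
    (("verb_final", "long"), "extrapose"),
    (("verb_final", "long"), "extrapose"),
    (("verb_final", "short"), "keep"),
    (("verb_second", "short"), "keep"),
    (("verb_second", "short"), "keep"),
]
model = CountClassifier().fit(data_a)
print(model.predict(("verb_final", "long")))   # extrapose
print(model.predict(("verb_second", "long")))  # keep (unseen context: overall majority)

# "Retraining" on a different (equally invented) sample changes the behavior.
data_b = [(("verb_final", "long"), "keep"), (("verb_final", "long"), "keep")]
model.fit(data_b)
print(model.predict(("verb_final", "long")))   # keep
```

A practical system would use a model that generalizes across partially matching contexts, for instance a decision tree. The exact-match table here can only fall back to the overall majority label.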

Project Members

Interns:

  • 2001: Zhu Zhang
  • 2003: David Rojas

Acknowledgments:

  • Max Chickering
  • Tom Reutter
  • Karin Berghoefer
  • the Microsoft Research NLP generation grammarians
Publications