AMALGAM
Overview
Amalgam is a novel system developed in the Natural Language Processing group at Microsoft Research for sentence realization during natural language generation. Sentence realization is the process of generating (realizing) a fluent sentence from a semantic representation. From the outset, the goal of the Amalgam project has been to build a sentence realization system in a data-driven fashion using machine learning techniques.
To date, we have implemented Amalgam for both German and French, with English in the works. Amalgam accepts as input a logical form graph capturing the meaning of a sentence. The logical form shown here is for the German sentence Die ODBC-Spezifikation definiert das Feld, das die Komponente bezeichnet, die die Meldung ausgegeben hat. ("The ODBC specification defines the field that identifies the component that issued the message"; from MS technical manuals)
Amalgam constrains the search for a fluent sentence realization by following a linguistically informed approach that includes such component steps as labeling of phrasal projections, raising, ordering of elements within a constituent, and extraposition of relative clauses. For the above example, the following tree illustrates the transformed tree just prior to ordering.
Proceeding through these steps, Amalgam transforms the logical form into a fully articulated tree structure from which an output sentence is read.
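The staged flow described above can be pictured with a small toy sketch. This is not Amalgam's code: the Node class, the rank field, and the function names are hypothetical, and the hand-assigned rank stands in for ordering decisions that Amalgam learns from data. It only shows the general shape of applying tree-transforming stages in sequence and then reading the sentence off the leaves.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical tree node; not Amalgam's actual data structure."""
    label: str                      # phrasal or lexical label
    word: str = ""                  # surface string (leaf nodes only)
    rank: int = 0                   # assumed stand-in for a learned ordering decision
    children: List["Node"] = field(default_factory=list)


Stage = Callable[[Node], Node]


def order_constituents(node: Node) -> Node:
    """Toy ordering stage: sort each node's children by their rank."""
    ordered = sorted((order_constituents(c) for c in node.children),
                     key=lambda c: c.rank)
    return Node(node.label, node.word, node.rank, ordered)


def read_off(node: Node) -> List[str]:
    """Collect leaf words left to right."""
    if not node.children:
        return [node.word] if node.word else []
    words: List[str] = []
    for child in node.children:
        words.extend(read_off(child))
    return words


def realize(tree: Node, stages: List[Stage]) -> str:
    """Apply each stage in turn, then read the sentence off the tree."""
    for stage in stages:
        tree = stage(tree)
    return " ".join(read_off(tree))


# An unordered toy tree for the main clause of the example sentence.
tree = Node("DECL", children=[
    Node("VERB", word="definiert", rank=1),
    Node("NP", word="das Feld", rank=2),
    Node("NP", word="Die ODBC-Spezifikation", rank=0),
])

sentence = realize(tree, [order_constituents])
```

Further stages, such as labeling or extraposition, would be additional functions of the same Node-to-Node shape appended to the list passed to realize.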
The contexts for each linguistic operation in the process are primarily machine-learned. The promise of machine-learned approaches to sentence realization is that they can easily be adapted to new domains, and ideally to new languages, merely by retraining. To date, our research has focused particularly on the ongoing machine translation work at Microsoft Research NLP.
Publications
Project Members
Interns:
Acknowledgments: