Learning prepositional attachment from sentence aligned bilingual corpora

Takako Aikawa, Chris Quirk, and Lee Schwartz

Abstract

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguistic representations of the English and Japanese sentences from a large parallel corpus of technical texts. The premise of our approach is that with large aligned, parsed, bilingual (or multilingual) corpora, languages can learn non-trivial linguistic information from one another with high accuracy. We contend that our approach can be extended to linguistic phenomena other than PP attachment.

Details

Publication type: Inproceedings
URL: http://www.amtaweb.org/summit/MTSummit/FinalPapers/39-Aikawa-final.pdf
Publisher: Association for Machine Translation in the Americas