Predicting MT Quality as a Function of the Source Language

David M. Rojas; Takako Aikawa

May 2006

Published by European Language Resources Association

This poster is a preliminary report on our experiments in detecting semantically shifted terms between different domains for the purpose of new concept extraction. A given term in one domain may represent a different concept in another domain. In our approach, we quantify the similarity of a word across domains by measuring the degree of overlap between its domain-specific semantic spaces. These domain-specific semantic spaces are defined by extracting families of syntactically similar words, i.e., words that occur in the same syntactic context. Our method relies on no external resources other than a syntactic parser, yet it has the potential to extract semantically shifted terms between two domains automatically while paying close attention to contextual information. The poster is organized as follows: Section 1 provides our motivation. Section 2 gives an overview of our NLP technology and explains how we extract syntactically similar words. Section 3 describes the design of our experiments and our method. Section 4 presents our observations and preliminary results. Section 5 outlines future work and offers concluding remarks.
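The overlap idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which overlap measure is used, so Jaccard overlap is assumed here purely for illustration. The function names (`jaccard_overlap`, `rank_shift_candidates`) and the toy word families are hypothetical; in the approach described, the families would come from a syntactic parser run over each domain's text.

```python
def jaccard_overlap(family_a, family_b):
    """Jaccard overlap of two word families (an assumed measure, not the authors')."""
    a, b = set(family_a), set(family_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_shift_candidates(families_a, families_b):
    """Rank terms present in both domains by overlap, lowest first.

    Each argument maps a term to the set of words that share its
    syntactic contexts in one domain. A low overlap suggests the term
    may denote a different concept in the two domains.
    """
    shared_terms = families_a.keys() & families_b.keys()
    scores = {
        term: jaccard_overlap(families_a[term], families_b[term])
        for term in shared_terms
    }
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))


# Invented toy data, for illustration only.
domain_a = {
    "virus": {"bacterium", "infection", "pathogen"},
    "cell": {"tissue", "membrane", "nucleus"},
}
domain_b = {
    "virus": {"worm", "trojan", "malware"},
    "cell": {"tissue", "membrane", "battery"},
}

ranked = rank_shift_candidates(domain_a, domain_b)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

In this toy data, "virus" shares no similar words across the two domains and so ranks first as a shift candidate, while "cell" keeps half of its family. Any threshold for flagging a term as shifted would have to be chosen empirically.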

Printed / Distributed with the permission of ELRA. This paper was published within the proceedings of the LREC'2004 Conference. © 2007 ELRA - European Language Resources Association. All rights reserved.