Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

K Saravanan and A Kumaran

Abstract

Parallel Named Entity pairs are important resources in several NLP tasks, such as cross-language information retrieval (CLIR) and machine translation (MT) systems. Further, such pairs may also be used for training transliteration systems if they are transliterations of each other. In this paper, we profile the performance of a methodology for mining parallel named entity transliteration pairs in English and an Indian language, Tamil, leveraging linguistic tools in English and article-aligned comparable corpora in the two languages. We adopt a methodology parallel to that of [Klementiev and Roth, 2006], but we focus instead on mining parallel named entity transliteration pairs, using a well-trained linear classifier to identify transliteration pairs. We profile the performance at several operating parameters of our algorithm and present results that show the potential of the approach in mining transliteration pairs; in addition, we uncover a host of issues that need to be resolved for effective mining of parallel named entity transliteration pairs.

Details

Publication type: Inproceedings
Published in: the 2nd International Workshop on Crosslingual Information Access, Hyderabad, India