Some Experiments in Mining Named Entity Transliteration Pairs from Comparable Corpora

Parallel named entity pairs are important resources in several NLP tasks, such as CLIR and MT systems. Further, such pairs may also be used for training transliteration systems if they are transliterations of each other. In this paper, we profile the performance of a methodology for mining parallel named entity transliteration pairs in English and an Indian language, Tamil, leveraging linguistic tools in English and article-aligned comparable corpora in the two languages. We adopt a methodology parallel to that of [Klementiev and Roth, 2006], but we focus instead on mining parallel named entity transliteration pairs, using a well-trained linear classifier to identify them. We profile the performance at several operating parameters of our algorithm and present results that show the potential of the approach for mining transliteration pairs; in addition, we uncover a host of issues that need to be resolved for effective mining of parallel named entity transliteration pairs.

In the 2nd International Workshop on Crosslingual Information Access, Hyderabad, India

Details

Type: Inproceedings