Everybody loves a rich cousin: An empirical study of Transliteration through Bridge Languages

Mitesh Khapra, A Kumaran, and Pushpak Bhattacharyya

Abstract

Most state-of-the-art approaches to machine transliteration are data driven and require significant parallel names corpora between languages. As a result, developing transliteration functionality among n languages can be a resource-intensive task, requiring on the order of n-choose-2 parallel names corpora. In this paper, we explore ways of reducing this high resource requirement by leveraging the available parallel data between subsets of the n languages transitively; that is, by transitioning through a bridge language Z in cases where there is no direct parallel names data available between two languages X and Y. We propose, and demonstrate on a diverse set of languages, that transliteration engines of reasonable quality may be developed by such a methodology. Such systems significantly reduce the need for O(nC2) corpora. In addition, we show that the performance of such bridge transliteration systems is on par with that of direct transliteration systems in practical applications such as Cross Language Information Retrieval (CLIR) systems.
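
As a rough illustration of the resource argument, the sketch below compares the number of parallel corpora needed for direct coverage of all pairs (n-choose-2) with the number needed when every other language is paired with a single shared bridge language (n - 1). The single-bridge setup, the function names, and the toy character tables are assumptions made for illustration; they are not the systems or data described in the paper.

```python
from math import comb


def direct_corpora(n: int) -> int:
    """Corpora needed when every language pair has its own parallel corpus."""
    return comb(n, 2)


def bridge_corpora(n: int) -> int:
    """Corpora needed when each non-bridge language is paired only with one
    shared bridge language (an assumption of this sketch)."""
    return max(n - 1, 0)


def make_char_mapper(table):
    """Build a toy transliterator that maps characters through a lookup table.
    Real systems are statistical models; this is only a stand-in."""
    def mapper(name: str) -> str:
        return "".join(table.get(ch, ch) for ch in name)
    return mapper


def bridge_transliterate(name, x_to_z, z_to_y):
    """Transliterate X -> Y by composing X -> Z and Z -> Y."""
    return z_to_y(x_to_z(name))


# Hypothetical placeholder alphabets, not real transliteration data.
x_to_z = make_char_mapper({"a": "1", "b": "2"})
z_to_y = make_char_mapper({"1": "p", "2": "q"})

if __name__ == "__main__":
    for n in (3, 5, 10):
        print(n, direct_corpora(n), bridge_corpora(n))
    print(bridge_transliterate("ab", x_to_z, z_to_y))
```

Under these assumptions, the gap between the two counts grows quadratically with n, which is the motivation for routing through a bridge language when direct X-Y data is missing.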

Details

Publication type: Inproceedings
Published in: the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), Los Angeles, USA
Publisher: Association for Computational Linguistics