Mitesh Khapra, A Kumaran, and Pushpak Bhattacharyya
Most state-of-the-art approaches to machine transliteration are data driven and require significant parallel names corpora between languages. As a result, developing transliteration functionality among n languages can be a resource-intensive task, requiring parallel names corpora on the order of n-choose-2. In this paper, we explore ways of reducing this high resource requirement by leveraging the available parallel data between subsets of the n languages transitively; that is, by transitioning through a bridge language Z in cases where there is no direct parallel names data available between two languages X and Y. We propose, and demonstrate on a diverse set of languages, that transliteration engines of reasonable quality may be developed by this methodology. Such systems significantly alleviate the need for O(nC2) corpora. In addition, we show that the performance of such bridge transliteration systems is on par with that of direct transliteration systems in practical applications such as Cross Language Information Retrieval (CLIR) systems.
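The bridging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the top-k cutoff, the sum-of-products scoring, and the toy candidate tables (with placeholder strings rather than real names) are all assumptions made for illustration. The corpus count of n - 1 likewise assumes the simplest setup, in which every language is paired with one shared bridge language.

```python
from collections import defaultdict
from math import comb


def direct_corpora(n: int) -> int:
    """Parallel corpora needed when every language pair is trained directly."""
    return comb(n, 2)


def bridge_corpora(n: int) -> int:
    """Corpora needed if each of the other n - 1 languages is paired only
    with a single shared bridge language (an assumed, simplest-case setup)."""
    return n - 1


def bridge_transliterate(name, x_to_z, z_to_y, top_k=2):
    """Compose an X->Z and a Z->Y transliterator through bridge language Z.

    Both arguments are hypothetical callables returning a list of
    (candidate, probability) pairs sorted by decreasing probability.
    Each Y candidate is scored by summing P(z | x) * P(y | z) over the
    top_k bridge candidates z.
    """
    scores = defaultdict(float)
    for z, p_z in x_to_z(name)[:top_k]:
        for y, p_y in z_to_y(z):
            scores[y] += p_z * p_y
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-ins for trained models; all strings are placeholders.
X_TO_Z = {"x_name": [("z_a", 0.5), ("z_b", 0.5)]}
Z_TO_Y = {
    "z_a": [("y_1", 0.75), ("y_2", 0.25)],
    "z_b": [("y_1", 1.0)],
}


def toy_x_to_z(name):
    return X_TO_Z.get(name, [])


def toy_z_to_y(name):
    return Z_TO_Y.get(name, [])


ranked = bridge_transliterate("x_name", toy_x_to_z, toy_z_to_y)
print(direct_corpora(5), bridge_corpora(5))  # 10 4
print(ranked)  # [('y_1', 0.875), ('y_2', 0.125)]
```

For five languages, the direct setup needs 10 corpora while the single-bridge setup needs 4. In the toy run, both bridge candidates contribute to y_1, so it outranks y_2; keeping several bridge candidates rather than only the best one is one way to limit error propagation across the two stages.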
In the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010), Los Angeles, USA
Publisher: Association for Computational Linguistics
All copyrights reserved by ACL 2007