Resource Creation for Training and Testing of Transliteration Systems for Indian Languages

Sowmya V. B, Monojit Choudhury, Kalika Bali, Tirthankar Dasgupta, and Anupam Basu

Abstract

Transliteration refers to the process of writing the text of one language in the script of another language while preserving the sound of the text as far as possible (Knight and Graehl, 1998). Transliteration can be classified into two types: forward and backward. Forward transliteration is the process of representing a word (in our context, an Indian language word) in a non-native script (in this case, the Roman script). For example, the Roman string “Sachin” might be generated by forward transliteration from the original Hindi word “सचिन”, which is written in the Devanagari script. Backward transliteration, on the other hand, is the reverse process, which recovers the native-script representation from the transliterated word. Thus, backward transliteration will generate the Devanagari string “सचिन” from the Roman string “Sachin”.
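To make the two directions concrete, here is a minimal Python sketch. It is not the method described in the paper, which is about creating training and test resources. The sketch assumes a deliberately tiny, hypothetical character table. It also uses a crude word-final schwa-deletion rule for forward transliteration, and treats backward transliteration as a simple dictionary lookup. All names (`CONSONANTS`, `forward_transliterate`, `build_back_lexicon`, `back_transliterate`) are illustrative assumptions.

```python
from typing import Dict, Iterable, Optional

# Hypothetical, deliberately tiny Devanagari-to-Roman table covering only the
# characters in the example word; it is not a standard romanization scheme.
CONSONANTS = {"स": "s", "च": "ch", "न": "n"}
VOWEL_SIGNS = {"ि": "i"}  # dependent vowel sign (matra) for short "i"
VIRAMA = "्"  # suppresses a consonant's inherent vowel


def forward_transliterate(word: str) -> str:
    """Toy forward transliteration: Devanagari word -> lowercase Roman string."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else None
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            # Add the inherent vowel "a" unless a vowel sign or virama follows.
            # A word-final consonant gets no "a" (crude schwa deletion).
            if nxt is not None and nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        elif ch == VIRAMA:
            continue  # already handled by the preceding consonant
        else:
            raise ValueError(f"unmapped character: {ch!r}")
    return "".join(out)


def build_back_lexicon(native_words: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: invert forward_transliterate over a word list."""
    return {forward_transliterate(w): w for w in native_words}


def back_transliterate(roman: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Toy backward transliteration: look up the Roman string (case-insensitive)."""
    return lexicon.get(roman.lower())


lexicon = build_back_lexicon(["सचिन"])
print(forward_transliterate("सचिन"))          # sachin
print(back_transliterate("Sachin", lexicon))  # सचिन
```

The lookup-based backward direction only works for words already in the lexicon. In practice, one native word can be written with several Roman spellings, and one Roman string can correspond to several native words. This ambiguity is one reason why annotated resources for training and testing transliteration systems are valuable.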

Details

Publication type: Inproceedings
Published in: Proceedings of the Language Resource and Evaluation Conference (LREC) 2010
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/182_Paper.pdf
Pages: 2902-2907
Publisher: European Language Resources Association