Resource Creation for Training and Testing of Transliteration Systems for Indian Languages

  • Sowmya V. B
  • Monojit Choudhury
  • Tirthankar Dasgupta
  • Anupam Basu

Proceedings of the Language Resource and Evaluation Conference (LREC) 2010

Published by European Language Resources Association

Transliteration is the process of writing the text of one language in the script of another while preserving the sound of the text as far as possible (Knight and Graehl, 1998). It can be classified into two types: forward and backward. Forward transliteration represents a word (in our context, an Indian language word) in a non-native script (in this case, the Roman script). For example, the Roman string “Sachin” might be generated by forward transliteration from the original Hindi word “सचिन”, which is written in the Devanagari script. Backward transliteration is the reverse process: it recovers the native-script representation from the transliterated word. Thus, backward transliteration will generate the Devanagari string “सचिन” from the Roman string “Sachin”.
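
The two directions can be illustrated with a minimal Python sketch. It is a toy lookup table, not the approach of any actual transliteration system, which would typically rely on rules or statistical models rather than a fixed lexicon. The function names `forward_transliterate` and `back_transliterate` are hypothetical, and the variant spelling “Sacheen” is an assumption added only to show that one native word may have several Roman renderings; only the “सचिन”/“Sachin” pair comes from the text above.

```python
# Toy lexicon: native (Devanagari) word -> possible Roman spellings.
# Only "सचिन" -> "Sachin" comes from the text; "Sacheen" is a
# hypothetical variant included purely for illustration.
FORWARD_LEXICON = {
    "सचिन": ["Sachin", "Sacheen"],
}

# Invert the lexicon so each Roman spelling points back to its native form.
# Keys are lowercased so that lookup ignores capitalization.
BACKWARD_LEXICON = {
    roman.lower(): native
    for native, romans in FORWARD_LEXICON.items()
    for roman in romans
}


def forward_transliterate(native_word):
    """Return the known Roman spellings of a native-script word (may be empty)."""
    return list(FORWARD_LEXICON.get(native_word, []))


def back_transliterate(roman_word):
    """Return the native-script form of a Roman spelling, or None if unknown."""
    return BACKWARD_LEXICON.get(roman_word.lower())


if __name__ == "__main__":
    print(forward_transliterate("सचिन"))
    print(back_transliterate("Sachin"))
```

In this sketch, forward transliteration is one-to-many, while backward transliteration maps every listed Roman spelling to a single native form. That asymmetry is the reason the two directions are treated as separate tasks.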