Haizhou Li, A Kumaran, Vladimir Pervouchine, and Min Zhang
This report documents the Transliteration Mining Shared Task that was run as part of the Named Entities Workshop (NEWS), an ACL 2010 workshop. The shared task featured mining of name transliterations from paired Wikipedia titles in 5 language pairs, specifically between English and one of Arabic, Chinese, Hindi, Russian, and Tamil. In total, 4 groups took part in this shared task, participating in multiple mining tasks in different language pairs. The methodology and the data sets used in this shared task are published in the Shared Task White Paper [Kumaran et al, 2010]. We measure and report 3 metrics on the submitted results to calibrate the performance of individual systems on a commonly available Wikipedia dataset. We believe that the significant contribution of this shared task is in (i) assembling such a diverse set of participants working in the area of machine transliteration, (ii) creating a comprehensive baseline performance of transliteration in various languages, using common datasets, and (iii) providing a basis for meaningful comparison and analysis of trade-offs between various algorithmic approaches used in transliteration. We believe that the results of this shared task would uncover a host of research problems, giving impetus to research in this significant area.
In the ACL 2010 Named Entities Workshop (NEWS-2010), Uppsala, Sweden
Publisher: Association for Computational Linguistics
All copyrights reserved by ACL 2007