Mining named entity transliteration equivalents from comparable corpora

  • Raghavendra Udupa
  • K. Saravanan
  • A. Kumaran
  • Jagadeesh Jagarlamudi

17th ACM Conference on Information and Knowledge Management (CIKM 2008)

Named Entities (NEs) form a significant fraction of query terms in Information Retrieval (IR) systems and have a substantial impact on retrieval performance. NEs are even more important in Cross-Language Information Retrieval (CLIR): in addition to being a significant component of query terms, any errors in their translations act as noise that adversely affects retrieval performance (Mandl and Womser-Hacker, 2005; Xu and Weischedel, 2005). On the resource side, bilingual dictionaries typically offer only limited support for CLIR because their coverage of NEs is insufficient, since new NEs enter the vocabulary of a language every day. Machine transliteration systems, on the other hand, often produce misspelled or incorrect transliterations, which also degrade CLIR performance.