Crosslingual Information Retrieval System Enhanced with Transliteration Generation and Mining

K Saravanan, Raghavendra Udupa, and A Kumaran

Abstract

This report documents the participation of Microsoft Research India (MSR India) in the Crosslingual Information Retrieval (CLIR) evaluation organized by the Forum for Information Retrieval Evaluation 2010 [FIRE 2010]. MSR India participated in two crosslingual evaluation tasks, namely the Hindi-English and Tamil-English crosslingual tasks, in addition to the English-English monolingual task. Our core CLIR engine employed a language modeling approach, using query likelihood for document ranking and a probabilistic translation lexicon learned from English-Hindi and English-Tamil parallel corpora. In addition, we employed two specific techniques to deal with out-of-vocabulary (OOV) terms in the crosslingual runs: first, generating transliterations directly or transitively, and second, mining possible transliteration equivalents from the documents retrieved in the first pass. We show experimentally that each of these techniques significantly improved the overall retrieval performance of our crosslingual IR system. Using all of the topic title, description, and narrative information, our system achieved a peak MAP of 0.5133 in the monolingual English-English task; in the crosslingual tasks, our systems achieved peak MAPs of 0.4977 in Hindi-English and 0.4145 in Tamil-English. The post-task analyses indicate that mining appropriate transliterations from the top results of the first-pass retrieval improved the overall crosslingual performance of our system, while also improving retrieval for a larger number of individual queries. Our Hindi-English crosslingual retrieval performance was nearly equal (~97%) to the English-English monolingual retrieval performance, indicating the effectiveness of our approaches to handling OOV terms in enhancing the baseline performance of our CLIR system.
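The core ranking idea (query likelihood combined with a probabilistic translation lexicon, with transliterations as a fallback for OOV terms) can be illustrated with a minimal sketch. It assumes one common formulation of translation-based query likelihood, score(D) = Σ_i log Σ_e P(e|q_i) · P_μ(e|D), where each source-language query term q_i is mapped through the lexicon to English terms e and scored against a Dirichlet-smoothed document model. The Dirichlet smoothing, the parameter `mu`, the transliteration table `translit`, the function names, and all toy data are hypothetical illustrations, not details taken from the paper.

```python
import math
from collections import Counter


def dirichlet_prob(term, doc_counts, doc_len, coll_prob, mu=1000.0):
    # P_mu(e|D): document term frequency smoothed with a Dirichlet prior over
    # the collection model. The 1e-9 floor (an assumption) keeps unseen terms
    # from producing a zero probability.
    return (doc_counts.get(term, 0) + mu * coll_prob.get(term, 1e-9)) / (doc_len + mu)


def clir_score(query_terms, doc_tokens, lexicon, coll_prob, mu=1000.0, translit=None):
    # Hypothetical translation-based query likelihood score for one English document.
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for q in query_terms:
        # Translation candidates {english_term: P(e|q)} from the lexicon.
        candidates = lexicon.get(q)
        # OOV fallback: use transliteration candidates if any are available.
        if not candidates and translit:
            candidates = translit.get(q)
        if not candidates:
            continue  # term has neither a translation nor a transliteration: ignored
        p = sum(t_prob * dirichlet_prob(e, doc_counts, doc_len, coll_prob, mu)
                for e, t_prob in candidates.items())
        score += math.log(p)
    return score


# Toy usage with made-up probabilities (illustrative only).
lexicon = {"नदी": {"river": 0.8, "stream": 0.2}}      # "river"; in the lexicon
translit = {"गंगा": {"ganga": 0.6, "ganges": 0.4}}    # proper noun; OOV for the lexicon
coll_prob = {"river": 0.001, "stream": 0.0005, "ganga": 0.0001, "ganges": 0.0001}
query = ["गंगा", "नदी"]
d1 = "the ganga river flows".split()
d2 = "stock market news today".split()

scores = {
    "d1": clir_score(query, d1, lexicon, coll_prob, translit=translit),
    "d2": clir_score(query, d2, lexicon, coll_prob, translit=translit),
}
```

In this sketch, without `translit` the OOV term "गंगा" simply contributes nothing to the score, so documents about the Ganga cannot be distinguished from others by that term. Supplying transliteration candidates lets it count, which mirrors the motivation the abstract gives for generating and mining transliterations.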

Details

Publication type: Article
Published in: the Forum for Information Retrieval Evaluation (FIRE-2010) Workshop, Kolkata, India