Crosslingual Location Search

  • Joseph Joy ,
  • Tobias Kellner ,
  • Udayan Khurana ,
  • A Kumaran ,
  • Vibhuti Sengar

the 31st annual international ACM SIGIR conference on Research and Development in Information Retrieval (SIGIR 2008), Singapore, Singapore |

Published by Association for Computing Machinery, Inc.

Publication

Address geocoding, the process of finding the map location for a structured postal address, is a relatively well-studied problem. In this paper we consider the more general problem of crosslingual location search, where the queries are not limited to postal addresses, and the language and script used in the search query is different from the one in which the underlying data is stored. To the best of our knowledge, our system is the first crosslingual location search system that is able to geocode complex addresses. We use a statistical machine transliteration system to convert location names from the script of the query to that of the stored data. However, we show that it is not sufficient to simply feed the resulting transliterations into a monolingual geocoding system, as the ambiguity inherent in the conversion drastically expands the location search space and significantly lowers the quality of results. The strength of our approach lies in its integrated, end-toend nature: we use abstraction and fuzzy search (in the text domain) to achieve maximum coverage despite transliteration ambiguities, while applying spatial constraints (in the geographic domain) to focus only on viable interpretations of the query. Our experiments with structured and unstructured queries in a set of diverse languages and scripts (Arabic, English, Hindi and Japanese) searching for locations in different regions of the world, show full crosslingual location search accuracy at levels comparable to that of commercial monolingual systems. We achieve these levels of performance using techniques that may be applied to crosslingual searches in any language/script, and over arbitrary spatial data.