Geographic Query Parsing Data Set




Geographic Query Parsing

A geographic query usually consists of three components: “what”, “geo-relation”, and “where”. Parsing queries to extract these components is a key problem in geographic information retrieval (GIR).

The keywords in the “what” component indicate what users want to search for; “where” indicates the geographic area users are interested in; “geo-relation” stands for the relationship between “what” and “where”. For example, for the query “Restaurant in Beijing, China”, “what” = “Restaurant”, “where” = “Beijing, China”, and “geo-relation” = “IN”. For the query “Mountains in the south of United States”, “what” = “Mountains”, “where” = “United States”, and “geo-relation” = “SOUTH_OF”.
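As a minimal sketch, the two example decompositions can be written down as plain records. The `ParsedQuery` class and its field names are hypothetical, introduced here only for illustration; they are not part of the data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical container for the three components of a geographic query."""
    what: str
    geo_relation: str  # relation label, spelled as in Table 1 (e.g. "IN", "SOUTH_OF")
    where: str


# The two example queries from the text and their components.
examples = {
    "Restaurant in Beijing, China": ParsedQuery(
        what="Restaurant", geo_relation="IN", where="Beijing, China"),
    "Mountains in the south of United States": ParsedQuery(
        what="Mountains", geo_relation="SOUTH_OF", where="United States"),
}

for query, parsed in examples.items():
    print(f"{query!r} -> {parsed}")
```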

We categorize the “what” component into three types:
  • Map type: users are looking for natural points of interest, such as rivers, beaches, mountains, and monuments.
  • Yellow page type: users are looking for businesses or organizations, such as hotels, restaurants, and hospitals.
  • Information type: users are looking for textual information, such as news, articles, and blogs.
Table 1 lists the relation types for the “geo-relation” component.

Table 1. Geo-relation Types

Example query                                         Geo-relation
----------------------------------------------------  -------------
Beijing                                               NONE
in Beijing                                            IN
on the Long Island                                    ON
of Beijing                                            OF
near Beijing; next to Beijing                         NEAR
in or around Beijing; in and around Beijing           IN_NEAR
along the Rhine                                       ALONG
at Beijing University                                 AT
from Beijing                                          FROM
to Beijing                                            TO
within d miles of Beijing                             DISTANCE
north of Beijing; in the north of Beijing             NORTH_OF
south of Beijing; in the south of Beijing             SOUTH_OF
east of Beijing; in the east of Beijing               EAST_OF
west of Beijing; in the west of Beijing               WEST_OF
northeast of Beijing; in the northeast of Beijing     NORTH_EAST_OF
northwest of Beijing; in the northwest of Beijing     NORTH_WEST_OF
southeast of Beijing; in the southeast of Beijing     SOUTH_EAST_OF
southwest of Beijing; in the southwest of Beijing     SOUTH_WEST_OF
north to Beijing                                      NORTH_TO
south to Beijing                                      SOUTH_TO
east to Beijing                                       EAST_TO
west to Beijing                                       WEST_TO
northeast to Beijing                                  NORTH_EAST_TO
northwest to Beijing                                  NORTH_WEST_TO
southeast to Beijing                                  SOUTH_EAST_TO
southwest to Beijing                                  SOUTH_WEST_TO
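The phrase-to-label correspondence in Table 1 can be approximated with a small pattern lookup. The sketch below is a naive baseline, not the method used to label the data set: the function name `split_query`, the pattern list, and the priority ordering are all assumptions made for illustration. It tries the more specific phrases first so that, for example, “in the north of” is not consumed by the plain “in” pattern.

```python
import re

# Direction words from Table 1 and the label prefix each one maps to.
DIRECTIONS = {
    "north": "NORTH", "south": "SOUTH", "east": "EAST", "west": "WEST",
    "northeast": "NORTH_EAST", "northwest": "NORTH_WEST",
    "southeast": "SOUTH_EAST", "southwest": "SOUTH_WEST",
}

# (phrase regex, label) pairs in priority order: specific phrases first.
PHRASES = []
for word, prefix in DIRECTIONS.items():
    PHRASES.append((rf"(?:in the )?{word} of", f"{prefix}_OF"))
    PHRASES.append((rf"{word} to", f"{prefix}_TO"))
PHRASES += [
    (r"within \S+ miles of", "DISTANCE"),
    (r"in (?:or|and) around", "IN_NEAR"),
    (r"near|next to", "NEAR"),
    (r"along", "ALONG"),
    (r"in", "IN"), (r"on", "ON"), (r"of", "OF"),
    (r"at", "AT"), (r"from", "FROM"), (r"to", "TO"),
]
PATTERNS = [(re.compile(rf"\b(?:{p})\b", re.IGNORECASE), label)
            for p, label in PHRASES]


def split_query(query):
    """Return (what, geo_relation, where) using the first matching phrase."""
    for pattern, label in PATTERNS:
        match = pattern.search(query)
        if match:
            return (query[:match.start()].strip(), label,
                    query[match.end():].strip())
    # Assumption: a bare query such as "Beijing" is treated as a place name.
    # Telling places from other keywords would need a gazetteer.
    return ("", "NONE", query.strip())


print(split_query("Restaurant in Beijing, China"))
print(split_query("Mountains in the south of United States"))
```

A lookup like this will mislabel queries where a preposition is part of the “what” or “where” text, so it is best read as an illustration of the label scheme.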



Data Set

800,000 queries were collected from Windows Live Search logs (http://search.live.com/). Most of them were geographic queries. A labeled sample of 100 queries was provided as a training set. This data set has been used in the geographic query parsing task of GeoCLEF 2007.

The query set is in XML format. Each query has two elements: <QUERYNO> and <QUERY>.

<QUERYNO>1</QUERYNO>

<QUERY>Restaurant in Beijing, China</QUERY>

<QUERYNO>2</QUERYNO>

<QUERY>Real estate in Florida</QUERY>

<QUERYNO>3</QUERYNO>

<QUERY>Mountains in the south of United States</QUERY>
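A short sketch of reading this format follows. Because the page shows only the <QUERYNO>/<QUERY> pairs and not any enclosing root element, the code matches the pairs with a regular expression instead of a full XML parser; that choice, the function name `read_queries`, and the lack of entity unescaping are assumptions of this example.

```python
import re

# One record = a <QUERYNO> element immediately followed by a <QUERY> element.
RECORD = re.compile(
    r"<QUERYNO>\s*(\d+)\s*</QUERYNO>\s*<QUERY>(.*?)</QUERY>", re.DOTALL)


def read_queries(text):
    """Map each query number to its query string."""
    return {int(number): query.strip()
            for number, query in RECORD.findall(text)}


sample = """
<QUERYNO>1</QUERYNO>
<QUERY>Restaurant in Beijing, China</QUERY>
<QUERYNO>2</QUERYNO>
<QUERY>Real estate in Florida</QUERY>
<QUERYNO>3</QUERYNO>
<QUERY>Mountains in the south of United States</QUERY>
"""

queries = read_queries(sample)
for number, query in queries.items():
    print(number, query)
```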


The sample labeled set is in the following format. There are six more elements: <LOCAL>, <WHAT>, <WHAT-TYPE>, <GEO-RELATION>, <WHERE> and <LAT-LONG>.

<QUERYNO>1</QUERYNO>

<QUERY>Restaurant in Beijing, China</QUERY>

<LOCAL>YES</LOCAL>

<WHAT>Restaurant</WHAT>

<WHAT-TYPE>Yellow page</WHAT-TYPE>

<GEO-RELATION>IN</GEO-RELATION>

<WHERE>Beijing, China</WHERE>

<LAT-LONG>40.24, 116.42</LAT-LONG>

<QUERYNO>2</QUERYNO>

<QUERY>Lottery in Florida</QUERY>

<LOCAL>YES</LOCAL>

<WHAT>Lottery</WHAT>

<WHAT-TYPE>Information</WHAT-TYPE>

<GEO-RELATION>IN</GEO-RELATION>

<WHERE>Florida</WHERE>

<LAT-LONG>28.38, -81.75</LAT-LONG>
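The labeled records can be read in a similar way. The sketch below groups fields into one dictionary per <QUERYNO> and converts <LAT-LONG> into a pair of floats. The function name `read_labeled` and the regex-based approach are assumptions for illustration; the pattern also tolerates whitespace inside a closing tag, in case the files contain it.

```python
import re

# Any upper-case element; the closing tag may contain stray whitespace.
FIELD = re.compile(r"<([A-Z_-]+)>(.*?)</\s*\1\s*>", re.DOTALL)


def read_labeled(text):
    """Return one dict per labeled query, keyed by element name."""
    records = []
    for tag, value in FIELD.findall(text):
        if tag == "QUERYNO":
            records.append({})  # a new record starts at each <QUERYNO>
        if not records:
            continue  # ignore any field that appears before the first record
        records[-1][tag] = value.strip()
    for record in records:
        if "LAT-LONG" in record:
            lat, lon = (float(part) for part in record["LAT-LONG"].split(","))
            record["LAT-LONG"] = (lat, lon)
    return records


sample = """
<QUERYNO>1</QUERYNO>
<QUERY>Restaurant in Beijing, China</QUERY>
<LOCAL>YES</LOCAL>
<WHAT>Restaurant</WHAT>
<WHAT-TYPE>Yellow page</WHAT-TYPE>
<GEO-RELATION>IN</GEO-RELATION>
<WHERE>Beijing, China</WHERE>
<LAT-LONG>40.24, 116.42</LAT-LONG>
<QUERYNO>2</QUERYNO>
<QUERY>Lottery in Florida</QUERY>
<LOCAL>YES</LOCAL>
<WHAT>Lottery</WHAT>
<WHAT-TYPE>Information</WHAT-TYPE>
<GEO-RELATION>IN</GEO-RELATION>
<WHERE>Florida</WHERE>
<LAT-LONG>28.38, -81.75</LAT-LONG>
"""

labeled = read_labeled(sample)
for record in labeled:
    print(record["QUERYNO"], record["WHAT"], record["GEO-RELATION"],
          record["WHERE"], record["LAT-LONG"])
```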



File Download

  • Data Set: 800,000 unlabeled queries and 100 labeled queries [download (zip, with password, 6.81M)]. Note: the data file is encrypted; please contact xingx AT microsoft DOT com for the password.
  • Evaluation Set: 500 labeled queries [download].

Xing Xie's home page.


©2008 Microsoft Corporation. All rights reserved.