Geographic Query Parsing Data Set
Geographic Query Parsing
A geographic query is usually composed of three components: "what", "geo-relation", and "where". Parsing queries to extract these components is a key problem for geographic information retrieval (GIR).
The keywords in the "what" component indicate what users want to search for; "where" indicates the geographic area users are interested in; "geo-relation" stands for the relationship between "what" and "where". For example, for the query "Restaurant in Beijing, China", "what" = "Restaurant", "where" = "Beijing, China", and "geo-relation" = "IN". For another query, "Mountains in the south of United States", "what" = "Mountains", "where" = "United States", and "geo-relation" = "SOUTH_OF".
We categorize the "what" component into three types, as listed below:
- Map type: users are looking for natural points of interest, such as rivers, beaches, mountains, and monuments.
- Yellow page type: users are looking for businesses or organizations, such as hotels, restaurants, and hospitals.
- Information type: users are looking for text information, such as news, articles, and blogs.
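The three components and the "what" types can be pictured as a small record. The following is a minimal sketch, not the data set's official schema: the names `WhatType` and `ParsedQuery` and their field names are hypothetical, and the type assigned to each example is inferred from the category descriptions above (restaurants are listed under Yellow page, mountains under Map).

```python
from dataclasses import dataclass
from enum import Enum


class WhatType(Enum):
    """The three "what" categories described above (names are illustrative)."""
    MAP = "Map"
    YELLOW_PAGE = "Yellow page"
    INFORMATION = "Information"


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical container for one parsed geographic query."""
    query: str          # the raw query text
    what: str           # what the user is searching for
    what_type: WhatType
    geo_relation: str   # relation label, e.g. "IN" or "SOUTH_OF"
    where: str          # the geographic area of interest


# The two example queries from the text, written out as records.
EXAMPLES = [
    ParsedQuery("Restaurant in Beijing, China", "Restaurant",
                WhatType.YELLOW_PAGE, "IN", "Beijing, China"),
    ParsedQuery("Mountains in the south of United States", "Mountains",
                WhatType.MAP, "SOUTH_OF", "United States"),
]

for example in EXAMPLES:
    print(example.what, example.geo_relation, example.where, sep=" | ")
```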
For the "geo-relation" component, a list of relation types is shown in Table 1.
Table 1. Geo-relation Types

| Example query | Geo-relation |
| --- | --- |
| Beijing | NONE |
| in Beijing | IN |
| on the Long Island | ON |
| of Beijing | OF |
| near Beijing; next to Beijing | NEAR |
| in or around Beijing; in and around Beijing | IN_NEAR |
| along the Rhine | ALONG |
| at Beijing University | AT |
| from Beijing | FROM |
| to Beijing | TO |
| within d miles of Beijing | DISTANCE |
| north of Beijing; in the north of Beijing | NORTH_OF |
| south of Beijing; in the south of Beijing | SOUTH_OF |
| east of Beijing; in the east of Beijing | EAST_OF |
| west of Beijing; in the west of Beijing | WEST_OF |
| northeast of Beijing; in the northeast of Beijing | NORTH_EAST_OF |
| northwest of Beijing; in the northwest of Beijing | NORTH_WEST_OF |
| southeast of Beijing; in the southeast of Beijing | SOUTH_EAST_OF |
| southwest of Beijing; in the southwest of Beijing | SOUTH_WEST_OF |
| north to Beijing | NORTH_TO |
| south to Beijing | SOUTH_TO |
| east to Beijing | EAST_TO |
| west to Beijing | WEST_TO |
| northeast to Beijing | NORTH_EAST_TO |
| northwest to Beijing | NORTH_WEST_TO |
| southeast to Beijing | SOUTH_EAST_TO |
| southwest to Beijing | SOUTH_WEST_TO |
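To illustrate how the phrases in Table 1 map to relation labels, here is a minimal rule-based sketch. It is only a baseline under stated assumptions, not the method used to label the data set: the names `RULES` and `split_query` are hypothetical, the rules are tried in a fixed priority order (more specific phrases first), and a query with no marker phrase is treated as a bare place name, mirroring the "Beijing" row. A real parser would also need a gazetteer to confirm that the "where" part is an actual location.

```python
import re

# Direction words from Table 1 and the label prefix each one maps to.
_DIRECTIONS = {
    "northeast": "NORTH_EAST", "northwest": "NORTH_WEST",
    "southeast": "SOUTH_EAST", "southwest": "SOUTH_WEST",
    "north": "NORTH", "south": "SOUTH", "east": "EAST", "west": "WEST",
}


def _build_rules():
    """Return (compiled pattern, label) pairs, most specific phrases first."""
    rules = [(r"within \d+(?:\.\d+)? miles? of", "DISTANCE")]
    for word, prefix in _DIRECTIONS.items():
        rules.append((rf"(?:in the )?{word} of", f"{prefix}_OF"))
        rules.append((rf"{word} to", f"{prefix}_TO"))
    rules.append((r"in (?:or|and) around", "IN_NEAR"))
    rules.append((r"near|next to", "NEAR"))
    # Single-word markers go last so they cannot shadow the longer phrases.
    for word in ("in", "on", "of", "along", "at", "from", "to"):
        rules.append((word, word.upper()))
    return [(re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE), label)
            for pattern, label in rules]


RULES = _build_rules()


def split_query(query):
    """Split a query into (what, geo_relation, where) using the first rule that matches."""
    for pattern, label in RULES:
        match = pattern.search(query)
        if match:
            what = query[:match.start()].strip()
            where = query[match.end():].strip()
            return what, label, where
    # Assumption: no marker phrase means the whole query is a place name.
    return "", "NONE", query.strip()


for q in ("Restaurant in Beijing, China",
          "Mountains in the south of United States",
          "hotels within 5 miles of Beijing"):
    print(split_query(q))
```

Because the rules are tried by priority rather than by position in the query, a query containing several marker words (for example "of" inside the "what" part) can be split in the wrong place; the sketch is meant to show the label inventory, not to handle such cases.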
Data Set
800,000 queries were collected from Windows Live Search logs (http://search.live.com/). Most of them were geographic queries. A sample labeled set of 100 queries was provided as a training set. This data set was used in the geographic query parsing task of GeoCLEF 2007.
The query set is in XML format. Each query has two attributes:
The sample labeled set is in the following format. There are 4 more attributes:
File Download
- Data Set: 800,000 unlabeled queries and 100 labeled queries. Note: Since GeoCLEF 2007 has finished, we no longer share this data file.
- Evaluation Set: 500 labeled queries [download].



