Geographic Query Parsing Data Set


Geographic Query Parsing

A geographic query is usually composed of three components: “what”, “geo-relation”, and “where”. Parsing queries and extracting these components is a key problem in geographic information retrieval (GIR).

The keywords in the “what” component indicate what users want to search for; “where” indicates the geographic area users are interested in; “geo-relation” stands for the relationship between “what” and “where”. For example, for the query “Restaurant in Beijing, China”, “what” = “Restaurant”, “where” = “Beijing, China”, and “geo-relation” = “IN”. For another query, “Mountains in the south of United States”, “what” = “Mountains”, “where” = “United States”, and “geo-relation” = “SOUTH_OF”.

We categorize the “what” component into three types, as listed below:

  • Map type: users are looking for natural points of interest, such as rivers, beaches, mountains, monuments, etc.
  • Yellow page type: users are looking for businesses or organizations, such as hotels, restaurants, hospitals, etc.
  • Information type: users are looking for text information, such as news, articles, blogs, etc.
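
The components and “what” types described above can be sketched as a small data structure. This is only an illustration: the names `WhatType` and `ParsedQuery` and their field names are assumptions made for this sketch, not part of the data set.

```python
from dataclasses import dataclass
from enum import Enum


class WhatType(Enum):
    """The three "what" types listed above."""
    MAP = "Map"
    YELLOW_PAGE = "Yellow page"
    INFORMATION = "Information"


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical container for the components of one parsed query."""
    what: str
    what_type: WhatType
    geo_relation: str
    where: str


# The restaurant example from the text, expressed as a ParsedQuery.
restaurant = ParsedQuery(
    what="Restaurant",
    what_type=WhatType.YELLOW_PAGE,
    geo_relation="IN",
    where="Beijing, China",
)
```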

For the "geo-relation" component, a list of relation types is shown in Table 1.

Table 1. Geo-relation Types

Example query                                         Geo-relation
Beijing                                               NONE
in Beijing                                            IN
on the Long Island                                    ON
of Beijing                                            OF
near Beijing; next to Beijing                         NEAR
in or around Beijing; in and around Beijing           IN_NEAR
along the Rhine                                       ALONG
at Beijing University                                 AT
from Beijing                                          FROM
to Beijing                                            TO
within d miles of Beijing                             DISTANCE
north of Beijing; in the north of Beijing             NORTH_OF
south of Beijing; in the south of Beijing             SOUTH_OF
east of Beijing; in the east of Beijing               EAST_OF
west of Beijing; in the west of Beijing               WEST_OF
northeast of Beijing; in the northeast of Beijing     NORTH_EAST_OF
northwest of Beijing; in the northwest of Beijing     NORTH_WEST_OF
southeast of Beijing; in the southeast of Beijing     SOUTH_EAST_OF
southwest of Beijing; in the southwest of Beijing     SOUTH_WEST_OF
north to Beijing                                      NORTH_TO
south to Beijing                                      SOUTH_TO
east to Beijing                                       EAST_TO
west to Beijing                                       WEST_TO
northeast to Beijing                                  NORTH_EAST_TO
northwest to Beijing                                  NORTH_WEST_TO
southeast to Beijing                                  SOUTH_EAST_TO
southwest to Beijing                                  SOUTH_WEST_TO
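
The relation types in Table 1 can be approximated with a few ordered pattern rules. The sketch below is a minimal, rule-based illustration and not the method used to label the data set. The function name `detect_geo_relation` is hypothetical, and the function assumes its input is only the relation phrase (for example “in the north of Beijing”), not a full query that also contains a “what” part.

```python
import re

# Direction words from Table 1, mapped to the prefix of their relation label.
_DIRECTIONS = {
    "north": "NORTH", "south": "SOUTH", "east": "EAST", "west": "WEST",
    "northeast": "NORTH_EAST", "northwest": "NORTH_WEST",
    "southeast": "SOUTH_EAST", "southwest": "SOUTH_WEST",
}
# Longest words first so that "northeast" is tried before "north".
_DIR = "|".join(sorted(_DIRECTIONS, key=len, reverse=True))

_DIR_OF = re.compile(rf"(?:in the )?({_DIR}) of\b", re.IGNORECASE)
_DIR_TO = re.compile(rf"({_DIR}) to\b", re.IGNORECASE)

# Multi-word rules come before single-word ones, so "in or around"
# is not mistaken for a plain "in".
_RULES = [
    (re.compile(r"in (?:or|and) around\b", re.IGNORECASE), "IN_NEAR"),
    (re.compile(r"within \S+ miles of\b", re.IGNORECASE), "DISTANCE"),
    (re.compile(r"(?:near|next to)\b", re.IGNORECASE), "NEAR"),
]
_RULES += [
    (re.compile(rf"{word}\b", re.IGNORECASE), word.upper())
    for word in ("along", "in", "on", "of", "at", "from", "to")
]


def detect_geo_relation(phrase):
    """Return the Table 1 label for a relation phrase, or "NONE"."""
    text = phrase.strip()
    match = _DIR_OF.match(text)
    if match:
        return _DIRECTIONS[match.group(1).lower()] + "_OF"
    match = _DIR_TO.match(text)
    if match:
        return _DIRECTIONS[match.group(1).lower()] + "_TO"
    for pattern, label in _RULES:
        if pattern.match(text):
            return label
    return "NONE"
```

For instance, `detect_geo_relation("in or around Beijing")` yields `IN_NEAR`, and a bare place name such as `"Beijing"` falls through to `NONE`. Real queries are noisier than the table examples, so a rule list like this would only be a starting point.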


Data Set

800,000 queries were collected from Windows Live Search logs (http://search.live.com/). Most of them were geographic queries. A sample labeled set of 100 queries was provided as a training set. This data set was used in the geographic query parsing task of GeoCLEF 2007.

The query set is in XML format. Each query has two attributes: and .

1
Restaurant in Beijing, China

2
Real estate in Florida

3
Mountains in the south of United States


The sample labeled set is in the following format. There are 4 more attributes: , , , and .

1
Restaurant in Beijing, China
YES
Restaurant
Yellow page
IN
Beijing, China
40.24, 116.42

2
Lottery in Florida
YES
Lottery
Information
IN
Florida
28.38, -81.75
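
Since the query set is in XML, labeled records like the two above could be read with a standard XML parser. The following is a sketch under stated assumptions: every element name in it (`queries`, `query`, `number`, `text`, `local`, `what`, `what_type`, `geo_relation`, `where`, `lat_long`) is a hypothetical placeholder chosen for illustration, and the real files may use different names. It also assumes the last value is a “latitude, longitude” pair, as the samples suggest.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are placeholders, not the real tags.
SAMPLE = """
<queries>
  <query>
    <number>1</number>
    <text>Restaurant in Beijing, China</text>
    <local>YES</local>
    <what>Restaurant</what>
    <what_type>Yellow page</what_type>
    <geo_relation>IN</geo_relation>
    <where>Beijing, China</where>
    <lat_long>40.24, 116.42</lat_long>
  </query>
  <query>
    <number>2</number>
    <text>Lottery in Florida</text>
    <local>YES</local>
    <what>Lottery</what>
    <what_type>Information</what_type>
    <geo_relation>IN</geo_relation>
    <where>Florida</where>
    <lat_long>28.38, -81.75</lat_long>
  </query>
</queries>
"""


def parse_lat_long(value):
    """Split a "lat, long" string into a pair of floats."""
    lat, lon = (float(part) for part in value.split(","))
    return lat, lon


def load_labeled_queries(xml_text):
    """Return one dict per <query> element, keyed by child element name."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for node in root.iter("query"):
        record = {child.tag: (child.text or "").strip() for child in node}
        record["lat_long"] = parse_lat_long(record["lat_long"])
        records.append(record)
    return records


records = load_labeled_queries(SAMPLE)
```

Each record is kept as a plain dictionary so that the sketch stays independent of whichever element names the actual files use; only the placeholder names in `SAMPLE` and the `lat_long` key would need to change.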


File Download

  • Data Set: 800,000 unlabeled queries and 100 labeled queries. Note: since GeoCLEF 2007 has finished, we no longer share this data file.
  • Evaluation Set: 500 labeled queries [download].