Geographic Query Parsing

A geographic query usually consists of three components: “what”, “geo-relation” and “where”. Parsing queries to extract these components is a key problem in geographic information retrieval (GIR). The keywords in the “what” component indicate what the user wants to search for; “where” indicates the geographic area the user is interested in; and “geo-relation” describes the relationship between “what” and “where”.

For example, for the query “Restaurant in Beijing, China”, “what” = “Restaurant”, “where” = “Beijing, China”, and “geo-relation” = “IN”. For the query “Mountains in the south of United States”, “what” = “Mountains”, “where” = “United States”, and “geo-relation” = “SOUTH-OF”.

The “what” component is categorized into three types, as listed below:
Table 1. Geo-relation Types
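The “what” / “geo-relation” / “where” decomposition described above can be illustrated with a deliberately naive rule-based sketch in Python. This is not the method used in the task. The phrase list, the `ParsedQuery` type and the `split_query` function are hypothetical names introduced here for illustration, and only the IN and SOUTH-OF labels are taken from the examples in the text.

```python
import re
from typing import NamedTuple, Optional


class ParsedQuery(NamedTuple):
    what: str
    geo_relation: Optional[str]
    where: Optional[str]


# Hypothetical phrase-to-label mapping (an assumption, not the task's full list).
# Longer phrases come first so "in the south of" is tried before plain "in".
RELATION_PHRASES = [
    ("in the south of", "SOUTH-OF"),
    ("in", "IN"),
]


def split_query(query: str) -> ParsedQuery:
    """Split a query at the first known relation phrase, if any."""
    text = query.strip()
    for phrase, label in RELATION_PHRASES:
        # Word boundaries keep "in" from matching inside "Beijing" or "Mountains".
        match = re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE)
        if match:
            what = text[:match.start()].strip()
            where = text[match.end():].strip()
            if what and where:
                return ParsedQuery(what, label, where)
    # No relation phrase found: treat the whole query as the "what" component.
    return ParsedQuery(text, None, None)
```

A real parser would also need a gazetteer to confirm that the “where” part is actually a place name; this sketch only shows how the three components relate to each other.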
Data Set

800,000 queries were collected from Windows Live Search logs (http://search.live.com/). Most of them are geographic queries. A labeled sample of 100 queries was provided as a training set. This data set has been used in the geographic query parsing task of GeoCLEF 2007.

The query set is in XML format. Each query has two attributes: <QUERYNO> and <QUERY>.

<QUERYNO>1</QUERYNO>
<QUERY>Restaurant in Beijing, China</QUERY>
<QUERYNO>2</QUERYNO>
<QUERY>Real estate in Florida</QUERY>
<QUERYNO>3</QUERYNO>
<QUERY>Mountains in the south of United States</QUERY>

The sample labeled set is in the following format. There are six more attributes: <LOCAL>, <WHAT>, <WHAT-TYPE>, <GEO-RELATION>, <WHERE> and <LAT-LONG>.

<QUERYNO>1</QUERYNO>
<QUERY>Restaurant in Beijing, China</QUERY>
<LOCAL>YES</LOCAL>
<WHAT>Restaurant</WHAT>
<WHAT-TYPE>Yellow page</WHAT-TYPE>
<GEO-RELATION>IN</GEO-RELATION>
<WHERE>Beijing, China</WHERE>
<LAT-LONG>40.24, 116.42</LAT-LONG>

<QUERYNO>2</QUERYNO>
<QUERY>Lottery in Florida</QUERY>
<LOCAL>YES</LOCAL>
<WHAT>Lottery</WHAT>
<WHAT-TYPE>Information</WHAT-TYPE>
<GEO-RELATION>IN</GEO-RELATION>
<WHERE>Florida</WHERE>
<LAT-LONG>28.38, -81.75</LAT-LONG>
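A minimal Python sketch of reading records in the labeled format shown above follows. Because the sample shows a flat sequence of tags with no enclosing element per query, the sketch assumes that layout and uses a tolerant regular expression instead of an XML parser; the actual file may be wrapped differently. The names `FIELD_RE`, `parse_records` and `lat_long` are hypothetical and introduced only for this illustration.

```python
import re
from typing import Dict, List, Tuple

# Matches <TAG>value</TAG>; tolerates stray whitespace around the value
# and inside the closing tag (e.g. "</ TAG>").
FIELD_RE = re.compile(r"<([A-Z_-]+)>\s*(.*?)\s*</\s*\1\s*>", re.DOTALL)


def parse_records(text: str) -> List[Dict[str, str]]:
    """Group consecutive fields into one dict per query.

    Assumption: every record starts with a <QUERYNO> field.
    """
    records: List[Dict[str, str]] = []
    for tag, value in FIELD_RE.findall(text):
        if tag == "QUERYNO":
            records.append({})
        if records:  # ignore any fields that precede the first QUERYNO
            records[-1][tag] = value
    return records


def lat_long(record: Dict[str, str]) -> Tuple[float, float]:
    """Convert the "lat, long" string of a record into two floats."""
    lat, lon = (float(part) for part in record["LAT-LONG"].split(","))
    return lat, lon


sample = """
<QUERYNO>1</QUERYNO>
<QUERY>Restaurant in Beijing, China</QUERY>
<LOCAL>YES</LOCAL>
<WHAT>Restaurant</WHAT>
<WHAT-TYPE>Yellow page</WHAT-TYPE>
<GEO-RELATION>IN</GEO-RELATION>
<WHERE>Beijing, China</WHERE>
<LAT-LONG>40.24, 116.42</LAT-LONG>
"""

records = parse_records(sample)
```

Each record is a plain dictionary keyed by tag name, so `records[0]["WHERE"]` gives the location string and `lat_long(records[0])` gives its coordinates as numbers. The unlabeled query set can be read with the same function, since its records also begin with <QUERYNO>.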