Exploiting Web Search Engines to Search Structured Information

18th International World Wide Web Conference (WWW 2009)

Published by Association for Computing Machinery, Inc.

Web search engines leverage information from structured databases to answer queries. For example, many product-related queries on search engines (Amazon, Google, Yahoo, Live Search) are answered by searching underlying product databases containing names, descriptions, specifications, and reviews of products. However, these vertical search engines are “silo-ed” in that their results are independent of those from web search. This often leads to empty or incomplete results, because query terms are matched only against the information in the underlying database. To overcome this problem, we propose an approach that first identifies relationships between web documents and items in structured databases. We then exploit results from web search engines in combination with these relationships to obtain the structured data items relevant to a much wider range of queries. We propose an architecture that implements the integrated search functionality efficiently: it adds very little overhead to query processing and is fully integrated with the search engine architecture. We demonstrate the quality of our techniques through an extensive experimental study.
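The two-step idea described above can be illustrated with a minimal sketch, under stated assumptions. It assumes a precomputed mapping from web documents to the database items they mention, and it aggregates over the top web results for a query to rank items. The names `DOC_TO_ITEMS` and `rank_items`, the example URLs and item IDs, and the reciprocal-rank scoring are all hypothetical choices made for illustration; the abstract does not specify how relationships are stored or how items are scored.

```python
from collections import defaultdict

# Hypothetical precomputed relationships: web document URL -> IDs of
# structured-database items that the document mentions.
DOC_TO_ITEMS = {
    "http://example.com/review-a": ["camera-1", "camera-2"],
    "http://example.com/review-b": ["camera-1"],
    "http://example.com/news-c": [],
}


def rank_items(web_results, doc_to_items, top_k=3):
    """Rank database items using the ranked web results for a query.

    web_results: list of URLs, best result first.
    Scoring (an assumption, not the paper's method): each item gets
    1/rank for every result document that mentions it.
    """
    scores = defaultdict(float)
    for rank, url in enumerate(web_results, start=1):
        for item_id in doc_to_items.get(url, []):
            scores[item_id] += 1.0 / rank
    # Sort by descending score, breaking ties by item ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ranked[:top_k]]


results = [
    "http://example.com/review-a",
    "http://example.com/review-b",
    "http://example.com/news-c",
]
top_items = rank_items(results, DOC_TO_ITEMS)
```

In this sketch the query terms never need to match the database records directly: an item is returned because documents that the web search engine considers relevant are related to it, which is the effect the abstract describes.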