Marijn Koolen, Gabriella Kazai, and Nick Craswell
Much of the world's knowledge is stored in books, which, as a result
of recent mass-digitisation efforts, are increasingly available online.
Search engines, such as Google Books, provide mechanisms
for searchers to enter this vast knowledge space using queries as
entry points. In this paper, we view Wikipedia as a summary of
this world knowledge and aim to use this resource to guide users to
relevant books. Thus, we investigate possible ways of using Wikipedia
as an intermediary between the user's query and a collection
of books being searched. We experiment with traditional query expansion
techniques, exploiting Wikipedia articles as rich sources
of information that can augment the user's query. We then propose
a novel approach based on link distance in an extended Wikipedia
graph: we associate books with Wikipedia pages that cite these
books and use the link distance between these nodes and the pages
that match the user query as an estimate of a book's relevance to
the query. Our results show that a) classical query expansion using
terms extracted from query pages leads to increased precision, and
b) link distance between query and book pages in Wikipedia provides
a good indicator of relevance that can boost the retrieval score
of relevant books in the result ranking of a book search engine.
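
The link-distance idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the function names `link_distances` and `rerank`, the use of directed links, and the boost formula `score * (1 + alpha / (1 + distance))` are all assumptions made for illustration. The sketch runs a breadth-first search from the pages that match the query, takes each book's distance to be that of its nearest citing page, and gives books cited closer to the query pages a larger boost.

```python
from collections import deque


def link_distances(graph, sources):
    """Breadth-first search: shortest link distance from any source page."""
    dist = {page: 0 for page in sources}
    queue = deque(dist)
    while queue:
        page = queue.popleft()
        for neighbour in graph.get(page, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[page] + 1
                queue.append(neighbour)
    return dist


def rerank(scores, citing_pages, graph, query_pages, alpha=1.0):
    """Boost each book's retrieval score by its closeness to the query pages.

    The boost formula is a hypothetical choice, not taken from the paper.
    """
    dist = link_distances(graph, query_pages)
    boosted = {}
    for book, score in scores.items():
        # Distances of the reachable Wikipedia pages that cite this book.
        reachable = [dist[p] for p in citing_pages.get(book, ()) if p in dist]
        boost = alpha / (1 + min(reachable)) if reachable else 0.0
        boosted[book] = score * (1 + boost)
    return sorted(boosted.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): page "Q" matches the query and links to "A",
# which links to "B". Page "Z" is not reachable from the query page.
graph = {"Q": ["A"], "A": ["B"], "B": []}
citing_pages = {"book1": ["B"], "book2": ["Q"], "book3": ["Z"]}
scores = {"book1": 1.0, "book2": 1.0, "book3": 1.0}

ranking = rerank(scores, citing_pages, graph, query_pages=["Q"])
# Order of books in ranking: book2, book1, book3
```

With equal baseline scores, the book cited on the query page itself ranks first, the book cited two links away ranks second, and the book whose citing page is unreachable keeps its original score.
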
|Published in||Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM'09)|
|Publisher||Association for Computing Machinery, Inc.|
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or firstname.lastname@example.org. The definitive version of this paper can be found at ACM’s Digital Library --http://www.acm.org/dl/.