HITS on the Web: How does it Compare?
- Marc Najork ,
- Hugo Zaragoza ,
- Mike Taylor
30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) |
Published by Association for Computing Machinery, Inc.
This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: the mean reciprocal rank, the mean average precision, and the normalized discounted cumulative gain measurements. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
Copyright © 2007 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM's Digital Library -http://www.acm.org/dl/.