Measuring the Search Effectiveness of a Breadth-First Crawl

Dennis Fetterly, Nick Craswell, and Vishwa Vinay

Abstract

Previous scalability experiments found that early precision improves as collection size increases. However, those experiments assumed that a collection's documents are all sampled with uniform probability from the same population. We contrast this with a large breadth-first web crawl, an important scenario in real-world Web search, where the early documents have quite different characteristics from the later documents. Having observed that NDCG@100 (measured over a set of reference queries) begins to plateau in the initial stages of the crawl, we investigate a number of possible reasons for this behaviour. These include the web pages themselves, the metric used to measure retrieval effectiveness, and the set of relevance judgements used.

Details

Publication type: Inproceedings
Published in: Proceedings of the 31st European Conference on Information Retrieval (ECIR)
Publisher: Springer Verlag