On the Stability of Web Crawling and Web Search

ISAAC '08: Proceedings of the 19th International Symposium on Algorithms and Computation

Published by Springer-Verlag


In this paper, we analyze a graph-theoretic property motivated by web crawling. We introduce the notion of a stable core: the set of web pages that are usually contained in the crawling buffer when the buffer size is smaller than the total number of web pages. We analyze the size of the core in a random graph model based on the bounded Pareto power-law distribution. We prove that a core of significant size exists for a large range of parameters, 2 < α < 3, for the power law.
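
As a rough illustration of the distribution named in the abstract, the following sketch draws values from a bounded Pareto law by inverse-transform sampling. It is not the paper's graph model. It assumes that α is the exponent of the density (proportional to x^(-α) on a bounded interval), and the function names, the bounds `low` and `high`, and the default α = 2.5 are hypothetical choices made only for this example.

```python
import random


def sample_bounded_pareto(alpha, low, high, rng):
    """Draw one value with density proportional to x**(-alpha) on [low, high].

    Assumption: alpha is the density exponent, so the tail (shape)
    exponent used in the inverse CDF is alpha - 1.
    """
    shape = alpha - 1.0
    u = rng.random()  # uniform in [0, 1)
    ratio = (low / high) ** shape
    # Inverse of F(x) = (1 - (low / x)**shape) / (1 - (low / high)**shape)
    return low * (1.0 - u * (1.0 - ratio)) ** (-1.0 / shape)


def sample_degrees(n, alpha=2.5, low=1.0, high=1000.0, seed=0):
    """Return n independent bounded Pareto samples (hypothetical helper)."""
    rng = random.Random(seed)
    return [sample_bounded_pareto(alpha, low, high, rng) for _ in range(n)]


degrees = sample_degrees(10_000)
```

Values of α between 2 and 3 correspond to a shape exponent between 1 and 2 under this convention. If the paper parameterizes the law differently, the `shape` line would need to change accordingly.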