Yuting Liu, Tie-Yan Liu, Bin Gao, Zhiming Ma, and Hang Li
19 January 2010
This paper is concerned with a framework to compute the importance of webpages by using real browsing behaviors of Web users. In contrast, many previous approaches like PageRank compute page importance through the use of the hyperlink graph of the Web. Recently, people have realized that the hyperlink graph is incomplete and inaccurate as a data source for determining page importance, and proposed using the real behaviors of Web users instead. In this paper, we propose a formal framework to compute page importance from user behavior data (which covers some previous works as special cases). First, we use a stochastic process to model the browsing behaviors of Web users. According to the analysis on hundreds of millions of real records of user behaviors, we justify that the process is actually a continuous-time time-homogeneous Markov process, and its stationary probability distribution can be used as the measure of page importance. Second, we propose a number of ways to estimate parameters of the stochastic process from real data, which result in a group of algorithms for page importance computation (all referred to as BrowseRank). Our experimental results have shown that the proposed algorithms can outperform the baseline methods such as PageRank and TrustRank in several tasks, demonstrating the advantage of using our proposed framework.
|Published in||Information Retrieval|
All copyrights reserved by Springer 2007.