Efficient and Effective Link Analysis with Precomputed SALSA Maps

17th ACM Conference on Information and Knowledge Management (CIKM) |

Published by Association for Computing Machinery, Inc.

SALSA is a link-based ranking algorithm that takes the result set of a query as input, extends the set to include additional neighboring documents in the web graph, and performs a random walk on the induced subgraph. The stationary probability distribution of this random walk, used as a relevance score, is significantly more effective for ranking purposes than popular query-independent link-based ranking algorithms such as PageRank. nfortunately, this requires significant effort at query-time, to access the link graph and compute the stationary probability distribution. In this paper, we explore whether it is possible to perform most of the computation off-line, prior to the arrival of any queries. The off-line phase of our approach computes a “score map” for each node in the web graph by performing a SALSA-like algorithm on the neighborhood of that node and retaining the scores of the most promising nodes in the neighborhood graph. The on-line phase takes the results to a query, retrieves the score map of each result, and returns for each result a score that is the sum of the matching scores from each score map. We evaluated this algorithm on a collection of about 28,000 queries with partially labeled results, and found that it is significantly more effective than PageRank, although not quite as effective as SALSA. We also studied the trade-off between ranking effectiveness and space requirements.