Probabilistic Combination of Content and Links

Proceedings of SIGIR '01

Previous research has shown that citations and hypertext links can be usefully combined with document content to improve retrieval. Links can be used in many ways; for example, link topology can be used to identify important pages, anchor text can be used to augment the text of cited pages, and activation can be spread to linked pages. This paper introduces a probabilistic model that integrates content matching and these three uses of link information in a single framework. Experiments with a web collection show that link information improves retrieval, especially for general queries.