Empirical Analysis of Predictive Algorithms for Collaborative Filtering
Jack Breese
David Heckerman
Carl Kadie
Microsoft Research
Redmond, WA 98052-6399
Author Email: breese@microsoft.com, heckerma@microsoft.com, carlk@microsoft.com
Abstract:
Collaborative filtering or recommender systems use a database about user
preferences to predict additional topics or products a new user might like. In
this paper we describe several algorithms designed for this task, including
techniques based on correlation coefficients, vector-based similarity
calculations, and statistical Bayesian methods. We compare the predictive
accuracy of the various methods in a set of representative problem domains. We
use two basic classes of evaluation metrics. The first characterizes accuracy
over a set of individual predictions in terms of average absolute deviation. The
second estimates the utility of a ranked list of suggested items. This metric
uses an estimate of the probability that a user will see a recommendation in an
ordered list. Experiments were run on datasets from 3 application areas,
under 4 experimental protocols, with both evaluation metrics applied to each
algorithm. Results indicate that for a wide range of conditions, Bayesian
networks with decision trees at each node and correlation methods outperform
Bayesian-clustering and vector-similarity methods. Between correlation and
Bayesian networks, the preferred method depends on the nature of the dataset,
the nature of the application (ranked versus one-by-one presentation), and the
availability of votes with which to make predictions. Other considerations
include the size of the database, the speed of predictions, and learning time.
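
The two classes of evaluation metric described above can be illustrated with a minimal Python sketch. It is not taken from the paper's implementation. The function names, the `neutral_vote` and `half_life` parameters, and in particular the exponential-decay form of the viewing probability are assumptions made here for illustration; the abstract states only that the second metric uses an estimate of the probability that a user will see a recommendation in an ordered list.

```python
def mean_absolute_deviation(predicted, actual):
    """Average absolute difference between predicted and observed votes."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def ranked_list_utility(votes_in_rank_order, neutral_vote=0.0, half_life=5.0):
    """Expected utility of a ranked list of recommendations.

    votes_in_rank_order: the user's actual votes for the recommended items,
    listed in the order the system ranked them (best first).

    Assumption (not stated in the abstract): the item at 1-based rank j is
    seen with probability 0.5 ** ((j - 1) / (half_life - 1)), so the item
    at rank `half_life` has a 50% chance of being viewed.
    """
    if half_life <= 1:
        raise ValueError("half_life must be greater than 1")
    total = 0.0
    for rank, vote in enumerate(votes_in_rank_order, start=1):
        p_seen = 0.5 ** ((rank - 1) / (half_life - 1))
        # Only votes above the neutral level contribute utility.
        total += max(vote - neutral_vote, 0.0) * p_seen
    return total
```

The first function scores individual predictions one at a time, which suits one-by-one presentation. The second discounts items further down the list, which suits ranked presentation, where a good item placed late is less likely to be seen.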
Appears in Proceedings of the Fourteenth Conference on Uncertainty in
Artificial Intelligence, Madison, WI, July, 1998. Morgan Kaufmann Publisher.