Kai Yu, Xiaowei Xu, Anton Schwaighofer, Volker Tresp, and Hans-Peter Kriegel
The application range of memory-based collaborative filtering (CF) is limited due to CF's high memory consumption and long runtime. The approach presented in this paper removes redundant and inconsistent instances (users) from the data. Our work shows that satisfactory accuracy can be achieved using only a small portion of the original data set, thereby reducing the storage and runtime cost of the CF algorithm. In our approach, we consider instance selection as the problem of selecting informative data that increase the a posteriori probability of the optimal model. We evaluate the empirical performance of our approach on two real-world data sets and attain very promising results. Data size and prediction time are significantly reduced, while the prediction accuracy is on a par with results achieved by using the complete database.
In Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM '02)