BBM: Bayesian Browsing Model from Petabyte-scale Data

Chao Liu, Fan Guo, and Christos Faloutsos

Abstract

Given a quarter of a petabyte of click log data, how can we estimate the relevance of each URL for a given query? In this paper, we propose the Bayesian Browsing Model (BBM), a new modeling technique with the following advantages: (a) it does exact inference; (b) it is single-pass and parallelizable; (c) it is effective. We present two sets of experiments to test model effectiveness and efficiency. On the first set of over 50 million search instances of 1.1 million distinct queries, BBM outperforms the state-of-the-art competitor by 29.2% in log-likelihood while being 57 times faster. On the second click-log set, spanning a quarter of a petabyte of data, we showcase the scalability of BBM: we implemented it on a commercial MapReduce cluster, and it took only 3 hours to compute the relevance for 1.15 billion distinct query-URL pairs.

Details

Publication type: Inproceedings
Published in: KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Publisher: Association for Computing Machinery, Inc.