Share on Facebook Tweet on Twitter Share on LinkedIn Share by email
Generating Labels from Clicks

Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas


The ranking function used by search engines to order results is learned from labeled training data. Each training point is a (query, URL) pair that is labeled by a human judge who assigns a score of Perfect, Excellent, etc., depending on how well the URL matches the query. In this paper, we study whether clicks can be used to automatically generate good labels. Intuitively, documents that are clicked (resp., skipped) in aggregate can indicate relevance (resp., lack of relevance). We give a novel way of transforming clicks into weighted, directed graphs inspired by eye-tracking studies and then devise an objective function for finding cuts in these graphs that induce a good labeling. In its full generality, the problem is NP-hard, but we show that, in the case of two labels, an optimum labeling can be found in linear time. For the more general case, we propose heuristic solutions. Experiments on real click logs show that clickbased labels align with the opinion of a panel of judges, especially as the consensus of the panel grows stronger.


Publication typeInproceedings
Published inInternational Conference on Web Search and Data Mining (WSDM)
> Publications > Generating Labels from Clicks