Random Walks on the Click Graph

Proceedings of SIGIR 2007

\url{http://research.microsoft.com/users/nickcr/pubs/craswell_sigir07.pdf}

Search engines can record which documents were clicked for which query, and use these query-document pairs as ‘soft’ relevance judgments. However, compared to the true judgments, click logs give noisy and sparse relevance information. We apply a Markov random walk model to a large click log, producing a probabilistic ranking of documents for a given query. A key advantage of the model is its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively. We conduct experiments on click logs from image search, comparing our (‘backward’) random walk model to a different (‘forward’) random walk, varying parameters such as walk length and self-transition probability. The most effective combination is a long backward walk with high self-transition probability.
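As a rough illustration of the two walks contrasted in the abstract, the sketch below builds a toy click graph and computes both distributions. The transition rule (click counts normalized per node, plus a fixed self-transition probability `self_prob`), the uniform prior over start nodes in the backward walk, the function names, and the toy click counts are all assumptions made for this example; none of them are taken from the abstract.

```python
def transition_matrix(clicks, self_prob):
    """Build a row-stochastic matrix over a bipartite click graph.

    clicks[q][d] is the click count for query q and document d.
    Nodes are ordered queries first, then documents (assumed layout).
    """
    n_q, n_d = len(clicks), len(clicks[0])
    n = n_q + n_d
    counts = [[0.0] * n for _ in range(n)]
    for q, row in enumerate(clicks):
        for d, c in enumerate(row):
            counts[q][n_q + d] = float(c)   # query -> document edge
            counts[n_q + d][q] = float(c)   # document -> query edge
    matrix = []
    for j, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            raise ValueError("node %d has no clicks" % j)
        # Assumed rule: stay put with probability self_prob, otherwise
        # move to a neighbour in proportion to click counts.
        probs = [(1.0 - self_prob) * c / total for c in row]
        probs[j] += self_prob
        matrix.append(probs)
    return matrix


def forward_walk(matrix, start, steps):
    """Distribution over end nodes for a walk that starts at `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[j] * matrix[j][k] for j in range(n)) for k in range(n)]
    return dist


def backward_walk(matrix, end, steps):
    """Distribution over start nodes given that the walk ends at `end`.

    Assumes a uniform prior over start nodes, so the posterior is the
    normalized likelihood P(end | start) for each candidate start.
    """
    n = len(matrix)
    like = [0.0] * n
    like[end] = 1.0
    for _ in range(steps):
        like = [sum(matrix[k][j] * like[j] for j in range(n)) for k in range(n)]
    total = sum(like)
    return [v / total for v in like]


# Toy data (hypothetical): 2 queries, 3 documents.
clicks = [[3, 1, 0],
          [0, 2, 2]]
A = transition_matrix(clicks, self_prob=0.9)
scores = backward_walk(A, end=0, steps=11)   # long walk ending at query 0
doc_scores = scores[2:]                      # keep only the document nodes
ranking = sorted(range(3), key=lambda d: doc_scores[d], reverse=True)
```

In this toy graph the third document has never been clicked for query 0, yet a backward walk of three or more steps gives it a nonzero score, because it shares a query with a document that query 0 did click. This is the behaviour the abstract describes as retrieving relevant documents that have not yet been clicked for the query.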

Publication Downloads

Bing Coronavirus Query Set

April 13, 2021

A dataset containing aggregated and anonymized queries with Coronavirus intent from across the world. The dataset was curated from the Bing search logs (desktop users only) over the period of Jan 1st, 2020 – (Current Month - 1). Only searches that were issued many times by multiple users were included. The dataset includes queries from all over the world that had an intent related to the Coronavirus or Covid-19. In some cases this intent is explicit in the query itself, e.g. "Coronavirus updates Seattle"; in other cases it is implicit, e.g. "Shelter in place". Queries with implicit intent (e.g. "Toilet paper") were identified using the random walks on the click graph approach outlined in the paper of the same name. All personal data was removed.