WSCD2012: Workshop on Web Search Click Data 2012

Workshop on Web Search Click Data, held in conjunction with WSDM 2012

February 12, 2012
Seattle WA, USA

Workshop Program

The workshop program is now available.

Workshop Organizers

  • Pavel Serdyukov, Yandex
  • Nick Craswell, Microsoft
  • Georges Dupret, Yahoo!
  • Additional challenge organizers: Alexey Gorodilov and Eugene Kharitonov, Yandex

Workshop Overview

WSCD2012 is the second workshop on Web Search Click Data, following WSCD2009. It is a forum for new research relating to Web search usage logs and for discussing desirable properties of publicly released search log datasets.

Topics of interest include but are not restricted to:

  • web mining
  • information retrieval
  • learning to rank
  • desiderata for future click data releases
  • mining semantic relationships, for example within and between the query set and document set
  • analysis and correction of biases in the data
  • clustering/grouping log data by topic, task, geographic location or time
  • generative models for the log events, query text and/or document text
  • other tasks which can be improved with the click data

Research relating to search logs has been hampered by the limited availability of click datasets. This workshop is accompanied by a new dataset derived from search click logs and by a challenge to predict the relevance of documents from clicks.

Participation in the challenge is optional for workshop participants, and authors are invited to submit papers using this dataset or others.

Workshop Program

The workshop program will include one or two invited talks, regular paper talks, the challenge overview talk, talks by the challenge winners and talks proposed by other top-ranked challenge participants.

Important Dates

  • Start of Challenge: October 15, 2011
  • Papers due: December 12, 2011 (originally December 5, 2011)
  • End of Challenge: December 22, 2011, 13:00 Moscow time (originally December 15, 2011)
  • Notification of Acceptance: January 10, 2012
  • Camera-Ready: January 17, 2012
  • Workshop: February 12, 2012

Paper Format

Submissions should present original results and new ideas. They must report original research that has not been accepted by, and is not under submission to, any journal or conference with public proceedings (previous submissions to informal workshops or as posters are allowed, but must be indicated). Submissions must be formatted according to the ACM guidelines and style files and may be up to 8 pages long, including diagrams, references and any appendices. A submitted paper must be self-contained. Submissions shorter than 8 pages are encouraged.

All papers will be peer-reviewed by at least three reviewers from an International Program Committee; promising papers will then be discussed at a meeting of the PC chairs, where the final selections will be made. Accepted papers will appear in the online proceedings, published in the ACM Digital Library and on the conference web site. Authors of accepted papers will retain proprietary rights to their work, but will be required to sign a copyright release form.

Papers must be submitted in PDF format to the paper submission Web site. PDF files must have all non-standard fonts embedded. After uploading, please check the copy stored on the site. Submissions that do not view or print properly may be rejected without a chance to rectify the problem. Please contact for any questions.

Program Committee

Eugene Agichtein, Emory University
Michael Bendersky, University of Massachusetts Amherst
Carlos Castillo, Yahoo! Research
Brian D. Davison, Lehigh University
Alexey Gorodilov, Yandex
Fan Guo, Facebook
Jaap Kamps, University of Amsterdam
Evangelos Kanoulas, University of Sheffield
Eugene Kharitonov, Yandex
Lihong Li, Yahoo! Research
Benjamin Piwowarski, CNRS
Fabrizio Silvestri, Information Science and Technology Institute
Qiang Yang, Hong Kong University of Science and Technology

The Dataset and Challenge

*** For all the up-to-date details, visit the challenge website. ***

The previous WSCD workshop used Microsoft click logs. WSCD2012 will use a new Yandex click log dataset. The dataset will include user sessions, with queries, rankings and clicks. Unlike previous click datasets, it will also include relevance judgments for the ranked URLs, for the purpose of training relevance prediction models. To allay privacy concerns, the user data will be fully anonymized: only numeric IDs of queries, sessions and URLs will be released. Queries will be grouped only by session, and no user IDs will be provided.

Using the new dataset, a challenge will be organized around relevance prediction. It will be a blind experiment: participants will predict the relevance labels of a held-out test set.

Because it is anonymized, the Yandex dataset will not support certain kinds of experiments, for example those that use features of the document text or of the query text. However, it should support modeling and analysis beyond the scope of the official challenge. Authors are invited to submit workshop papers using the Yandex dataset or other datasets.

The top 3 winners will be invited to speak at the workshop and will be asked to write a detailed paper describing their approach by January 20, 2012.

Other challenge participants are encouraged to send speaker proposals, which will be selected based on the team's rank among all teams and the originality of its approach. Speaker proposals should be sent to with the subject "WSCD Speaker Proposal" and contain a proposal of up to 500 words. The deadline for speaker proposals is January 5, 2012, and notifications of acceptance will be sent by January 10, 2012. The selected speakers will also be encouraged to write a report for inclusion in the workshop proceedings, though this is not a strict requirement.