WSCD2013: Workshop on Web Search Click Data 2013

Workshop on Web Search Click Data, held in conjunction with WSDM 2013

Rome — February 4, 2013

Workshop Program

The workshop program includes invited talks, regular paper talks, the challenge overview talk, talks by challenge winners and talks proposed by top participants.

Participant Reports

  • 1st prize: Out of mEmory. Denis Savenkov, Dmitry Lagun, Qiaoling Liu. Emory University [report]
  • 2nd prize: Insight. Pavel Kalinin. Voronezh State University [report]
  • 3rd prize: GraphLab. Qiang Yan (Institute of Automation, Chinese Academy of Sciences), Xingxing Wang (Computer Network Information Center, Chinese Academy of Sciences), Heng Gao, Dongying Kong (Institute of Computing Technology, Chinese Academy of Sciences), Yangbao Lee (Chinese Academy of Sciences) [report]
  • 4th place: wangzongzaimeia. Heng Gao, Yongbao Li, Qiudan Li and Daniel Zeng (also University of Arizona, Tucson). State Key Laboratory of Management and Control for Complex Systems, Beijing [report]

Workshop Organizers

  • Pavel Serdyukov, Yandex
  • Nick Craswell, Microsoft
  • Georges Dupret, Yahoo!

Workshop Overview

WSCD2013 is the third workshop on Web Search Click Data, following WSCD2009 and WSCD2012. It is a forum for new research relating to Web search usage logs and for discussing desirable properties of publicly released search log datasets. For a summary of datasets used and discussed at the workshop, see the datasets summary page.

Topics of interest include but are not restricted to:

  • web mining
  • information retrieval
  • learning to rank
  • desiderata for future click data releases
  • mining semantic relationships, for example within and between the query set and document set
  • analysis and correction of biases in the data
  • clustering/grouping log data by topic, task, geographic location, or time
  • generative models for the log events, query text and/or document text
  • other tasks which can be improved with the click data

Research relating to search logs has been hampered by the limited availability of click datasets. This workshop comes with a new click dataset based on click logs and an accompanying challenge.

For participants in the workshop, participating in the challenge is optional, and authors are invited to submit papers using this or other datasets.

Invited Speakers

  • Mounia Lalmas (Yahoo! Labs Barcelona)
    Measuring Web User Engagement: a melting pot of web analytics, focus attention, positive affect, user interest, mouse, gaze, sentimentality, saliency, etc
  • Paul Bennett (Microsoft Research Redmond)
    Proprietary Data in Research: Public resources and questions of reproducibility
  • Eugene Agichtein (Emory University)
    Looking Beyond the Clicks: Acquiring and Mining Searcher Examination and Interaction Data

Important Dates

  • Start of Challenge: October 23, 2012
  • Papers due: December 10, 2012 at 23:59 Hawaii Time (extended from December 3, 2012)
  • End of Challenge: December 22, 2012 at 13:00, Moscow time
  • Notification of Acceptance: January 10, 2013
  • Camera-Ready: January 17, 2013
  • Workshop: February 4, 2013

Paper Format

Submissions should present original results and new ideas. They must report original research not accepted or under submission to any journal or conference with public proceedings (previous submissions to informal workshops or as posters are allowed, but must be indicated). Submissions must be formatted according to the ACM guidelines and style files (http://www.acm.org/sigs/publications/proceedings-templates) and can be up to 8 pages in length, including diagrams, references and appendices, if any. A submitted paper must be self-contained. Submissions shorter than 8 pages, for example position/short papers of 2-4 pages, are also encouraged.

All papers will be peer-reviewed by at least three reviewers from an International Program Committee; the promising papers they identify will then be discussed at a meeting of the PC chairs, where the final selections will be made.

Submissions

Papers must be submitted in PDF format to the paper submission Web site (http://www.easychair.org/conferences/?conf=wscd2013). PDF files must have all non-standard fonts embedded. After upload, please check the copy stored on the site. Submissions that do not view or print properly may be rejected without a chance to rectify the problem. Please contact wscd2013@easychair.org with any questions.

Program Committee

  • Benjamin Piwowarski, CNRS / University Pierre et Marie Curie
  • Carlos Castillo, Qatar Computing Research Institute
  • Elad Yom-Tov, Microsoft
  • Evangelos Kanoulas, Google
  • Fabrizio Silvestri, ISTI - CNR
  • Fan Guo, Facebook
  • Jaap Kamps, University of Amsterdam
  • Jian-Yun Nie, Universite de Montreal
  • Jim Jansen, The Pennsylvania State University
  • Mark Boyd, eBay
  • Michael Bendersky, Google
  • Mounia Lalmas, Yahoo! Labs Barcelona
  • Steve Beitzel, Telcordia Technologies
  • Tong Zhang, Rutgers

The Dataset and Challenge

A challenge is running in parallel to this workshop, using anonymized Yandex search log data. The evaluation will be purely data-driven and will be focused on predicting the user's next action, for example whether they will switch to a different search engine, from a sample of held-out sessions. Please see the challenge website: http://switchdetect.yandex.ru/en.

Note that submitting a paper to the workshop requires neither participating in the Challenge nor using the provided dataset. WSCD welcomes papers using any search logs available to the authors.

Motivation: Whereas last year's challenge considered relevance at the query-document level, this year we predict switching in user sessions. When a user deliberately switches from one engine to another in order to continue their search, this means they require additional information and perspectives. This may be because their information need requires more than one engine can provide, or because the first engine has failed. In either case, predicting switching during real user sessions is an interesting unsolved problem, and an important consideration in multi-engine information retrieval scenarios such as commercial Web search.