WSCD09: Workshop on Web Search Click Data 2009

Workshop on Web Search Click Data, held in conjunction with WSDM 2009

February 9, 2009
Barcelona, Spain

Organizers

  • Nick Craswell, Microsoft
  • Rosie Jones, Yahoo! Labs
  • Georges Dupret, Yahoo! Labs
  • Evelyne Viegas, Microsoft

Workshop Overview

This workshop is a forum for new research relating to Web search usage logs and for discussing desirable properties of publicly released search log datasets.

Topics of interest include but are not restricted to:

  • web mining
  • information retrieval
  • learning to rank
  • desiderata for future click data releases
  • mining semantic relationships, for example within and between the query set and document set
  • analysis and correction of biases in the data
  • clustering/grouping log data by topic, task, geographic location or time
  • generative models for the log events, query text and/or document text
  • other tasks which can be improved with the click data

Research relating to search logs has been hampered by the limited availability of click datasets. In the first phase of the workshop, participants whose proposals were selected received access, free of charge, to the Microsoft 2006 RFP dataset after signing a license agreement.

Authors are invited to submit papers using this or other datasets.

Maximum Number of Participants: 40

Activities: Presentation and poster sessions.

Submissions

Full papers may be up to 8 pages long and must be submitted as PDF using the ACM sig-alternate template. Short papers of up to 4 pages describing work in progress are also welcome.

Submission URL

http://www.easychair.org/conferences/?conf=wscd09

Important Dates

  • Proposals: Wednesday, September 3, 2008
  • Response to proposals: Wednesday, September 10, 2008
  • Paper submission: Friday, December 5, 2008
  • Paper notification: Monday, December 29, 2008 (Note: Date changed)
  • Camera ready: Monday, January 5, 2009 (Note: Date changed to allow time for processing by ACM Digital Library)
  • Workshop: Monday, February 9, 2009

The registration site is now open. Early registration ends January 7, 2009.

Program Committee

Lada Adamic, University of Michigan
Eytan Adar, University of Washington
Eugene Agichtein, Emory University
Steve Beitzel, Telcordia Technologies
Mark Boyd, eBay
Kevin C. Chang, University of Illinois at Urbana-Champaign
Brian D. Davison, Lehigh University
Panagiotis G. Ipeirotis, New York University
Jim Jansen, The Pennsylvania State University
Jian-Yun Nie, Université de Montréal
Tie-Yan Liu, Microsoft Research
Amélie Marian, Rutgers University
Llew Mason, Amazon.com
Craig Murray, University of Maryland
Amanda Spink, Queensland University of Technology
Tong Zhang, Rutgers University

The Shared Dataset

Based on proposals submitted in September 2008, some workshop participants were granted access to a shared dataset. It is an excerpt of the MSN Search query log (the RFP 2006 dataset):

  • 15 million queries
  • Sampled over one month
  • Queries from the US site (mostly English)

Per-query attributes included:

  1. Session ID
  2. Time-stamp
  3. Query string
  4. Number of results on results page
  5. Results page number

Data recorded for each result clicked:

  1. URL
  2. Associated query
  3. Position on results page
  4. Time-stamp
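
For authors planning their analysis, the two attribute lists above could be modelled roughly as follows. This is only a sketch: the on-disk format of the dataset is not described here, so the class names, field names and types (QueryRecord, ClickRecord, session_id and so on) are assumptions made for illustration, and the sample rows are invented rather than taken from the data. The helper counts clicks per result position, which may be a useful first look at position bias.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueryRecord:
    """One query event; field names are hypothetical."""
    session_id: str
    timestamp: datetime
    query: str
    result_count: int  # number of results on the results page
    page_number: int   # results page number


@dataclass(frozen=True)
class ClickRecord:
    """One clicked result; field names are hypothetical."""
    url: str
    query: str         # the associated query string
    position: int      # position on the results page
    timestamp: datetime


def clicks_by_position(clicks):
    """Count how many clicks landed on each result position."""
    return Counter(click.position for click in clicks)


# Invented rows for illustration only; not taken from the dataset.
sample_clicks = [
    ClickRecord("http://example.com/a", "barcelona hotels", 1,
                datetime(2006, 1, 1, 9, 0, 5)),
    ClickRecord("http://example.com/b", "barcelona hotels", 3,
                datetime(2006, 1, 1, 9, 1, 40)),
    ClickRecord("http://example.org/c", "wsdm 2009", 1,
                datetime(2006, 1, 1, 9, 5, 12)),
]

position_counts = clicks_by_position(sample_clicks)
```

With the invented rows above, position_counts maps position 1 to two clicks and position 3 to one click.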

Due to the type of assets under consideration, the principal investigator was asked to sign a data licensing agreement before accessing the data. The license allows publication of results but restricts redistribution of the data and publication of detailed excerpts from it.
