WSCD09: Workshop on Web Search Click Data 2009

Workshop on Web Search Click Data, held in conjunction with WSDM 2009

February 9, 2009
Barcelona, Spain


  • Nick Craswell, Microsoft
  • Rosie Jones, Yahoo! Labs
  • Georges Dupret, Yahoo! Labs
  • Evelyne Viegas, Microsoft

Workshop Program (full proceedings and video of talks are available online)

9:00-9:05 Welcome and Introductions
9:05-10:00 Invited speaker: Alissa Cooper, A Policy Perspective on Query Log Privacy-Enhancing Techniques
10:00 Survey and evaluation of query intent detection methods
David J. Brenes, Daniel Gayo Avello and Kilian Pérez-González
10:30-11:00 Coffee Break
11:00 Analysis of Long Queries in a Large Scale Search Log
Michael Bendersky and Bruce Croft
11:30 Search Shortcuts Using Click-Through Data
Ranieri Baraglia, Fidel Cacheda, Victor Carneiro, Vreixo Formoso, Raffaele Perego and Fabrizio Silvestri
12:00 Query Suggestions Using Query-Flow Graphs
Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato and Sebastiano Vigna
12:30 Intentional Query Suggestion: Making User Goals More Explicit During Search
Markus Strohmaier, Mark Kröll and Christian Körner
13:00 Comparative Analysis of Clicks and Judgments for IR Evaluation
Jaap Kamps, Marijn Koolen and Andrew Trotman
13:30-15:00 Lunch
15:00-16:00 Panel on Future of Query Log Research and Data Release
16:00-17:30 Poster Session
17:30-18:00 Coffee Break
18:00 End

Workshop Program: Invited Speaker

Alissa Cooper
A Policy Perspective on Query Log Privacy-Enhancing Techniques

As popular search engines face the sometimes conflicting interests of protecting privacy while retaining query logs for a variety of uses, numerous technical measures have been suggested to both enhance privacy and preserve at least a portion of the utility of query logs. This article seeks to assess seven of these techniques against three sets of criteria: (1) how well the technique protects privacy, (2) how well the technique preserves the utility of the query logs, and (3) how well the technique might be implemented as a user control. A user control is defined as a mechanism that allows individual Internet users to choose to have the technique applied to their own query logs.

Alissa Cooper is the Chief Computer Scientist at the Center for Democracy and Technology. Her work focuses on a range of issues including consumer privacy, spyware, digital copyright, network neutrality, and identity management. She conducts research into the inner workings of common and emerging Internet technologies, and seeks to explain complex technical concepts in understandable terms. She has testified before Congress and the Federal Trade Commission and writes regularly on a variety of technology policy topics.
Alissa moved to the Washington area after completing her Bachelor's and Master's degrees in Computer Science at Stanford University. There her work focused on computer security issues and their policy implications.

Workshop Program: Posters

  • Tailoring Click Models to User Goals
    Fan Guo, Lei Li and Christos Faloutsos
  • Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals
    Omer Duskin and Dror Feitelson
  • Incremental Learning to Rank with Partially-Labeled Data
    Kye-Hyeon Kim and Seungjin Choi
  • Usefulness of Quality Click-through Data for Training
    Craig Macdonald and Iadh Ounis
  • Topic-specific Analysis of Search Queries
    Judit Bar-Ilan, Zheng Zhu and Mark Levene
  • Optimising Topical Query Decomposition
    Marcin Sydow, Francesco Bonchi, Carlos Castillo and Debora Donato
  • Using query logs and click data to create improved document descriptions
    Maarten van der Heijden, Max Hinne, Wessel Kraaij, Suzan Verberne and Theo van der Weide
  • Disambiguation from Web search selections
    Gavin Smith, Tim Brailsford, Christoph Donner, Mark Truran, Jim Goulding and Helen Ashman

Workshop Overview

This workshop is a forum for new research relating to Web search usage logs and for discussing desirable properties of publicly released search log datasets.

Topics of interest include but are not restricted to:

  • web mining
  • information retrieval
  • learning to rank
  • desiderata for future click data releases
  • mining semantic relationships, for example within and between the query set and document set
  • analysis and correction of biases in the data
  • clustering/grouping log data by: topic, task, geographic location, time.
  • generative models for the log events, query text and/or document text
  • other tasks which can be improved with the click data

Research relating to search logs has been hampered by the limited availability of click datasets. During the first phase, participants whose proposals were selected were given free access to the Microsoft 2006 RFP dataset upon signing a license agreement.

Authors are invited to submit papers using this or other datasets.

Activities: Presentations & Poster session.

Important Dates

  • Proposals: Wednesday, September 3, 2008
  • Response to proposals: Wednesday, September 10, 2008
  • Paper submission: December 5, 2008
  • Paper notification: December 29, 2008 (Note: Date changed)
  • Camera ready: January 5, 2009 (Note: Date changed to allow time for processing by ACM Digital Library)
  • Workshop: February 9, 2009

Program Committee

Lada Adamic, University of Michigan
Eytan Adar, University of Washington
Eugene Agichtein, Emory University
Steve Beitzel, Telcordia Technologies
Mark Boyd, eBay
Brian D. Davison, Lehigh University
Panagiotis G. Ipeirotis, New York University
Jim Jansen, The Pennsylvania State University
Jian-Yun Nie, Université de Montréal
Tie-Yan Liu, Microsoft Research
Amélie Marian, Rutgers University
Llew Mason
Craig Murray, University of Maryland
Amanda Spink, Queensland University of Technology
Tong Zhang, Rutgers University

The Shared Dataset

Based on proposals submitted in September 2008, some workshop participants were granted access to a shared dataset. It is an MSN Search query log excerpt (RFP 2006 dataset):

  • 15 million queries
  • Sampled over one month
  • Queries from the US site (mostly English)

Per-query attributes included:

  1. Session ID
  2. Time-stamp
  3. Query string
  4. Number of results on results page
  5. Results page number

Data recorded per query for each clicked result:

  1. URL
  2. Associated query
  3. Position on results page
  4. Time-stamp
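
The two attribute lists above can be pictured as a pair of record types. The following is only a minimal sketch in Python: the class names, field names, field types, and the helper clicks_by_position are assumptions made for illustration, the sample records are invented, and nothing here reflects the actual file layout of the dataset, which this page does not describe.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Click:
    """One clicked result, mirroring the four per-click fields listed above."""
    url: str
    query: str           # the associated query string
    position: int        # position on the results page
    timestamp: datetime


@dataclass
class QueryEvent:
    """One query, mirroring the five per-query attributes listed above."""
    session_id: str
    timestamp: datetime
    query: str
    num_results: int     # number of results on the results page
    page_number: int     # results page number
    clicks: List[Click] = field(default_factory=list)


def clicks_by_position(events: List[QueryEvent]) -> Counter:
    """Count clicks at each result position across all query events."""
    counts: Counter = Counter()
    for event in events:
        for click in event.clicks:
            counts[click.position] += 1
    return counts


# Invented sample records, for illustration only (not taken from the dataset).
t0 = datetime(2006, 5, 1, 12, 0, 0)
sample = [
    QueryEvent("s1", t0, "barcelona hotels", 10, 1, clicks=[
        Click("http://example.com/a", "barcelona hotels", 1, t0),
        Click("http://example.com/b", "barcelona hotels", 3, t0),
    ]),
    QueryEvent("s2", t0, "wsdm 2009", 10, 1, clicks=[
        Click("http://example.com/c", "wsdm 2009", 1, t0),
    ]),
]

position_counts = clicks_by_position(sample)
```

Keeping clicks nested under the query that produced them makes per-position or per-session aggregation a simple loop, which is the kind of bias analysis listed among the workshop topics.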

Due to the type of assets under consideration, the principal investigator was asked to sign a data licensing agreement before accessing the data. The terms of the license allow publication of results but restrict redistribution of the data and publication of detailed excerpts of the data.
