Workshop on Web Search Click Data, held in conjunction with WSDM 2009
February 9, 2009
Barcelona, Spain
Organizers
- Nick Craswell, Microsoft
- Rosie Jones, Yahoo! Labs
- Georges Dupret, Yahoo! Labs
- Evelyne Viegas, Microsoft
| 9:00-9:05 | Welcome and Introductions |
| 9:05-10:00 | Invited speaker: Alissa Cooper
A Policy Perspective on Query Log Privacy-Enhancing Techniques |
| 10:00 | Survey and evaluation of query intent detection methods
David J. Brenes, Daniel Gayo Avello and Kilian Pérez-González |
| 10:30-11:00 | Coffee Break |
| 11:00 | Analysis of Long Queries in a Large Scale Search Log
Michael Bendersky and Bruce Croft |
| 11:30 | Search Shortcuts Using Click-Through Data
Ranieri Baraglia, Fidel Cacheda, Victor Carneiro, Vreixo Formoso, Raffaele Perego and Fabrizio Silvestri |
| 12:00 | Query Suggestions Using Query-Flow Graphs
Paolo Boldi, Francesco Bonchi, Carlos Castillo, Debora Donato and Sebastiano Vigna |
| 12:30 | Intentional Query Suggestion: Making User Goals More Explicit During Search
Markus Strohmaier, Mark Kröll and Christian Körner |
| 13:00 | Comparative Analysis of Clicks and Judgments for IR Evaluation
Jaap Kamps, Marijn Koolen and Andrew Trotman |
| 13:30-15:00 | Lunch |
| 15:00-16:00 | Panel on Future of Query Log Research and Data Release |
| 16:00-17:30 | Poster Session |
| 17:30-18:00 | Coffee Break |
| 18:00 | End |
Workshop Program: Invited Speaker
Alissa Cooper
A Policy Perspective on Query Log Privacy-Enhancing Techniques
Abstract:
As popular search engines face the sometimes conflicting interests of
protecting privacy while retaining query logs for a variety of uses, numerous
technical measures have been suggested to both enhance privacy and preserve at
least a portion of the utility of query logs. This article seeks to assess
seven of these techniques against three sets of criteria: (1) how well the
technique protects privacy, (2) how well the technique preserves the utility of
the query logs, and (3) how well the technique might be implemented as a user
control. A user control is defined as a mechanism that allows individual
Internet users to choose to have the technique applied to their own query logs.
Bio:
Alissa Cooper is the Chief Computer Scientist at the Center for
Democracy and Technology. Her work focuses on a range of issues including
consumer privacy, spyware, digital copyright, network neutrality, and identity
management. She conducts research into the inner workings of common and
emerging Internet technologies, and seeks to explain complex technical concepts
in understandable terms. She has testified before Congress and the Federal Trade
Commission and writes regularly on a variety of technology policy topics.
Alissa moved to the Washington area after completing her Bachelor's and
Master's degrees in Computer Science at Stanford University. There, her work
focused on computer security issues and their policy implications.
Workshop Program: Posters
- Tailoring Click Models to User Goals
Fan Guo, Lei Li and Christos Faloutsos
- Distinguishing Humans from Robots in Web Search Logs: Preliminary Results Using Query Rates and Intervals
Omer Duskin and Dror Feitelson
- Incremental Learning to Rank with Partially-Labeled Data
Kye-Hyeon Kim and Seungjin Choi
- Usefulness of Quality Click-through Data for Training
Craig Macdonald and Iadh Ounis
- Topic-specific Analysis of Search Queries
Judit Bar-Ilan, Zheng Zhu and Mark Levene
- Optimising Topical Query Decomposition
Marcin Sydow, Francesco Bonchi, Carlos Castillo and Debora Donato
- Using query logs and click data to create improved document descriptions
Maarten van der Heijden, Max Hinne, Wessel Kraaij, Suzan Verberne and Theo van der Weide
- Disambiguation from Web search selections
Gavin Smith, Tim Brailsford, Christoph Donner, Mark Truran, Jim Goulding and Helen Ashman
Workshop Overview
This workshop is a forum for new research relating to Web search usage logs and for discussing desirable properties of publicly released search log datasets.
Topics of interest include but are not restricted to:
- web mining
- information retrieval
- learning to rank
- desiderata for future click data releases
- mining semantic relationships, for example within and between the query set and document set
- analysis and correction of biases in the data
- clustering/grouping log data by topic, task, geographic location, or time
- generative models for the log events, query text and/or document text
- other tasks which can be improved with the click data
Research relating to search logs has been hampered by the limited availability of click datasets. During the first phase, participants whose proposals were selected received access to the Microsoft 2006 RFP dataset, free of charge, upon signing a license agreement.
Authors are invited to submit papers using this or other datasets.
Activities: Presentations & Poster session.
Important Dates
- Proposals: Wednesday, September 3, 2008
- Response to proposals: Wednesday, September 10, 2008
- Paper submission: December 5, 2008
- Paper notification: December 29, 2008 (Note: Date changed)
- Camera ready: January 5, 2009 (Note: Date changed to allow time for processing by ACM Digital Library)
- Workshop: February 9, 2009
Program Committee
Lada Adamic, University of Michigan
Eytan Adar, University of Washington
Eugene Agichtein, Emory University
Steve Beitzel, Telcordia Technologies
Mark Boyd, eBay
Brian D. Davison, Lehigh University
Panagiotis G. Ipeirotis, New York University
Jim Jansen, The Pennsylvania State University
Jian-Yun Nie, Université de Montréal
Tie-Yan Liu, Microsoft Research
Amélie Marian, Rutgers University
Llew Mason, Amazon.com
Craig Murray, University of Maryland
Amanda Spink, Queensland University of Technology
Tong Zhang, Rutgers University
The Shared Dataset
Based on proposals submitted in September 2008, some workshop participants were granted access to a shared dataset. It is an MSN Search query log excerpt (the RFP 2006 dataset):
- 15 million queries
- Sampled over one month
- Queries from the US site (mostly English)
Per-query attributes included:
- Session ID
- Time-stamp
- Query string
- Number of results on results page
- Results page number
Data per query for each result clicked:
- URL
- Associated query
- Position on results page
- Time-stamp
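The attribute lists above suggest a simple two-level record layout: one record per query, with zero or more click records attached. The following is a minimal Python sketch of that layout, offered only as an illustration. The class names, field names, and types are assumptions made for this example; the actual file format and column names of the dataset are not specified here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class Click:
    """One clicked result. Field names are hypothetical."""
    url: str
    query: str           # associated query string
    position: int        # position on the results page
    timestamp: datetime


@dataclass
class QueryRecord:
    """One logged query. Field names are hypothetical."""
    session_id: str
    timestamp: datetime
    query: str
    result_count: int    # number of results on the results page
    page_number: int     # results page number
    clicks: List[Click] = field(default_factory=list)


def clicks_by_position(records: List[QueryRecord]) -> Dict[int, int]:
    """Count how many clicks landed at each results-page position."""
    counts: Dict[int, int] = {}
    for record in records:
        for click in record.clicks:
            counts[click.position] = counts.get(click.position, 0) + 1
    return counts
```

A per-position click count of this kind is one common starting point for studying position bias in click data, which relates to the "analysis and correction of biases" topic listed in the workshop overview.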
Due to the type of assets under consideration, the principal investigator was asked to sign a data licensing agreement before accessing the data. The terms of the license allow publication of results but restrict redistribution of the data and publication of detailed excerpts of the data.