Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, and Chin-Yew Lin
June 2010
Traditional information retrieval research has mostly focussed
on satisfying clearly specified information needs. However,
in reality, queries are often ambiguous and/or underspecified.
In light of this, evaluating search result diversity is
beginning to receive attention. We propose simple evaluation
metrics for diversified Web search results. Our presumptions
are that one or more interpretations (or intents)
are possible for each given query, and that graded relevance
assessments are available for intent-document pairs (as opposed
to query-document pairs). Our goals are (a) to retrieve
documents that cover as many intents as possible;
and (b) to rank documents that are highly relevant to more
popular intents higher than those that are marginally relevant
to less popular intents. Unlike the Intent-Aware (IA)
metrics proposed by Agrawal et al., our metrics successfully
avoid ignoring minor intents. Unlike α-nDCG proposed by
Clarke et al., our metrics can accommodate (i) which intents
are more likely than others for a given query; and (ii) graded
relevance within each intent. Furthermore, unlike these existing
metrics, our metrics do not require approximation,
and they range between 0 and 1. Experiments with the
binary-relevance Diversity Task data from the TREC 2009
Web Track suggest that our metrics correlate well with existing
metrics but can be more intuitive. Hence, we argue
that our metrics are suitable for diversity evaluation given
either the intent likelihood information or per-intent graded
relevance, or preferably both.
In The Third International Workshop on Evaluating Information Access (EVIA)
Publisher National Institute of Informatics
Copyright held by National Institute of Informatics
Type Inproceedings