Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, and Chin-Yew Lin
Traditional information retrieval research has mostly focussed on satisfying clearly specified information needs. In reality, however, queries are often ambiguous and/or underspecified. In light of this, evaluating search result diversity is beginning to receive attention. We propose simple evaluation metrics for diversified Web search results. We presume that one or more interpretations (or intents) are possible for each given query, and that graded relevance assessments are available for intent-document pairs (as opposed to query-document pairs). Our goals are (a) to retrieve documents that cover as many intents as possible; and (b) to rank documents that are highly relevant to more popular intents higher than those that are marginally relevant to less popular intents. Unlike the Intent-Aware (IA) metrics proposed by Agrawal et al., our metrics avoid ignoring minor intents. Unlike α-nDCG proposed by Clarke et al., our metrics can accommodate (i) the fact that some intents are more likely than others for a given query; and (ii) graded relevance within each intent. Furthermore, unlike these existing metrics, our metrics do not require approximation, and they range between 0 and 1. Experiments with the binary-relevance Diversity Task data from the TREC 2009 Web Track suggest that our metrics correlate well with existing metrics but can be more intuitive. Hence, we argue that our metrics are suitable for diversity evaluation given either the intent likelihood information or per-intent graded relevance, or preferably both.
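The abstract states the two goals but gives no formulas. The sketch below is one plausible reading, offered only as an illustration: it assumes an intent-recall score for goal (a) and an intent-probability-weighted "global gain" fed into an nDCG-style score for goal (b), combined linearly. The function names (`intent_recall`, `global_gain`, `d_ndcg`, `d_sharp_ndcg`), the input dictionaries, the log2(r + 1) rank discount and the default `gamma=0.5` are all assumptions of this sketch, not definitions taken from the abstract.

```python
import math
from typing import Dict, List

# Hypothetical input shapes (assumed for this sketch):
#   prob[intent]     -> Pr(intent | query); probabilities should sum to 1
#   rel[intent][doc] -> graded relevance level of doc for that intent (0 = non-relevant)
Prob = Dict[str, float]
Rel = Dict[str, Dict[str, int]]


def intent_recall(ranking: List[str], rel: Rel, prob: Prob) -> float:
    """Goal (a): fraction of intents covered by at least one relevant document."""
    covered = {i for i in prob for d in ranking if rel.get(i, {}).get(d, 0) > 0}
    return len(covered) / len(prob)


def global_gain(doc: str, rel: Rel, prob: Prob) -> float:
    """Goal (b): per-intent graded gains weighted by intent probability."""
    return sum(p * rel.get(i, {}).get(doc, 0) for i, p in prob.items())


def dcg(gains: List[float]) -> float:
    # The log2(r + 1) rank discount is an assumption of this sketch.
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains, start=1))


def d_ndcg(ranking: List[str], rel: Rel, prob: Prob, cutoff: int) -> float:
    """nDCG computed over global gains, normalised by an ideal ranking."""
    run = [global_gain(d, rel, prob) for d in ranking[:cutoff]]
    judged = {d for docs in rel.values() for d in docs}
    # The ideal list sorts every judged document by global gain, so no ranking
    # of distinct documents can exceed it and the score stays within [0, 1].
    ideal = sorted((global_gain(d, rel, prob) for d in judged), reverse=True)[:cutoff]
    best = dcg(ideal)
    return dcg(run) / best if best > 0 else 0.0


def d_sharp_ndcg(ranking: List[str], rel: Rel, prob: Prob,
                 cutoff: int, gamma: float = 0.5) -> float:
    """Linear mix of intent coverage and intent-weighted graded ranking quality."""
    top = ranking[:cutoff]
    return gamma * intent_recall(top, rel, prob) + (1 - gamma) * d_ndcg(top, rel, prob, cutoff)


if __name__ == "__main__":
    prob = {"i1": 0.7, "i2": 0.3}                       # i1 is the more popular intent
    rel = {"i1": {"d1": 2, "d2": 1}, "i2": {"d3": 1}}   # graded, per-intent judgments
    print(intent_recall(["d1", "d2"], rel, prob))       # 0.5: the minor intent i2 is missed
    print(d_ndcg(["d1", "d3", "d2"], rel, prob, 3))     # below 1: d3 outranks the higher-gain d2
    print(d_sharp_ndcg(["d1", "d3", "d2"], rel, prob, 3))
```

In this reading, the coverage term keeps a run that ignores the minor intent from scoring as well as one that covers it, while the global gain lets both intent likelihoods and per-intent relevance grades shape the ranking score. Both terms lie in [0, 1], so their mix does too.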
Published in: The Third International Workshop on Evaluating Information Access (EVIA)
Publisher: National Institute of Informatics
Copyright held by National Institute of Informatics