Simple Evaluation Metrics for Diversified Search Results

Tetsuya Sakai, Nick Craswell, Ruihua Song, Stephen Robertson, Zhicheng Dou, and Chin-Yew Lin


Traditional information retrieval research has mostly focussed on satisfying clearly specified information needs. However, in reality, queries are often ambiguous and/or underspecified. In light of this, evaluating search result diversity is beginning to receive attention. We propose simple evaluation metrics for diversified Web search results. Our presumptions are that one or more interpretations (or intents) are possible for each given query, and that graded relevance assessments are available for intent-document pairs (as opposed to query-document pairs). Our goals are (a) to retrieve documents that cover as many intents as possible; and (b) to rank documents that are highly relevant to more popular intents higher than those that are marginally relevant to less popular intents. Unlike the Intent-Aware (IA) metrics proposed by Agrawal et al., our metrics successfully avoid ignoring minor intents. Unlike α-nDCG proposed by Clarke et al., our metrics can accommodate (i) which intents are more likely than others for a given query; and (ii) graded relevance within each intent. Furthermore, unlike these existing metrics, our metrics do not require approximation, and they range between 0 and 1. Experiments with the binary-relevance Diversity Task data from the TREC 2009 Web Track suggest that our metrics correlate well with existing metrics but can be more intuitive. Hence, we argue that our metrics are suitable for diversity evaluation given either the intent likelihood information or per-intent graded relevance, or preferably both.


Publication type: Inproceedings
Published in: The Third International Workshop on Evaluating Information Access (EVIA)
Publisher: National Institute of Informatics