![]() |
|
|
|
Large Deviation Analysis of the Area under the ROC curveWe study generalization properties of the area under the ROC curve (AUC),
a quantity that has been advocated as an evaluation criterion for the
bipartite ranking problem. The AUC is a different term than the error rate
used for evaluation in classification problems; consequently, existing
generalization bounds for the classification error rate cannot be used to
draw conclusions about the AUC. In this paper, we define the expected
accuracy of a ranking function (analogous to the expected error rate of a
classification function), and derive distribution-free probabilistic bounds
on the deviation of the empirical AUC of a ranking function (observed on a
finite data sequence) from its expected accuracy. We derive both a large
deviation bound, which serves to bound the expected accuracy of a ranking
function in terms of its empirical AUC on a test sequence, and a uniform
convergence bound, which serves to bound the expected accuracy of a learned
ranking function in terms of its empirical AUC on a training sequence. Our
uniform convergence bound is expressed in terms of a new set of
combinatorial parameters that we term the bipartite rank-shatter
coefficients; these play the same role in our result as do the standard
VC-dimension related shatter coefficients (also known as the growth
function) in uniform convergence results for the classification error rate.
A comparison of our result with a recent uniform convergence result derived
by Freund et al. (2003) for a quantity closely related to the AUC shows that
the bound provided by our result can be considerably tighter. References
|
This site was last updated 07-07-2005