Fast Prediction of New Feature Utility

Hoyt Koepke and Mikhail Bilenko

Abstract

We study the new feature utility prediction problem: statistically testing whether adding a feature to the data representation can improve the accuracy of a current predictor. In many applications, identifying new features is the main pathway for improving performance. However, evaluating every potential feature by re-training the predictor can be costly. The paper describes an efficient, learner-independent technique that estimates new feature utility from the current predictor's outputs, without re-training. The method is obtained by deriving a connection between loss reduction potential and the new feature's correlation with the loss gradient of the current predictor. This leads to a simple yet powerful hypothesis testing procedure, for which we prove consistency. Our theoretical analysis is accompanied by empirical evaluation on standard benchmarks and a large-scale industrial dataset.
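
To make the central idea concrete, the sketch below tests whether a candidate feature correlates with the loss gradient of a fixed predictor. It is an illustration only, not the procedure from the paper: it assumes squared loss (so the gradient with respect to the prediction is proportional to the residual), substitutes a plain permutation test for the paper's hypothesis test, and does not account for redundancy with features the predictor already uses. The function names `pearson` and `feature_utility_pvalue`, the parameters `n_perm` and `seed`, and the synthetic data are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / math.sqrt(vx * vy)


def feature_utility_pvalue(y_true, y_pred, candidate, n_perm=1000, seed=0):
    """Permutation p-value for |corr(candidate, loss gradient)|.

    Assumption: squared loss, whose gradient with respect to the
    prediction is proportional to (y_pred - y_true).
    """
    rng = random.Random(seed)
    grad = [p - t for p, t in zip(y_pred, y_true)]
    observed = abs(pearson(candidate, grad))
    shuffled = list(candidate)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any link between the feature and the gradient.
        rng.shuffle(shuffled)
        if abs(pearson(shuffled, grad)) >= observed:
            hits += 1
    # Add-one smoothing keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Synthetic example: the target depends on x1 and x2, but the
# current predictor only uses x1, so x2 should look useful.
data_rng = random.Random(42)
n = 200
x1 = [data_rng.gauss(0, 1) for _ in range(n)]
x2 = [data_rng.gauss(0, 1) for _ in range(n)]
unrelated = [data_rng.gauss(0, 1) for _ in range(n)]
y = [a + 2 * b + data_rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
current_pred = list(x1)  # stand-in for the existing predictor's outputs

p_useful = feature_utility_pvalue(y, current_pred, x2)
p_unrelated = feature_utility_pvalue(y, current_pred, unrelated)
```

A small `p_useful` suggests that `x2` carries information the current predictor is missing, and the predictor itself is never re-trained. The p-value for `unrelated` has no such guarantee and will vary with the random draw.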

Details

Publication type: Inproceedings
Published in Proceedings of the 29th International Conference on Machine Learning (ICML-2012)