26 June 2012
Ensembles of classification and regression trees remain popular machine
learning methods because they define flexible non-parametric models that
predict well and are computationally efficient both during training and testing.
During induction of decision trees one aims to find predicates that are
maximally informative about the prediction target.
To select good predicates most approaches estimate an information-theoretic
scoring function, the information gain, for both classification and regression problems.
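As a concrete sketch of this scoring step, the information gain of a candidate split predicate can be computed with the standard plug-in entropy estimate (this is the common baseline the abstract refers to, not the paper's improved estimator; the function names here are illustrative):

```python
import numpy as np

def entropy(labels):
    # Plug-in (maximum-likelihood) entropy of a label array, in nats.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def information_gain(y, mask):
    # Entropy reduction from splitting labels y by a boolean predicate mask:
    # H(y) minus the size-weighted entropies of the two child nodes.
    n = len(y)
    left, right = y[mask], y[~mask]
    return (entropy(y)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# A predicate that separates the two classes perfectly recovers the
# full entropy of y as gain, since both children are pure.
y = np.array([0, 0, 1, 1])
mask = np.array([True, True, False, False])
gain = information_gain(y, mask)
```

Tree induction evaluates many candidate predicates this way and greedily keeps the one with the largest estimated gain.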
We point out that the common estimation procedures are biased and show that by
replacing them with improved estimators of the discrete and the differential
entropy we can obtain better decision trees.
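To illustrate the bias issue for the discrete case: the plug-in entropy estimate is negatively biased on small samples, and a simple first-order correction such as Miller-Madow adds (k-1)/(2n) back. This correction is shown only as an illustrative example of an improved estimator; the paper's own choice of estimators may differ:

```python
import numpy as np

def plugin_entropy(counts):
    # Naive plug-in entropy estimate from symbol counts, in nats.
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def miller_madow_entropy(counts):
    # Plug-in estimate plus the Miller-Madow first-order bias
    # correction (k - 1) / (2n), where k is the number of observed
    # symbols and n the sample size.
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    k = np.count_nonzero(counts)
    return plugin_entropy(counts) + (k - 1) / (2.0 * n)

# Small-sample demo: 30 draws from a uniform distribution over 8 symbols.
# In expectation the plug-in estimate falls short of the true entropy
# log(8); the correction pushes the estimate back toward it.
rng = np.random.default_rng(0)
samples = rng.integers(0, 8, size=30)
counts = np.bincount(samples, minlength=8)
h_plugin = plugin_entropy(counts)
h_corrected = miller_madow_entropy(counts)
```

Because information gain is a difference of entropies over parent and child nodes of different sizes, such estimator bias does not cancel and can distort which split is selected, which is why a less biased estimator can yield better trees.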
In effect our modifications yield improved predictive performance and are
simple to implement in any decision tree code.
Published in: ICML 2012