Cost-Sensitive Decision Tree Learning for Forensic Classification

Jason V. Davis, Jungwoo Ha, Christopher J. Rossbach, Hany E. Ramadan, and Emmett Witchel


In some learning settings, the cost of acquiring features for classification

must be paid up front, before the classifier is evaluated. In this paper,

we introduce the forensic classification problem and present a new algorithm for

building decision trees that maximizes classification accuracy while minimizing

total feature costs. By expressing the ID3 decision tree algorithm in an information

theoretic context, we derive our algorithm from a well-formulated problem

objective. We evaluate our algorithm across several datasets and show that, for

a given level of accuracy, our algorithm builds cheaper trees than existing methods.

Finally, we apply our algorithm to a real-world system, CLARIFY. CLARIFY

classifies unknown or unexpected program errors by collecting statistics during

program runtime which are then used for decision tree classification after an error

has occurred. We demonstrate that if the classifier used by the CLARIFY system

is trained with our algorithm, the computational overhead (equivalently, total feature

costs) can decrease by many orders of magnitude with only a slight (< 1%)

reduction in classification accuracy.


Publication typeInproceedings
Published inECML
> Publications > Cost-Sensitive Decision Tree Learning for Forensic Classification