Asking for a Second Opinion: Re-Querying of Noisy Multi-Class Labels

IEEE International Conference on Acoustics, Speech, and Signal Processing

Published by IEEE (Institute of Electrical and Electronics Engineers)


In this paper, we propose a new maximum-margin-based active learning algorithm for identifying incorrectly labeled training data. The algorithm combines a round-robin approach, which investigates each class in turn, with a simple yet effective ranking metric called maximum negative margin (MNM). The selected samples are given to an expert for re-evaluation to determine whether they are indeed mislabeled. We also propose applying five active learning metrics to the noisy label task; these metrics, which include uncertainty sampling with margin sampling (USMS) and minimum margin, have previously been used in the standard active learning setting to choose new samples to label. USMS is very competitive with MNM. In addition, we consider other information-theoretic objective criteria for this new task, namely uncertainty sampling with entropy, query-by-committee with voting entropy, and K-nearest neighbor with voting entropy, but these consistently perform worse than MNM and USMS. The MNM noisy label active learning algorithm can be useful in several scenarios, including data cleansing as a preprocessing step before training and identifying mislabeled examples in the test set.
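The abstract does not spell out how MNM or the round-robin schedule are computed, so the following is only a minimal sketch under stated assumptions. It assumes each class c has a one-vs-rest scorer that yields a signed margin for every sample (for example, a linear SVM decision value), supplied as a hypothetical `margins[i][c]` table. MNM is interpreted here as: among the samples whose observed label is c, rank first the one to which the class-c scorer assigns the most negative margin, i.e. the sample the model most strongly disagrees with. The round-robin loop then takes one candidate per class in turn until a query budget is spent. For comparison, `usms_margin` shows the standard margin-sampling score from conventional active learning (gap between the two largest class posteriors); how the paper adapts USMS to the noisy-label task is not stated in the abstract. All function and parameter names are illustrative, not the authors' code.

```python
from typing import Dict, List, Sequence


def mnm_round_robin(
    margins: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    budget: int,
) -> List[int]:
    """Return sample indices to show the expert, in query order (sketch).

    Assumption: margins[i][c] is a signed one-vs-rest margin of sample i for
    class c; labels[i] is the observed (possibly noisy) label of sample i.
    """
    # Per class, sort the samples carrying that label by the margin the
    # class-c scorer gives them: most negative first (strongest disagreement).
    queues: Dict[int, List[int]] = {
        c: sorted(
            (i for i, y in enumerate(labels) if y == c),
            key=lambda i: margins[i][c],
        )
        for c in range(num_classes)
    }
    next_pos = {c: 0 for c in range(num_classes)}
    order: List[int] = []
    while len(order) < budget:
        progressed = False
        # Visit the classes in turn, taking one candidate from each.
        for c in range(num_classes):
            if len(order) >= budget:
                break
            if next_pos[c] < len(queues[c]):
                order.append(queues[c][next_pos[c]])
                next_pos[c] += 1
                progressed = True
        if not progressed:  # every class queue is exhausted
            break
    return order


def usms_margin(probs: Sequence[float]) -> float:
    """Standard margin-sampling score: gap between the two largest posteriors."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def usms_rank(all_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Rank samples so the smallest top-two gap (most uncertain) comes first."""
    order = sorted(range(len(all_probs)), key=lambda i: usms_margin(all_probs[i]))
    return order[: max(budget, 0)]


if __name__ == "__main__":
    # Toy two-class example with made-up margins.
    margins = [[-2.0, 1.0], [0.5, -0.5], [-0.1, 0.2],
               [1.5, -1.0], [-0.3, 0.4], [0.9, -3.0]]
    labels = [0, 0, 0, 1, 1, 1]
    print(mnm_round_robin(margins, labels, num_classes=2, budget=4))  # [0, 5, 2, 3]
```

In the toy run, sample 0 (labeled 0, class-0 margin -2.0) and sample 5 (labeled 1, class-1 margin -3.0) are queried first, one per class, which is the behavior the round-robin schedule is meant to guarantee: no single class can monopolize the expert's budget.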