Asking for a Second Opinion: Re-Querying of Noisy Multi-Class Labels
- Jack W. Stokes
- Ashish Kapoor
- Debajyoti Ray
IEEE International Conference on Acoustics, Speech, and Signal Processing
Published by IEEE - Institute of Electrical and Electronics Engineers
In this paper, we propose a new maximum-margin-based active learning algorithm for identifying incorrectly labeled training data. The algorithm combines a round-robin approach for investigating each class with a simple yet effective ranking metric called maximum negative margin (MNM). The samples selected by this ranking are given to an expert for re-evaluation to determine whether they are indeed mislabeled. We also propose applying five active learning metrics, including uncertainty sampling with margin sampling (USMS) and minimum margin, to the noisy label task; these metrics have previously been used in the standard active learning setting to identify new samples to label. USMS is very competitive with MNM. In addition, we consider other information-theoretic objective criteria for this new task, including uncertainty sampling with entropy, query-by-committee with voting entropy, and K-nearest neighbor with voting entropy, but these consistently perform worse than MNM and USMS. The MNM noisy label active learning algorithm can be useful in several scenarios, including data cleansing as a preprocessing step before training and identifying mislabeled examples in the test set.
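The abstract describes the MNM ranking and the round-robin query loop only at a high level. The Python sketch below shows one plausible reading of it and is not the authors' implementation. It assumes a multi-class classifier that outputs a per-class (one-vs-rest) margin for every sample, such as SVM decision values, and it assumes MNM ranks the samples carrying a given label by the classifier's margin for that label, most negative first. The function names `mnm_scores`, `round_robin_query`, and `usms_rank` are hypothetical. `usms_rank` uses the common top-two-posterior-gap definition of margin sampling, which may differ in detail from the paper's USMS.

```python
from typing import Dict, List, Sequence


def mnm_scores(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Assumed MNM score: the classifier's margin for each sample's *given* label.

    scores[i][c] is a (hypothetical) one-vs-rest margin of sample i for class c,
    e.g. SVM decision values. A large negative value means the classifier
    strongly disagrees with the label the sample currently carries.
    """
    return [scores[i][labels[i]] for i in range(len(labels))]


def round_robin_query(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    num_classes: int,
    budget: int,
) -> List[int]:
    """Pick up to `budget` sample indices for expert review, cycling over classes.

    Samples are grouped by their given label; within each group they are
    ranked by margin, most negative first. Each pass takes one sample per class.
    """
    margin = mnm_scores(scores, labels)
    queues: Dict[int, List[int]] = {
        c: sorted((i for i, y in enumerate(labels) if y == c), key=lambda i: margin[i])
        for c in range(num_classes)
    }
    next_pos = {c: 0 for c in range(num_classes)}
    picked: List[int] = []
    # Keep cycling while budget remains and some class still has unreviewed samples.
    while len(picked) < budget and any(next_pos[c] < len(queues[c]) for c in queues):
        for c in range(num_classes):
            if len(picked) >= budget:
                break
            if next_pos[c] < len(queues[c]):
                picked.append(queues[c][next_pos[c]])
                next_pos[c] += 1
    return picked


def usms_rank(probs: Sequence[Sequence[float]]) -> List[int]:
    """Margin-based uncertainty sampling (assumed definition): rank samples by
    the gap between their two largest class posteriors, smallest gap first.
    Requires at least two classes per sample."""
    def gap(p: Sequence[float]) -> float:
        top = sorted(p, reverse=True)
        return top[0] - top[1]
    return sorted(range(len(probs)), key=lambda i: gap(probs[i]))


# Toy example with made-up margins: 5 samples, 3 classes.
scores = [
    [1.2, -0.8, -1.0],
    [-0.9, 0.7, -0.5],
    [-1.1, 0.9, -0.7],
    [0.8, -0.6, -0.9],
    [-0.7, -0.8, 1.0],
]
labels = [0, 0, 1, 1, 2]
print(round_robin_query(scores, labels, num_classes=3, budget=3))  # [1, 3, 4]
```

In a full loop, the expert's verdicts on the picked samples would be applied and the classifier retrained before the next query round. That step is omitted here. One plausible benefit of cycling over classes is that a single class with many strongly negative margins cannot use up the whole review budget.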
© IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.