Larry Heck and Nikki Mirghafori
This paper presents a new approach to on-line unsupervised adaptation in speaker verification. The approach extends previous work by (1) improving performance on the enrollment handset-type when adapting on a different handset-type (e.g., improving performance on cellular when adapting on a landline office phone), (2) accomplishing this cross-channel improvement without increasing the size of the speaker model after adaptation, (3) employing a count-based, parameter-dependent smoothing algorithm that emphasizes the use of mean parameters in the speaker models until sufficient adaptation data are present to accurately estimate variances, and (4) developing a new confidence-based adaptation update weight that minimizes the corrupting effects of impostor attacks on the speaker models. Experiments were conducted on a gender-balanced database of Japanese digits with 5222 speaker models across mixed channel conditions (landline and cellular). After adaptation on 8 separate phone calls, with a single 8-digit utterance per call and a 12.5% impostor attack rate, the new unsupervised adaptation approach reduced the EER by 61% (relative). This compares favorably to the (optimal) 84% relative reduction in EER resulting from supervised adaptation.
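The abstract does not give the update equations, so the following is only a minimal sketch of how count-based, parameter-dependent smoothing and a confidence-based update weight could interact. It assumes a single one-dimensional Gaussian, a MAP-style interpolation weight of the form n / (n + tau), and a verification confidence in [0, 1]. The names `AdaptState`, `accumulate`, `adapted_params`, `tau_mean`, and `tau_var` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdaptState:
    """Enrollment (prior) parameters plus accumulated adaptation statistics.

    Hypothetical structure for illustration; not the paper's speaker model.
    """
    prior_mean: float
    prior_var: float
    n: float = 0.0    # confidence-weighted frame count
    sx: float = 0.0   # confidence-weighted sum of x
    sxx: float = 0.0  # confidence-weighted sum of x^2


def accumulate(state: AdaptState, frames: List[float], confidence: float) -> None:
    """Add one utterance, scaled by the verifier's confidence in [0, 1].

    A low confidence (likely impostor) contributes little to the statistics.
    """
    w = max(0.0, min(1.0, confidence))
    state.n += w * len(frames)
    state.sx += w * sum(frames)
    state.sxx += w * sum(x * x for x in frames)


def adapted_params(state: AdaptState,
                   tau_mean: float = 8.0,
                   tau_var: float = 128.0) -> Tuple[float, float]:
    """Interpolate prior and data estimates with count-based weights.

    tau_var > tau_mean (assumed values), so the mean moves toward the data
    early while the variance stays near the prior until more data arrives.
    """
    if state.n == 0.0:
        return state.prior_mean, state.prior_var
    data_mean = state.sx / state.n
    data_var = max(state.sxx / state.n - data_mean ** 2, 1e-6)  # variance floor
    a_mean = state.n / (state.n + tau_mean)
    a_var = state.n / (state.n + tau_var)
    mean = a_mean * data_mean + (1.0 - a_mean) * state.prior_mean
    var = a_var * data_var + (1.0 - a_var) * state.prior_var
    return mean, var
```

In this sketch the number of model parameters is unchanged by adaptation: only the existing mean and variance are re-estimated, and an utterance with confidence 0 leaves them untouched.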
Published in: Proceedings of the International Conference on Spoken Language Processing (ICSLP)