Novel Framework of Text-independent Speaker Verification based on Utterance Transform and Iterative Cohort Modeling

Ming Liu; Huazhong Ning; Thomas S. Huang; Zhengyou Zhang

Proceedings of the Ninth International Conference on Spoken Language Processing (Interspeech 2006 - ICSLP), Pittsburgh, Pennsylvania | September 2006

A novel framework for text-independent speaker verification is proposed. The framework is based on a new interpretation of the Universal Background Model (UBM). In our framework, the UBM defines a transform that maps a variable-length observation sequence into a fixed-dimensional supervector. Each speech utterance is thus mapped to a point in this supervector space. The similarity measure in this vector space is progressively refined via an iterative cohort modeling scheme. Experiments on the NIST 2002 corpus show the effectiveness of the new framework. Overall, the EER drops from 9.21% for the baseline system (with T-Norm) to 8.07% for the final improved system (without T-Norm). The new framework also effectively reduces the data dependence of the final output score, as the second set of experiments clearly indicates. After T-Norm, the EER of the final system increases marginally, by 1.73% relative, whereas the EER of the baseline system drops by 16.12% relative. Likewise, the relative improvement in DCF after T-Norm is marginal for the final improved system (2.47%), compared with 33.68% for the baseline system. This clearly shows that iterative cohort modeling effectively reduces the data dependence of the final scores, so that T-Norm does not further improve system performance. Moreover, the performance of the new framework clearly improves as the number of iterations grows, which suggests that the framework progressively refines the similarity measure on the supervector space through iterative cohort modeling.

Index Terms: speaker verification, utterance transform, iterative cohort modeling
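
To make the idea of an utterance transform concrete, the following is a minimal sketch of how a UBM could map utterances of different lengths to points in one fixed-dimensional supervector space. The abstract does not specify the exact transform, so this sketch assumes a common construction: a diagonal-covariance Gaussian mixture UBM whose means are MAP-adapted to the utterance and then stacked into one vector. The function name `utterance_to_supervector`, the `relevance` factor, and the toy UBM parameters are all hypothetical and are not taken from the paper.

```python
import numpy as np


def utterance_to_supervector(frames, weights, means, variances, relevance=16.0):
    """Map a (T, D) array of feature frames to a fixed (K*D,) supervector.

    Assumption: the transform is MAP adaptation of the UBM means, a common
    supervector construction; the paper's own transform may differ.
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, K).
    diff = frames[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )

    # Posterior probability of each component given each frame: shape (T, K).
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order statistics of the utterance.
    n = post.sum(axis=0)   # (K,)
    f = post.T @ frames    # (K, D)

    # Interpolate between the utterance means and the UBM means.
    alpha = (n / (n + relevance))[:, None]
    utt_means = f / np.maximum(n, 1e-10)[:, None]
    adapted = alpha * utt_means + (1.0 - alpha) * means

    # Stack the K adapted mean vectors into one supervector.
    return adapted.reshape(-1)


# Toy UBM with K = 4 components in D = 3 dimensions (hypothetical values).
rng = np.random.default_rng(0)
K, D = 4, 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

# Two utterances of different lengths map to vectors of the same size.
short_utt = rng.normal(size=(50, D))
long_utt = rng.normal(size=(400, D))
sv_short = utterance_to_supervector(short_utt, weights, means, variances)
sv_long = utterance_to_supervector(long_utt, weights, means, variances)
print(sv_short.shape, sv_long.shape)
```

Because both utterances end up as vectors of length K*D regardless of their duration, any vector-space similarity measure can be applied to them, which is the property the framework relies on when it refines the similarity measure on the supervector space.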