Matthieu Hebert and Larry Heck
Phonetic Class-based Speaker Verification (PCBV) is a natural refinement of the traditional single Gaussian Mixture Model (Single GMM) scheme. The aim is to accurately model the voice characteristics of a user on a per-phonetic class basis. The paper describes briefly the implementation of a representation of the voice characteristics in a hierarchy of phonetic classes. We present a framework to easily study the effect of the modeling on the PCBV. A thorough study of the effect of the modeling complexity, the amount of enrollment data and noise conditions is presented. It is shown that Phoneme-based Verification (PBV), a special case of PCBV, is the optimal modeling scheme and consistently outperforms the state-of-the-art Single GMM modeling even in noisy environments. PBV achieves 9% to 14% relative error rate reduction while cutting the speaker model size by 50% and CPU by 2/3.
|Published in||Eurospeech 2003 - Interspeech 2003|