Separating Speaker and Environmental Variability Using Factored Transforms

Mike Seltzer and Alex Acero

Abstract

Two primary sources of variability that degrade accuracy in speech recognition systems are the speaker and the environment. While many algorithms for speaker or environment adaptation have been proposed to improve performance, far less attention has been paid to approaches that address both factors. In this paper, we present a method for compensating for speaker and environmental mismatch using a cascade of CMLLR transforms. The proposed approach enables speaker transforms estimated in one environment to be effectively applied to speech from the same user in a different environment. This approach can be further improved using a new training method called speaker and environment adaptive training. When applying speaker transforms to new environments, the proposed approach results in a 13% relative improvement over conventional CMLLR.
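As a rough illustration of what a cascade of CMLLR transforms means at the feature level, the sketch below applies two affine transforms in sequence and shows that the cascade is equivalent to one combined affine transform. This is not the paper's implementation: the names `A_env`, `b_env`, `A_spk`, `b_spk`, the 2-dimensional values, and the order of application (environment first, then speaker) are all assumptions made for the example, since the abstract does not specify them.

```python
# Illustrative sketch only. Transform names, values, and the order of the
# cascade are assumptions; the abstract does not specify them.

def affine(A, b, x):
    """Apply the affine map y = A x + b to a feature vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]

def compose(A2, b2, A1, b1):
    """Return (A, b) equivalent to applying (A1, b1) first, then (A2, b2)."""
    inner = len(A1)  # rows of A1 == columns of A2
    A = [[sum(A2[i][k] * A1[k][j] for k in range(inner))
          for j in range(len(A1[0]))]
         for i in range(len(A2))]
    b = [sum(A2[i][k] * b1[k] for k in range(inner)) + b2[i]
         for i in range(len(A2))]
    return A, b

# Hypothetical 2-D transforms standing in for an environment transform
# and a speaker transform.
A_env, b_env = [[1.0, 0.5], [0.0, 1.0]], [0.1, -0.2]
A_spk, b_spk = [[2.0, 0.0], [0.0, 0.5]], [0.0, 1.0]

x = [1.0, 2.0]  # one feature vector

# Cascade: environment transform first, then speaker transform (assumed order).
cascaded = affine(A_spk, b_spk, affine(A_env, b_env, x))

# The same result from a single combined transform.
A, b = compose(A_spk, b_spk, A_env, b_env)
single = affine(A, b, x)
```

Keeping the two factors as separate transforms is what would allow one of them, for example the speaker transform, to be reused while the other is re-estimated for a new environment; a single combined transform does not expose that split.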

Details

Publication type: Inproceedings
Published in: Interspeech
Publisher: International Speech Communication Association