Separating Speaker and Environmental Variability Using Factored Transforms

Two primary sources of variability that degrade accuracy in speech recognition systems are the speaker and the environment. While many algorithms for speaker or environment adaptation have been proposed to improve performance, far less attention has been paid to approaches that address both factors. In this paper, we present a method for compensating for speaker and environmental mismatch using a cascade of CMLLR transforms. The proposed approach enables speaker transforms estimated in one environment to be effectively applied to speech from the same user in a different environment. The approach can be further improved using a new training method called speaker and environment adaptive training. When applying speaker transforms to new environments, the proposed approach results in a 13% relative improvement over conventional CMLLR.
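To illustrate what a cascade of CMLLR transforms means at the feature level, the following minimal sketch treats each transform as an affine map x -> A x + b and chains two of them. It is not the paper's implementation. The application order (environment transform first, then speaker transform), the helper names `apply_affine` and `compose`, and all numeric values are assumptions made for illustration only.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Affine = Tuple[Matrix, Vector]  # (A, b), representing x -> A x + b


def apply_affine(transform: Affine, x: Vector) -> Vector:
    """Apply one affine feature transform to a feature vector."""
    A, b = transform
    return [sum(a * v for a, v in zip(row, x)) + b_i for row, b_i in zip(A, b)]


def compose(outer: Affine, inner: Affine) -> Affine:
    """Return the single affine transform equal to outer(inner(x))."""
    A_o, b_o = outer
    A_i, b_i = inner
    rows, mid, cols = len(A_o), len(A_i), len(A_i[0])
    A = [[sum(A_o[r][k] * A_i[k][c] for k in range(mid)) for c in range(cols)]
         for r in range(rows)]
    b = [sum(A_o[r][k] * b_i[k] for k in range(mid)) + b_o[r] for r in range(rows)]
    return A, b


# Hypothetical 2-dimensional transforms (real features have far more dimensions).
speaker: Affine = ([[2.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
environment_b: Affine = ([[0.5, 0.0], [0.0, 2.0]], [1.0, -1.0])

x = [1.0, 2.0]

# Assumed order: normalize the environment first, then the speaker.
step_by_step = apply_affine(speaker, apply_affine(environment_b, x))

# The cascade collapses into one equivalent affine transform.
cascade = compose(speaker, environment_b)
one_shot = apply_affine(cascade, x)
```

Because the speaker transform is kept as a separate factor, it could in principle be paired with a different environment transform without re-estimating it, which is the kind of reuse across environments that the abstract describes.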


In  Interspeech

Publisher  International Speech Communication Association

Details

Type  Inproceedings