Speaker Adaptation with an Exponential Transform

Daniel Povey, Geoffrey Zweig, and Alex Acero

Abstract

In this technical report we describe a linear transform that we call an Exponential Transform (ET), which integrates aspects of Constrained MLLR, VTLN and STC/MLLT into a single transform with jointly trained components. Its main advantage is that a very small number of speaker-specific parameters is required, thus enabling effective adaptation with small amounts of speaker-specific data. The key part of the transform is controlled by a single speaker-specific parameter that is analogous to a VTLN warp factor. The transform has non-speaker-specific parameters that are learned from data, and we find that the axis along which male and female speakers differ is learned automatically. The exponential transform has no explicit notion of frequency warping, which makes it applicable in principle to non-standard features such as those derived from neural nets, or to settings where the key axes of variation may not be male-female. Based on our experiments with standard MFCC features, it appears to perform better than conventional VTLN.
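As a rough illustration of how a single scalar per speaker can control a full linear transform (this is a sketch only, and not necessarily the exact parameterization used in the report), one can write a per-speaker feature transform of the form

    W_s = exp(t_s * A) * B

where t_s is the single speaker-specific scalar playing a role analogous to a VTLN warp factor, A and B are non-speaker-specific matrices estimated from training data, and exp(.) denotes the matrix exponential. Under this kind of parameterization, estimating the speaker-specific part of the transform reduces to a one-dimensional search over t_s, which is why very little adaptation data is needed.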

Details

Publication type: TechReport
Number: MSR-TR-2011-101
Publisher: Microsoft Research