Daniel Povey and Kaisheng Yao
Constrained Maximum Likelihood Linear Regression (CMLLR) is a widely used speaker adaptation technique in which an affine transform of the features is estimated for each speaker. However, when the amount of speech data available is very small (e.g. a few seconds), it can be difficult to get sufficiently accurate estimates of the transform parameters. In this paper we describe a method of estimating CMLLR robustly from less data. We do this by representing the CMLLR transform matrix as a weighted sum over basis matrices, where the basis is constructed in such a way that the most important variation is concentrated in the leading coefficients. Depending on the amount of data available, we can choose to estimate a smaller or larger number of coefficients.
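To make the weighted-sum representation concrete, the following is a minimal sketch in plain Python, not the estimation procedure itself. It assumes the transform is written as the identity transform [I ; 0] plus a weighted sum of basis matrices, and that the number of coefficients grows linearly with the number of frames. The identity offset, the linear rule, the `frames_per_coeff` value and all function names are illustrative assumptions rather than details stated above.

```python
# Illustrative sketch only: the identity offset, the linear rule for choosing
# the number of coefficients, and all names here are assumptions.

def identity_affine(dim):
    """Return the default transform [I ; 0] as a dim x (dim + 1) nested list."""
    return [[1.0 if r == c else 0.0 for c in range(dim + 1)] for r in range(dim)]

def num_coefficients(num_frames, num_bases, frames_per_coeff=5.0):
    """Choose how many leading coefficients to use (assumed linear rule)."""
    return max(0, min(num_bases, int(num_frames / frames_per_coeff)))

def build_transform(bases, coeffs, dim):
    """Compute W = W0 + sum_n coeffs[n] * bases[n] over the coefficients given."""
    W = identity_affine(dim)
    for d_n, B_n in zip(coeffs, bases):
        for r in range(dim):
            for c in range(dim + 1):
                W[r][c] += d_n * B_n[r][c]
    return W

def apply_transform(W, x):
    """Apply the affine transform to feature vector x by appending a 1."""
    x_ext = list(x) + [1.0]
    return [sum(w * v for w, v in zip(row, x_ext)) for row in W]

# Toy usage: 2-dimensional features, three hypothetical basis matrices
# ordered from most to least important.
dim = 2
bases = [
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],  # shift of the first dimension
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # shift of the second dimension
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # scale of the first dimension
]
all_coeffs = [0.5, -1.0, 2.0]  # made-up values, not estimated from data

n = num_coefficients(num_frames=10, num_bases=len(bases))
W = build_transform(bases, all_coeffs[:n], dim)
y = apply_transform(W, [3.0, 4.0])
```

With no data at all, this sketch uses zero coefficients and falls back to the identity transform. With more frames, trailing coefficients are added until the full basis is in use.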