A Maximum a Posteriori Approach to Speaker Adaptation Using the Trended Hidden Markov Model

R. Chengalvarayan and Li Deng

Abstract

A formulation of the maximum a posteriori (MAP) approach to speaker adaptation is presented with use of the trended or nonstationary-state hidden Markov model (HMM), where the Gaussian means in each HMM state are characterized by time-varying polynomial trend functions of the state sojourn time. Assuming uncorrelatedness among the polynomial coefficients in the trend functions, we have obtained analytical results for the MAP estimates of the parameters including time-varying means and time-invariant precisions. We have implemented a speech recognizer based on these results in speaker adaptation experiments using the TI46 corpora. The experimental evaluation demonstrates that the trended HMM, with use of either the linear or the quadratic polynomial trend function, consistently outperforms the conventional, stationary-state HMM. The evaluation also shows that the unadapted, speaker-independent models are outperformed by the models adapted by the MAP procedure under supervision with as few as a single adaptation token. Further, adaptation of polynomial coefficients alone is shown to be better than adapting both polynomial coefficients and precision matrices when fewer than four adaptation tokens are used, while the reverse is found with a greater number of adaptation tokens.
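
To make the idea of a polynomial trend mean and its MAP adaptation concrete, the following is a minimal sketch and not the paper's derivation. It assumes a scalar observation per frame, a known observation precision, and independent Gaussian priors on the polynomial coefficients, which reduces the MAP estimate to standard Bayesian linear regression on powers of the sojourn time. The function names (`trend_mean`, `solve`, `map_trend_coeffs`) and all parameter names are hypothetical.

```python
def trend_mean(coeffs, t):
    """Polynomial trend mean at sojourn time t: sum_p coeffs[p] * t**p."""
    return sum(b * t ** p for p, b in enumerate(coeffs))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def map_trend_coeffs(obs, prior_mean, prior_prec, noise_prec):
    """MAP estimate of trend coefficients (illustrative assumption, see lead-in).

    obs[t] is the scalar observation at sojourn time t = 0, 1, ...
    Coefficient p has an independent Gaussian prior with mean prior_mean[p]
    and precision prior_prec[p]; noise_prec is the known observation precision.
    """
    order = len(prior_mean)
    powers = [[float(t) ** p for p in range(order)] for t in range(len(obs))]
    # Normal equations: (r X^T X + diag(prior_prec)) b = r X^T y + diag(prior_prec) m
    a = [[noise_prec * sum(row[i] * row[j] for row in powers)
          + (prior_prec[i] if i == j else 0.0)
          for j in range(order)] for i in range(order)]
    b = [noise_prec * sum(row[i] * y for row, y in zip(powers, obs))
         + prior_prec[i] * prior_mean[i] for i in range(order)]
    return solve(a, b)
```

With no adaptation data the estimate falls back to the prior (speaker-independent) coefficients, and with a single coefficient it reduces to the familiar MAP mean of a stationary-state Gaussian, a weighted average of the prior mean and the adaptation data.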

Details

Publication type: Article
Published in: IEEE Trans. on Speech and Audio Processing, Volume 9, Issue 5