R. Chengalvarayan and Li Deng
July 2001
A formulation of the maximum a posteriori (MAP)
approach to speaker adaptation is presented for the
trended, or nonstationary-state, hidden Markov model (HMM),
where the Gaussian means in each HMM state are characterized
by time-varying polynomial trend functions of the state sojourn
time. Assuming that the polynomial coefficients in the trend
functions are uncorrelated, we obtain analytical results
for the MAP estimates of the parameters, including the time-varying
means and time-invariant precisions. We implemented a
speech recognizer based on these results and evaluated it in speaker
adaptation experiments using the TI46 corpora. The experimental evaluation
demonstrates that the trended HMM, using either a linear
or a quadratic polynomial trend function, consistently outperforms
the conventional, stationary-state HMM. The evaluation
also shows that models adapted by the supervised MAP procedure
outperform the unadapted, speaker-independent models
with as few as a single adaptation token.
Further, adaptation of polynomial coefficients alone is shown to be
better than adapting both polynomial coefficients and precision
matrices when fewer than four adaptation tokens are used, while
the reverse is found with a greater number of adaptation tokens.
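
The two ideas in the abstract, a state mean that follows a polynomial trend in the sojourn time and a MAP estimate that blends a prior with a few adaptation tokens, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `trended_mean`, `map_scalar_mean`, `prior_mean`, and `prior_weight` do not come from the paper, and the MAP formula shown is the textbook conjugate-prior result for a single Gaussian mean with known precision, not the paper's estimator for the polynomial coefficients.

```python
def trended_mean(coeffs, t):
    """Evaluate a polynomial trend mean at sojourn time t.

    coeffs[p] is the (hypothetical) coefficient vector of order p, one
    value per feature dimension, so the mean is sum_p coeffs[p] * t**p.
    One coefficient vector gives a stationary state; two give a linear
    trend; three give a quadratic trend.
    """
    dim = len(coeffs[0])
    return [
        sum(c[d] * (t ** p) for p, c in enumerate(coeffs))
        for d in range(dim)
    ]


def map_scalar_mean(samples, prior_mean, prior_weight):
    """Textbook MAP estimate of a scalar Gaussian mean, known precision.

    Assumes a Gaussian prior centred on prior_mean whose strength is
    expressed as prior_weight pseudo-observations. This is a generic
    illustration, not the estimator derived in the paper.
    """
    n = len(samples)
    return (prior_weight * prior_mean + sum(samples)) / (prior_weight + n)


if __name__ == "__main__":
    # Linear trend in two feature dimensions, evaluated at sojourn time 2.
    print(trended_mean([[1.0, 2.0], [0.5, -1.0]], 2))
    # One adaptation sample pulls the speaker-independent prior toward it.
    print(map_scalar_mean([4.0], prior_mean=2.0, prior_weight=1.0))
```

With no adaptation samples, `map_scalar_mean` returns the prior mean unchanged. As samples accumulate, the estimate moves toward the sample average. This is the general behaviour behind adapting speaker-independent models with only a few tokens.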
In IEEE Trans. on Speech and Audio Processing. Volume: 9 Issue: 5
Type: Article