Jinyu Li, Li Deng, Dong Yu, Yifan Gong, and Alex Acero
2008
In this paper, we present a new approach to HMM adaptation
that jointly compensates for additive and convolutive acoustic
distortion in environment-robust speech recognition. The hallmark
of our new approach is the use of a nonlinear, phase-sensitive
model of acoustic distortion that captures phase asynchrony
between clean speech and the mixing noise. In the first step of the
developed algorithm, both the static and dynamic portions of the
noise and channel parameters are estimated in the cepstral domain,
using the speech recognizer’s “feedback” information and the
vector-Taylor-series linearization technique on the nonlinear
phase-sensitive model. In the second step, the estimated noise and
channel parameters are used to adapt the static and
dynamic portions of the HMM means and variances, again using the
linearized phase-sensitive acoustic distortion model.
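
The adaptation step can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a single log-spectral frequency bin rather than the cepstral domain, treats the phase factor `alpha` as a fixed scalar in (-1, 1], and covers only the static mean and variance of one Gaussian. The function names `distort`, `noise_gain`, and `adapt_gaussian` are hypothetical. Under these assumptions the distorted speech is y = x + h + log(1 + exp(n - x - h) + 2 alpha exp((n - x - h) / 2)), and a first-order Taylor expansion around the clean-speech, channel, and noise means gives the adapted parameters.

```python
import math


def distort(x, h, n, alpha):
    """Phase-sensitive distortion for one log-spectral bin (assumed form).

    x, h, n are log-power values of clean speech, channel, and noise;
    alpha is the phase factor. alpha = 0 gives the phase-insensitive model.
    """
    u = n - x - h
    return x + h + math.log(1.0 + math.exp(u) + 2.0 * alpha * math.exp(u / 2.0))


def noise_gain(x, h, n, alpha):
    """Partial derivative of distort() with respect to the noise n."""
    u = n - x - h
    num = math.exp(u) + alpha * math.exp(u / 2.0)
    den = 1.0 + math.exp(u) + 2.0 * alpha * math.exp(u / 2.0)
    return num / den


def adapt_gaussian(mu_x, var_x, mu_h, mu_n, var_n, alpha):
    """First-order (vector-Taylor-series style) adaptation of one Gaussian.

    The channel is treated as deterministic, so only the speech and
    noise variances contribute to the adapted variance.
    """
    f = noise_gain(mu_x, mu_h, mu_n, alpha)  # dy/dn at the expansion point
    g = 1.0 - f                              # dy/dx (equal to dy/dh)
    mu_y = distort(mu_x, mu_h, mu_n, alpha)
    var_y = g * g * var_x + f * f * var_n
    return mu_y, var_y


if __name__ == "__main__":
    print(adapt_gaussian(mu_x=2.0, var_x=1.0, mu_h=0.1, mu_n=0.5, var_n=0.3, alpha=0.5))
```

Note that the argument of the logarithm vanishes when `alpha = -1` and speech and noise have equal power, so the sketch is only valid away from that corner.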
In the experimental evaluation using the standard Aurora 2
task, the proposed new algorithm achieves 93.3% accuracy using
the clean-trained complex HMM backend as the baseline system
for unsupervised HMM adaptation. This is the highest accuracy
reported in the literature on this task with a clean-trained
HMM. The experimental results show that the phase
term, which was missing in all previous HMM-adaptation work,
contributes significantly to the achieved high recognition accuracy.
In Proc. ICASSP
Type: Inproceedings