Sabato Marco Siniscalchi, Jinyu Li, and Chin-Hui Lee
Model adaptation techniques are an efficient way to
reduce the mismatch that typically occurs between the training
and test conditions of any automatic speech recognition (ASR) system.
This work addresses the problem of increased degradation
in performance when moving from speaker-dependent (SD) to
speaker-independent (SI) conditions for connectionist (or hybrid)
hidden Markov model/artificial neural network (HMM/ANN)
systems in the context of large vocabulary continuous speech
recognition (LVCSR). Adapting hybrid HMM/ANN systems on
a small amount of adaptation data has proven to be a
difficult task, and this has been a limiting factor in the widespread
deployment of hybrid techniques in operational ASR systems.
Addressing the crucial issue of speaker adaptation (SA) for
hybrid HMM/ANN systems can thereby have a great impact
on the connectionist paradigm, which will play a major role
in the design of next-generation LVCSR considering the great
success reported by deep neural networks – ANNs with many
hidden layers that adopt the pre-training technique – on many
speech tasks. Current adaptation techniques for ANNs that are based on
injecting an adaptable linear transformation network connected
to either the input or the output layer are not effective, especially
with a small amount of adaptation data, e.g., a single adaptation
utterance. In this paper, a novel solution is proposed to overcome
those limits and make adaptation robust to scarce adaptation resources.
The key idea is to adapt the hidden activation functions rather
than the network weights. The adoption of Hermitian activation
functions makes this possible. Experimental results on an LVCSR
task demonstrate the effectiveness of the proposed approach.
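
The key idea stated above, adapting hidden activation functions while leaving the network weights fixed, can be illustrated with a minimal sketch. The abstract does not give the functional form, so the following are assumptions made only for illustration: each hidden unit's activation is taken to be a weighted sum of orthonormal Hermite functions, and the names `hermite_functions`, `adaptable_hidden_layer`, `W`, `b`, and `C` are hypothetical, not the authors' implementation.

```python
from math import factorial, pi, sqrt

import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials H_n


def hermite_functions(z, degree):
    """Orthonormal Hermite functions h_0..h_degree at z; shape (degree + 1, *z.shape)."""
    z = np.asarray(z, dtype=float)
    rows = []
    for n in range(degree + 1):
        select = np.zeros(n + 1)
        select[n] = 1.0  # coefficient vector that picks out H_n alone
        norm = 1.0 / sqrt((2.0 ** n) * factorial(n) * sqrt(pi))
        rows.append(norm * hermval(z, select) * np.exp(-z ** 2 / 2.0))
    return np.stack(rows)


def adaptable_hidden_layer(x, W, b, C):
    """Hidden outputs for a batch x of shape (N, D).

    W (D, H) and b (H,) are the speaker-independent weights and stay fixed.
    C (H, K + 1) holds per-unit activation coefficients; in this sketch they
    are the only parameters a speaker-adaptation step would update (assumption).
    """
    z = x @ W + b                                 # pre-activations, shape (N, H)
    basis = hermite_functions(z, C.shape[1] - 1)  # shape (K + 1, N, H)
    return np.einsum("jk,knj->nj", C, basis)      # shape (N, H)
```

Under these assumptions, adaptation would update only the H x (K + 1) entries of `C` rather than the D x H entries of `W`, which is one way to see why such a scheme could cope with very little adaptation data.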
Published in: IEEE Transactions on Audio, Speech, and Language Processing