Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems

Sabato Marco Siniscalchi, Jinyu Li, and Chin-Hui Lee

Abstract

Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test conditions of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN systems can therefore have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR systems, considering the great success reported on many speech tasks by deep neural networks (ANNs with many hidden layers that adopt the pre-training technique). Current adaptation techniques for ANNs, which inject an adaptable linear transformation network connected to either the input or the output layer, are not effective, especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make speaker adaptation robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.
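The abstract does not give the form of the activation function, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes that a hidden unit's activation is a weighted sum of orthonormal Hermite functions and that speaker adaptation updates only those per-unit coefficients while the network weights stay fixed. The names `hermite_functions`, `hermite_activation`, and `adapt_coeffs`, the squared-error objective, and the toy targets are all hypothetical choices made for illustration.

```python
import math


def hermite_functions(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions,
    computed with the standard three-term recurrence."""
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for n in range(1, order):
        h.append(math.sqrt(2.0 / (n + 1)) * x * h[n]
                 - math.sqrt(n / (n + 1)) * h[n - 1])
    return h


def hermite_activation(x, coeffs):
    """Assumed activation: a weighted sum of Hermite functions.
    The coefficients are the only adaptable parameters in this sketch."""
    basis = hermite_functions(x, len(coeffs) - 1)
    return sum(c * b for c, b in zip(coeffs, basis))


def adapt_coeffs(coeffs, pre_activations, targets, lr=0.1, steps=200):
    """Gradient descent on mean squared error with respect to the
    activation coefficients only; no network weight is touched."""
    coeffs = list(coeffs)
    n_samples = len(pre_activations)
    for _ in range(steps):
        grad = [0.0] * len(coeffs)
        for x, t in zip(pre_activations, targets):
            basis = hermite_functions(x, len(coeffs) - 1)
            err = sum(c * b for c, b in zip(coeffs, basis)) - t
            for k, b in enumerate(basis):
                grad[k] += 2.0 * err * b / n_samples
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    return coeffs


# Toy usage: reshape one unit's activation toward hypothetical target outputs.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ts = [math.tanh(x) for x in xs]
adapted = adapt_coeffs([0.0, 1.0, 0.0, 0.0], xs, ts)
```

Under these assumptions each hidden unit carries only a handful of adaptable parameters (four in the toy usage), far fewer than a full weight matrix, which is one plausible reading of why adapting activations could cope with very little adaptation data.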

Details

Publication type: Article
Published in: IEEE Transactions on Audio, Speech, and Language Processing
Publisher: IEEE