Sabato Marco Siniscalchi, Jinyu Li, and Chin-Hui Lee
This work is concerned with speaker adaptation techniques for artificial neural network (ANN) implemented as feed-forward multi-layer perceptrons (MLPs) in the context of large vocabulary continuous speech recognition (LVCSR). Most successful speaker adaptation techniques for MLPs consist of augmenting the neural architecture with a linear transformation network connected to either the input or the output layer. The weights of this additional linear layer are learned during the adaptation phase while all of the other weights are kept frozen in order to avoid over-fitting. In doing so, the structure of the speakerdependent (SD) and speaker-independent (SI) architecture differs and the number of adaptation parameters depends upon the dimension of either the input or output layers. We propose an alternative neural architecture for speaker-adaptation to overcome the limits of current approaches. This neural architecture adopts hidden activation functions that can be learned directly from the adaptation data. This adaptive capability of the hidden activation function is achieved through the use of orthonormal Hermite polynomials. Experimental evidence gathered on the Wall Street Journal Nov92 task demonstrates the viability of the proposed technique.