Low-rank Plus Diagonal Adaptation for Deep Neural Networks

ICASSP

In this paper, we propose a scalable adaptation technique that adapts the deep neural network (DNN) model through a low-rank plus diagonal (LRPD) decomposition. An adaptation method should accommodate the available development data with a variable number of adaptation parameters, so that the resulting models neither over-fit nor under-fit as the amount of development data varies across speakers. The technique developed in this paper is inspired by the observation that adaptation matrices are very close to an identity matrix or diagonally dominant. The LRPD restructures the adaptation matrix as a superposition of a diagonal matrix and a low-rank matrix. By varying the rank of the low-rank matrix, the LRPD contains the full and the diagonal adaptation matrices as special cases. Experimental results demonstrate that LRPD adaptation of the full-size DNN obtains improved accuracy over standard linear adaptation. LRPD bottleneck adaptation reduces the speaker-specific footprint by 82% relative to an already very compact SVD bottleneck adaptation, at the expense of a 1% relative WER increase.
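
To make the decomposition concrete: the speaker-dependent transform is written as a diagonal matrix plus the product of two thin matrices, so its parameter count scales with the chosen rank rather than with the square of the layer width. The NumPy sketch below is illustrative only; the layer width, rank, initialization, and variable names are assumptions, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of an LRPD-adapted linear transform, assuming an adaptation
# matrix of the form W_a = D + A @ B applied to the hidden activations of a
# trained layer. All names and sizes here are hypothetical.

n = 512   # hidden-layer width (assumed)
k = 16    # rank of the low-rank component:
          #   k = 0 reduces to a purely diagonal adaptation matrix,
          #   k = n is as expressive as a full adaptation matrix.

rng = np.random.default_rng(0)

# Speaker-independent hidden activations for one frame.
h = rng.standard_normal(n)

# LRPD adaptation parameters. Initializing D to the identity and A, B near
# zero means the adapted model starts out equal to the speaker-independent one.
D = np.ones(n)                          # diagonal part, stored as a vector
A = 0.01 * rng.standard_normal((n, k))  # tall factor of the low-rank part
B = 0.01 * rng.standard_normal((k, n))  # wide factor of the low-rank part

# Adapted activations (D + A B) h, computed without forming the n x n matrix.
h_adapted = D * h + A @ (B @ h)

# Speaker-specific parameter count: n + 2*n*k for LRPD versus n*n for a full
# adaptation matrix.
print(n + 2 * n * k, "LRPD parameters vs", n * n, "for a full matrix")
```

With these assumed sizes the LRPD transform stores roughly 17k speaker-specific values instead of 262k for a full matrix, which is the footprint saving the abstract refers to; the rank k is the knob that trades adaptation capacity against per-speaker storage.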