Wei-Yun Ma, Yun-Cheng Ju, Xiaodong He, and Li Deng
Language model (LM) adaptation is an active area of natural language processing and has been applied successfully to speech recognition and many other applications. To provide fine-grained probability adaptation for each n-gram, we propose three adaptation methods based on shared linear transformations: n-gram-based linear regression, interpolation, and direct estimation. To address data sparseness, n-grams are clustered, and the n-grams in the same cluster share the same adaptation parameters. We evaluate the methods on a domain adaptation task with limited adaptation data. The results show that the best LM produced by our adaptation method halves the perplexity of the baseline LM without adaptation, and achieves a 15% perplexity reduction compared with earlier state-of-the-art LM adaptation methods. In speech recognition experiments, the proposed LM adaptation method reduces the WER by 20.8% compared with the baseline LM without adaptation.
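The abstract does not give the adaptation equations, so the following is only a minimal sketch of the general idea of cluster-shared parameters, using plain linear interpolation as a stand-in. The function names, the `(history, word)` key layout, the fallback for unseen n-grams, and the renormalization step are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict

def adapt_probs(background, in_domain, cluster_of, weights):
    """Interpolate each n-gram probability with a weight shared by its cluster.

    background / in_domain: dict mapping (history, word) -> probability
    cluster_of: dict mapping (history, word) -> cluster id (hypothetical clustering)
    weights: dict mapping cluster id -> interpolation weight in [0, 1]
    """
    adapted = {}
    for ngram, p_bg in background.items():
        lam = weights[cluster_of[ngram]]
        # Assumption: n-grams unseen in the adaptation data keep the background value.
        p_in = in_domain.get(ngram, p_bg)
        adapted[ngram] = lam * p_in + (1.0 - lam) * p_bg
    return adapted

def renormalize(probs):
    """Rescale so that the probabilities sharing a history sum to 1."""
    totals = defaultdict(float)
    for (history, _word), p in probs.items():
        totals[history] += p
    return {(h, w): p / totals[h] for (h, w), p in probs.items()}

# Toy bigram example: two n-grams with the same history, in different clusters.
background = {(("the",), "cat"): 0.6, (("the",), "dog"): 0.4}
in_domain = {(("the",), "cat"): 0.2, (("the",), "dog"): 0.8}
cluster_of = {(("the",), "cat"): 0, (("the",), "dog"): 1}
weights = {0: 0.5, 1: 1.0}

adapted = renormalize(adapt_probs(background, in_domain, cluster_of, weights))
```

Because every n-gram in a cluster reuses one weight, the number of parameters to estimate grows with the number of clusters rather than with the number of n-grams, which is the motivation the abstract gives for clustering under limited adaptation data.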
© Microsoft Research