A Study on Multilingual Acoustic Modeling For Large Vocabulary ASR

Hui Lin, Li Deng, Dong Yu, Yifan Gong, Alex Acero, and Chin-Hui Lee

Abstract

We study key issues related to multilingual acoustic modeling for automatic speech recognition (ASR) through a series of large-scale ASR experiments. Our study explores shared structures embedded in a large collection of speech data spanning a number of spoken languages in order to establish a common set of universal phone models that can be used for large vocabulary ASR of all languages, whether seen or unseen during training. Language-universal and language-adaptive models are compared with language-specific models. The results show that in many cases it is possible to build general-purpose language-universal and language-adaptive acoustic models that outperform language-specific ones, provided that the set of shared units, the structure of shared states, and the shared acoustic-phonetic properties among different languages are properly utilized. Specifically, our results demonstrate that when the context coverage is poor in language-specific training, we can use one tenth of the adaptation data to achieve equivalent performance in cross-lingual speech recognition.

Index Terms— Multilingualism, acoustic modeling, language adaptation, universal phone models.

Details

Publication type: Inproceedings
Published in: Proceedings of the ICASSP
Publisher: Institute of Electrical and Electronics Engineers, Inc.