Hui Lin, Li Deng, Jasha Droppo, Dong Yu, and Alex Acero
One key issue in developing learning methods for multilingual acoustic modeling in large vocabulary automatic speech recognition (ASR) applications is to maximize the benefit of boosting the acoustic training data from multiple source languages while minimizing the negative effects of data impurity arising from language “mismatch”. In this paper, we introduce two learning methods, semi-automatic unit selection and the global phonetic decision tree, to address this issue via effective utilization of acoustic data from multiple languages. Semi-automatic unit selection aims to combine the merits of both data-driven and knowledge-driven approaches to identifying the basic units in multilingual acoustic modeling. The global decision-tree method allows clustering of cross-center phones and cross-center states in the HMMs, offering the potential to discover a better sharing structure beneath the mixed acoustic dynamics and context mismatch caused by the use of multiple languages’ acoustic data. Our preliminary experimental results show that both of these learning methods improve the performance of multilingual speech recognition.
In NIPS Workshop, Whistler, BC, Canada
© 2008 Microsoft Corporation. All rights reserved.