Hello and Welcome!
This is the personal webpage of Zhi-Jie Yan (Chinese name: 鄢志杰). I'm a researcher in the speech group of Microsoft Research Asia (MSR Asia), which I joined in July 2008. Before that, I received my Ph.D. degree from the Department of EEIS, University of Science and Technology of China. As a graduate student, I worked with iFlytek Speech Lab from 2003 to 2008. During that period, I visited MSR Asia as a speech group intern from June 2005 to January 2006, and the School of ECE, Georgia Tech, as a visiting scholar in 2007. I received the Microsoft Fellowship award in 2006 and won the ICASSP student paper contest in 2007.
E-mail: zhijiey@microsoft.com | Speech Group of MSR Asia | MSR Asia
Research Topics
My research interests include speech recognition, synthesis, and processing. Currently I'm mainly working on automatic speech recognition at MSR Asia. My current research topics include acoustic modeling for speech recognition and speaker classification, as well as a large-scale machine learning platform for speech applications.
- Acoustic Modeling for Speech Recognition
We are doing research on both training criteria and optimization methods for acoustic modeling in speech recognition. Our research includes Irrelevant Variability Normalization (IVN) based training and an i-vector based approach to speech data clustering. Related papers can be found in ICASSP 2011/2012 and InterSpeech 2011.
We are also working on discriminative training using GMM-HMM and DNN-HMM. Details will be available soon after the work is published.
- Large-scale Machine Learning Platform Optimized for Speech
We have built a large-scale machine learning platform optimized for speech applications, especially acoustic model training. The platform is implemented on an HPC (High Performance Computing) cluster using MPI (Message Passing Interface). It handles the "big data" that is essential for building a state-of-the-art speech recognition service. Details of this project can be found in our IWSML 2012 paper entitled "Designing an MPI-Based Parallel and Distributed Machine Learning Platform on Large-Scale HPC Clusters."
- Rich Context Model-Based Speech Synthesis
We propose to use rich context models directly, both to model training speech in HMM-based TTS and to generate testing speech in synthesis. Compared with conventional decision-tree tied models, rich context models are crisper in nature and carry richer segmental and supra-segmental information. As a result, the over-smoothing problem of the conventional approach is significantly alleviated, which enables the synthesis of high quality speech.
Rich context models can also be used to build an HMM-guided unit selection TTS system. Rich-context Unit Selection (RUS) has been transferred to Microsoft products to build high quality speech synthesis engines. Related papers can be found in InterSpeech 2009, ICASSP 2010 and InterSpeech 2010.
Publications
- Zhi-Jie Yan, Teng Gao, and Qiang Huo, Designing an MPI-Based Parallel and Distributed Machine Learning Platform on Large-Scale HPC Clusters, in International Workshop on Statistical Machine Learning for Speech Processing, IWSML 2012, IEEE, 31 March 2012
- Yu Zhang, Jian Xu, Zhi-Jie Yan, and Qiang Huo, A Study of Discriminative Feature Extraction for i-vector Based Acoustic Sniffing in IVN Acoustic Model Training, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, ICASSP 2012, IEEE, 25 March 2012
- Yu Zhang, Zhi-Jie Yan, and Qiang Huo, A New i-vector Approach and Its Application to Irrelevant Variability Normalization Based Acoustic Model Training, in 2011 IEEE International Workshop on Machine Learning for Signal Processing, IEEE, 18 September 2011
- Jian Xu, Yu Zhang, Zhi-Jie Yan, and Qiang Huo, An i-vector Based Approach to Acoustic Sniffing for Irrelevant Variability Normalization based Acoustic Model Training and Speech Recognition, in 12th Annual Conference of the International Speech Communication Association, InterSpeech 2011, International Speech Communication Association, 27 August 2011
- Yu Zhang, Jian Xu, Zhi-Jie Yan, and Qiang Huo, An i-vector Based Approach to Training Data Clustering for Improved Speech Recognition, in 12th Annual Conference of the International Speech Communication Association, InterSpeech 2011, International Speech Communication Association, 27 August 2011
- Yu Zhang, Jian Xu, Zhi-Jie Yan, and Qiang Huo, A Study of an Irrelevant Variability Normalization Based Discriminative Training Approach for LVCSR, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, ICASSP 2011, IEEE, 22 March 2011
- Yi-Ning Chen, Zhi-Jie Yan, and Frank K. Soong, A Perceptual Study of Acceleration Parameters in HMM-Based TTS, in 11th Annual Conference of the International Speech Communication Association, InterSpeech 2010, International Speech Communication Association, 26 September 2010
- Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank K. Soong, Xin Zhuang, and Shengyi Kong, An HMM Trajectory Tiling (HTT) Approach to High Quality TTS, in 11th Annual Conference of the International Speech Communication Association, InterSpeech 2010, International Speech Communication Association, 26 September 2010
- Zhi-Jie Yan, Yao Qian, and Frank K. Soong, Rich-Context Unit Selection (RUS) Approach to High Quality TTS, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, ICASSP 2010, IEEE, 14 March 2010
- Zhi-Jie Yan, Yao Qian, and Frank K. Soong, Rich Context Modeling for High Quality HMM-Based TTS, in 10th Annual Conference of the International Speech Communication Association, InterSpeech 2009, International Speech Communication Association, 6 September 2009
- Zhi-Jie Yan, Cong Liu, Yu Hu, and Hui Jiang, A Trust Region Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, ICASSP 2009, IEEE, 19 April 2009
- Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, and Ren-Hua Wang, Soft Margin Estimation with Various Separation Levels for LVCSR, in 9th Annual Conference of the International Speech Communication Association, InterSpeech 2008, International Speech Communication Association, 22 September 2008
- Zhi-Jie Yan, Discriminative Training of Acoustic Models for Automatic Speech Recognition (声学模型区分性训练及其在自动语音识别中的应用), May 2008
- Zhi-Jie Yan, Bo Zhu, Yu Hu, and Ren-Hua Wang, Minimum Word Classification Error Training of HMMs for Automatic Speech Recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, ICASSP 2008, IEEE, 31 March 2008
- Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, and Ren-Hua Wang, A Study on Soft Margin Estimation for LVCSR, in IEEE Workshop on Automatic Speech Recognition & Understanding, 2007, ASRU 2007, IEEE, 9 December 2007
- Zhi-Jie Yan, Frank K. Soong, and Ren-Hua Wang, Word Graph Based Feature Enhancement for Noisy Speech Recognition, in IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, ICASSP 2007, IEEE, 15 April 2007
