Zhijie Yan

RESEARCHER

Hello and Welcome!

This is the personal webpage of Zhi-Jie Yan (Chinese name: 鄢志杰). I'm a researcher with the speech group of Microsoft Research Asia (MSR Asia), which I joined in July 2008. Before that, I received my Ph.D. degree from the Department of EEIS, University of Science and Technology of China. As a graduate student, I worked with iFlytek Speech Lab from 2003 to 2008. During that period, I visited MSR Asia as a speech group intern from June 2005 to January 2006, and the School of ECE, Georgia Tech, as a visiting scholar in 2007. I received the Microsoft Fellowship award in 2006 and the ICASSP student paper contest winner award in 2007.

E-mail: zhijiey@microsoft.com | Speech Group of MSR Asia | MSR Asia

Research Topics

My research interests include speech recognition, synthesis and processing. Currently I'm mainly working on automatic speech recognition at MSR Asia. My current research topics include acoustic modeling for speech recognition and speaker classification, as well as a large-scale machine learning platform for speech applications.

  • Acoustic Modeling for Speech Recognition

We are doing research on both training criteria and optimization methods for acoustic modeling in speech recognition. Our research includes Irrelevant Variability Normalization (IVN) based training and an i-vector based approach to speech data clustering. Related papers can be found in ICASSP 2011/2012 and InterSpeech 2011.

We are also working on discriminative training using GMM-HMM and DNN-HMM. Details will be available soon after the work is published.

  • Large-scale Machine Learning Platform Optimized for Speech

We have built a large-scale machine learning platform optimized for speech applications, especially acoustic model training. This platform is implemented on an HPC (High Performance Computing) cluster using MPI (Message Passing Interface). It handles the "big data" that is essential for building a state-of-the-art speech recognition service. Details of this project can be found in our IWSML 2012 paper entitled "Designing an MPI-Based Parallel and Distributed Machine Learning Platform on Large-Scale HPC Clusters."

  • Rich Context Model-Based Speech Synthesis

We propose to directly use rich context models to model training speech in HMM-based TTS, and to generate testing speech in synthesis. Compared with conventional decision-tree tied models, rich context models are crisper in nature and carry richer segmental and supra-segmental information. As a result, the over-smoothing problem of the conventional approach is significantly alleviated, which enables the synthesis of high quality speech.

Rich context models can also be used to build an HMM-guided unit selection TTS system. Rich-context Unit Selection (RUS) has been transferred to Microsoft products to build high quality speech synthesis engines. Related papers can be found in InterSpeech 2009, ICASSP 2010 and InterSpeech 2010.

Selected Publications