Zhijie Yan
RESEARCHER

Hello and Welcome!

This is the personal webpage of Zhi-Jie Yan (Chinese name: 鄢志杰). I'm a researcher in the speech group of Microsoft Research Asia, which I joined in July 2008. Before that, I received my Ph.D. degree from the Department of EEIS, University of Science and Technology of China. As a graduate student, I worked with the iFlytek Speech Lab from 2003 to 2008. During that period, I visited MSR Asia as a speech group intern from June 2005 to January 2006, and I visited the School of ECE, Georgia Tech, as a visiting scholar in 2007. I received the Microsoft Fellowship award in 2006 and the ICASSP Student Paper Contest winner award in 2007.

E-mail: zhijiey@microsoft.com | Speech Group of MSR Asia | MSR Asia

Research Topics

My research interests include speech recognition, machine learning, and speech synthesis and processing. Currently I'm mainly working on automatic speech recognition and machine learning. My current research topics include acoustic modeling for speech recognition, training criteria and optimization methods for deep neural networks, and large-scale machine learning platforms for speech applications.

  • Acoustic Modeling for Speech Recognition

We are working on discriminative training in both the GMM-HMM and DNN-HMM frameworks. In the GMM-HMM framework, a tied-state-based training criterion is used to train context-expanded region-dependent linear transforms (CE-RDLTs), which achieve improved recognition performance compared with state-of-the-art discriminative training methods. Combining this method with features derived from a deep neural network (DNN) yields a scalable approach to using DNN-GMM-HMM acoustic models for speech recognition and adaptation. Related papers can be found in ICASSP 2013 and InterSpeech 2013 (see selected publications below).

We have also studied training criteria and optimization methods of acoustic modeling for speech recognition. This research includes Irrelevant Variability Normalization (IVN) based training and an i-vector based approach to speech data clustering. Related papers can be found in ICASSP 2011/2012 and InterSpeech 2011 (see selected publications below).
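For illustration, the clustering stage of an i-vector based approach can be sketched as k-means over fixed-length i-vectors under a cosine distance, which is a common choice for comparing i-vectors. This is a minimal sketch only, assuming the i-vectors have already been extracted; `kmeans_cosine` and its parameters are hypothetical names, not our actual implementation:

```python
import math
import random

def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def kmeans_cosine(ivectors, k, iters=20, seed=0):
    """Cluster fixed-length i-vectors with k-means under cosine distance."""
    rng = random.Random(seed)
    centroids = rng.sample(ivectors, k)  # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each i-vector goes to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for vec in ivectors:
            idx = min(range(k), key=lambda c: cosine_distance(vec, centroids[c]))
            clusters[idx].append(vec)
        # Update step: centroid = mean of its members (empty clusters kept).
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centroids[c] = [sum(v[d] for v in members) / len(members)
                                for d in range(dim)]
    return centroids, clusters
```

In practice the cluster labels would then partition the speech data (e.g. by speaker or environment) before cluster-dependent acoustic model training.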

  • Deep Learning and Deep Neural Network

We are working on both training criteria and optimization methods for DNN training. Details will be available soon after the work is published.

  • Large-scale Machine Learning Platform Optimized for Speech

We have built a large-scale machine learning platform optimized for speech applications, especially acoustic model training. The platform runs on an HPC (High Performance Computing) cluster and is implemented with MPI (Message Passing Interface). It handles the "big data" that is essential for building a state-of-the-art speech recognition service. Details of this project can be found in our IWSML 2012 paper entitled "Designing an MPI-Based Parallel and Distributed Machine Learning Platform on Large-Scale HPC Clusters."
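The core pattern in this kind of platform is synchronous data-parallel training: each MPI rank accumulates gradients (or sufficient statistics) on its own shard of the data, an allreduce averages them across ranks, and every rank then applies the same model update. The sketch below simulates that pattern in plain Python, with the allreduce done in-process rather than via real MPI calls; the function names are hypothetical, not taken from our platform:

```python
def allreduce_average(per_rank_gradients):
    """Simulate MPI allreduce-sum followed by division by world size:
    every rank ends up holding the same averaged gradient vector."""
    num_ranks = len(per_rank_gradients)
    dim = len(per_rank_gradients[0])
    summed = [sum(grad[d] for grad in per_rank_gradients) for d in range(dim)]
    return [s / num_ranks for s in summed]

def data_parallel_step(weights, per_rank_gradients, lr=0.1):
    """One synchronous SGD step: average the per-shard gradients,
    then apply the identical update on every rank."""
    avg = allreduce_average(per_rank_gradients)
    return [w - lr * g for w, g in zip(weights, avg)]
```

On a real cluster, the summation in `allreduce_average` would be a single collective allreduce call, so every rank holds the averaged gradient without routing through a central parameter server.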

  • Rich Context Model-Based Speech Synthesis

We propose to directly use rich context models to model training speech in HMM-based TTS and to generate speech at synthesis time. Compared with conventional decision-tree-tied models, rich context models are crisper in nature and carry richer segmental and supra-segmental information. As a result, the over-smoothing problem of the conventional approach is significantly alleviated, enabling the synthesis of high-quality speech.

Rich context models can also be used to build an HMM-guided unit selection TTS system. Rich-context Unit Selection (RUS) has been transferred to Microsoft products to build high-quality speech synthesis engines. Related papers can be found in InterSpeech 2009, ICASSP 2010, and InterSpeech 2010.

Selected Publications