Yao Qian


Microsoft Research Asia


Dr. Qian is a lead researcher in the Speech Group at Microsoft Research Asia. She received her Ph.D. degree from the Department of Electronic Engineering, The Chinese University of Hong Kong, in 2005. During her Ph.D. study, she was awarded a Microsoft Research Asia Fellowship in 2003. She joined Microsoft Research Asia in September 2005. Her research interests lie in spoken language processing. Her recent research projects include speech synthesis, voice transformation, prosody modeling (for speech synthesis, recognition, and understanding), and computer-assisted language learning (CALL). Her most recent work focuses on deep learning and its application to speech synthesis and pronunciation evaluation.

Current Projects:

Deep Neural Networks for Speech Generation and Synthesis

Deep Learning for Pronunciation Training and Evaluation

A Fast Statistical Model Driven Text-To-Speech Synthesis

Cross-lingual Voice Transformation

High Quality Text-To-Speech Synthesis



Journal Papers:

[1] Yao Qian, Frank K. Soong and Zhijie Yan, "A Unified Trajectory Tiling Approach to High Quality Speech Rendering", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 2, pp. 280-290, 2013.

[2] Lijuan Wang, Yao Qian, Matthew Scott, Gang Chen and Frank K. Soong, "Computerized Audio-Visual Language Learning", Computer, Vol. 45, No. 6, pp. 38-47, 2012.

[3] Yao Qian, Zhizheng Wu, Bo-Yang Gao and Frank K. Soong, "Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 6, pp. 1702-1710, 2011.

[4] Yao Qian and Frank K. Soong, "A Multi-Space Distribution (MSD) and Two-Stream Tone Modeling Approach to Mandarin Speech Recognition", Speech Communication, Vol. 51, No. 12, pp. 1169-1179, 2009.

[5] Yao Qian, Hui Liang and Frank K. Soong, "A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 6, pp. 1231-1239, 2009.

[6] Yao Qian, Frank K. Soong and Tan Lee, "Tone-Enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR", Computer Speech and Language, Vol. 22, No. 4, pp. 360-373, 2008.

[7] Yao Qian, Tan Lee and Frank K. Soong, "Tone Recognition in Continuous Cantonese Speech Using Supratone Models", The Journal of the Acoustical Society of America, Vol. 121, No. 5, pp. 2936-2945, 2007.

[8] Yujia Li, Tan Lee and Yao Qian, "Analysis and Modeling of F0 Contours for Cantonese Text-to-Speech", ACM Transactions on Asian Language Information Processing, Vol. 3, No. 3, pp. 169-180, 2004.

[9] Min Chu and Yao Qian, "Locating Boundaries for Prosodic Constituents in Unrestricted Mandarin", International Journal of Computational Linguistics & Chinese Language Processing, Vol. 6, No. 1, pp. 51-82, February 2001.

Recent Conference Papers:

[1] Yao Qian, Yuchen Fan, Wenping Hu and Frank K. Soong, "On the Training Aspects of Deep Neural Network (DNN) for Parametric TTS Synthesis", in Proc. ICASSP, 2014.

[2] Wenping Hu, Yao Qian and Frank K. Soong, "A DNN-based Acoustic Modeling of Tonal Language and Its Application to Mandarin Pronunciation Training", in Proc. ICASSP, 2014.

[3] Yuchen Fan, Yao Qian, Fenglong Xie and Frank K. Soong, "TTS Synthesis with Bidirectional LSTM based Recurrent Neural Networks", in Proc. Interspeech, 2014.

[4] Fenglong Xie, Yao Qian, Yuchen Fan, Frank K. Soong and Haifeng Li, "Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion", in Proc. Interspeech, 2014.

[5] Fenglong Xie, Yao Qian, Frank K. Soong and Haifeng Li, "Pitch Transformation in Network based Voice Conversion", in Proc. ISCSLP, 2014.

[6] Wenping Hu, Yao Qian and Frank K. Soong, "A New Neural Network Based Logistic Regression Classifier for Improving Mispronunciation Detection of L2 Language Learners", in Proc. ISCSLP, 2014.

[7] Wenping Hu, Yao Qian and Frank K. Soong, "A New DNN-based High Quality Pronunciation Evaluation for Computer-Aided Language Learning (CALL)", in Proc. Interspeech, 2013.

[8] Yao Qian, Frank K. Soong, Xiaobo Zhou, Yundi Qian and Xiaotian Zhang, "A Fast Table Lookup based, Statistical Model Driven Non-Uniform Unit Selection TTS", in Proc. ICASSP, 2013.

[9] Ji He, Yao Qian, Frank K. Soong and Sheng Zhao, "Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS", in Proc. Interspeech, 2012.

[10] Yao Qian and Frank K. Soong, "A Unified Trajectory Tiling Approach to High Quality TTS and Cross-lingual Voice Transformation", in Proc. ISCSLP, 2012.

[11] Wenping Hu, Yao Qian and Frank K. Soong, "Pitch Accent Detection and Prediction with DCT and CRF Model", in Proc. ISCSLP, 2012.

[12] Xiaotian Zhang, Yao Qian, Hai Zhao and Frank K. Soong, "Break Index Labeling of Mandarin Text via Syntactic-to-Prosodic Tree Mapping", in Proc. ISCSLP, 2012.

[13] Bo Peng, Yao Qian, Frank K. Soong and Bo Zhang, "A New Phonetic Candidate Generator for Improving Search Query Efficiency", in Proc. Interspeech, 2011.

[14] Aki Kunikoshi, Yao Qian, Frank K. Soong and Nobuaki Minematsu, "Improved F0 Modeling and Generation in Voice Conversion", in Proc. ICASSP, 2011.

[15] Yao Qian, Ji Xu and Frank K. Soong, "A Frame Mapping Based HMM Approach to Cross-lingual Voice Transformation", in Proc. ICASSP, 2011.

[16] Yao Qian, Zhizheng Wu, Xuezhe Ma and Frank K. Soong, "Automatic Prosody Prediction and Detection with Conditional Random Field (CRF) Model", in Proc. ISCSLP, 2010.

[17] Xin Zhuang, Yao Qian, Frank K. Soong, Yijian Wu and Bo Zhang, "Formant-based Frequency Warping for Improving Speaker Adaptation in HMM TTS", in Proc. Interspeech, 2010.

[18] Yao Qian, Zhijie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang and Shengyi Kong, "An HMM Trajectory Tiling (HTT) Approach to High Quality TTS", in Proc. Interspeech, 2010.

[20] Qing-Qing Zhang, Frank K. Soong, Yao Qian, Zhijie Yan, Jielin Pan and Yonghong Yan, "Improved Modeling for Pitch Generation and V/U Decision in HMM-based TTS", in Proc. ICASSP, 2010.



Patents:

[1] Yao Qian and Frank K. Soong, Frame Mapping Approach for Cross-lingual Voice Transformation, Patent ID: US 8594993, Issue Date: November 26, 2013 (issued).

[2] Zhijie Yan, Yao Qian and Frank K. Soong, Rich Context Modeling for Text-to-Speech Engines, Patent ID: US 8340965, Issue Date: December 25, 2012 (issued).

[3] Yao Qian and Frank K. Soong, HMM-based Bilingual (Mandarin-English) TTS Techniques, Patent ID: US 8244534, Issue Date: August 14, 2012 (issued).

[4] Yao Qian and Frank K. Soong, Synthesized Singing Voice Waveform Generator, Patent ID: US 7977562, Issue Date: July 12, 2011 (issued).

[5] Min Chu and Yao Qian, Method and Apparatus for Identifying Prosodic Word Boundaries, Patent ID: US 7263488, Issue Date: August 28, 2007 (issued).

[6] Yao Qian and Frank K. Soong, Multi-Space Distribution for Pattern Recognition based on Mixed Continuous and Discrete Observations, Publication No.: US-2008-0120108-A1, Publication Date: May 22, 2008.

[7] Yao Qian and Frank K. Soong, Line Spectrum Pair Density Modeling for Speech Applications, Publication No.: US-2008-0195381-A1, Publication Date: August 14, 2008.

[8] Yao Qian and Frank K. Soong, Stylized Prosody for Speech Synthesis-based Applications, Publication No.: US-2010-0066742-A1, Publication Date: March 18, 2010.

[9] Yining Chen, Yao Qian and Frank K. Soong, State Mapping for Cross-language Speaker Adaptation, Publication No.: US-2010-0198577-A1, Publication Date: August 5, 2010.

[10] Yao Qian, Frank K. Soong, Zhijie Yan and Yijian Wu, Trajectory Tiling Approach for Text-to-Speech, Publication No.: US-2012-0143611-A1, Publication Date: June 7, 2012.

[11] Bin Zhu, Yao Qian and Frank K. Soong, Audio Human Interactive Proof Based on Text-to-Speech and Semantics, Publication No.: US-2013-0218566-A1, Publication Date: August 22, 2013.