Yao Qian

Microsoft Research Asia

Email: yaoqian@microsoft.com

Dr. Qian is a lead researcher in the Speech Group at Microsoft Research Asia. She received her Ph.D. degree from the Dept. of EE, The Chinese University of Hong Kong, in 2005. During her Ph.D. studies, she was awarded the Microsoft Research Asia Fellowship in 2003. She joined Microsoft Research Asia in September 2005. Her research interest is spoken language processing. Her recent research projects include speech synthesis, voice transformation, prosody modeling for speech synthesis, recognition and understanding, and computer-assisted language learning (CALL). Her most recent work focuses on deep learning and its application to speech synthesis and pronunciation evaluation.

Current Projects:

Deep Neural Networks for Speech Generation and Synthesis

Deep Learning for Pronunciation Training and Evaluation

A Fast Statistical Model Driven Text-To-Speech Synthesis

Cross-lingual Voice Transformation

High Quality Text-To-Speech Synthesis



Journal Papers

[1] Qian Yao, Soong Frank and Yan Zhi-jie, "A Unified Trajectory Tiling Approach to High Quality Speech Rendering", IEEE Transactions on Audio, Speech and Language Processing, Vol. 21, Issue 2, pp. 280-290, 2013.

[2] Wang Lijuan, Qian Yao, Scott Matthew, Chen Gang and Soong Frank, "Computerized Audio-Visual Language Learning", Computer, Vol. 45, Issue 6, pp. 38-47, 2012.

[3] Qian Yao, Wu Zhi-Zheng, Gao Bo-Yang and Soong Frank K., "Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units", IEEE Transactions on Audio, Speech and Language Processing, Vol. 19, Issue 6, pp. 1702-1710, 2011.

[4] Qian Yao and Soong Frank K., "A Multi-space Distribution (MSD) and Two-stream Tone Modeling Approach to Mandarin Speech Recognition", Speech Communication, Vol. 51, Issue 12, pp. 1169-1179, 2009.

[5] Qian Yao, Liang Hui and Soong Frank K., "A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin–English) TTS", IEEE Transactions on Audio, Speech and Language Processing, Vol. 17, No. 6, pp. 1231-1239, 2009.

[6] Qian Yao, Soong Frank K. and Lee Tan, "Tone-enhanced Generalized Character Posterior Probability (GCPP) for Cantonese LVCSR", Computer Speech and Language, Vol. 22, Issue 4, pp. 360-373, 2008.

[7] Qian Yao, Lee Tan and Soong Frank K., "Tone recognition in continuous Cantonese speech using supratone models", The Journal of the Acoustical Society of America, Vol. 121, No. 5, pp. 2936-2945, 2007.

[8] Li Yujia, Lee Tan and Qian Yao, "Analysis and Modeling of F0 Contours for Cantonese Text-to-Speech", ACM Transactions on Asian Language Information Processing, Vol. 3, Issue 3, pp. 169-180, 2004.

[9] Chu Min and Qian Yao, "Locating Boundaries for Prosodic Constituents in Unrestricted Mandarin", International Journal of Computational Linguistics & Chinese Language Processing, Vol. 6, No. 1, pp. 51-82, February 2001.

Conference Papers



Patents

[1] Yao Qian and Frank K. Soong, Frame Mapping Approach for Cross-lingual Voice Transformation, Patent ID: US 8594993, Issue Date: November 26, 2013 (issued).

[2] Zhijie Yan, Yao Qian and Frank K. Soong, Rich Context Modeling for Text-to-Speech Engines, Patent ID: US 8340965, Issue Date: December 25, 2012 (issued).

[3] Yao Qian and Frank K. Soong, HMM-based Bilingual (Mandarin-English) TTS Techniques, Patent ID: US 8244534, Issue Date: August 14, 2012 (issued).

[4] Yao Qian and Frank K. Soong, Synthesized Singing Voice Waveform Generator, Patent ID: US 7977562, Issue Date: July 12, 2011 (issued).

[5] Chu Min and Qian Yao, Method and Apparatus for Identifying Prosodic Word Boundaries, Patent ID: US 7263488, Issue Date: August 28, 2007 (issued).

[6] Yao Qian and Frank K. Soong, Multi-Space Distribution for Pattern Recognition based on Mixed Continuous and Discrete Observations, Patent application number: US-2008-0120108-A1, Publication date: 5/22/2008.

[7] Yao Qian and Frank K. Soong, Line Spectrum Pair Density Modeling for Speech Applications, Patent application number: US-2008-0195381-A1, Publication date: 8/14/2008.

[8] Yao Qian and Frank K. Soong, Stylized Prosody for Speech Synthesis-based application, Patent application number: US-2010-0066742-A1, Publication date: 3/18/2010.

[9] YiNing Chen, Yao Qian and Frank K. Soong, State Mapping for Cross-language Speaker Adaptation, Patent application number: US-2010-0198577-A1, Publication date: 8/5/2010.

[10] Yao Qian, Frank K. Soong, Zhijie Yan and Yi-jian Wu, Trajectory Tiling Approach for Text-to-Speech, Patent application number: US-2012-0143611-A1, Publication date: 6/7/2012.

[11] Bin Zhu, Yao Qian and Frank K. Soong, Audio Human Interactive Proof Based on Text-to-Speech and Semantics, Patent application number: US-2013-0218566-A1, Publication date: 8/22/2013.