
Min Chu
Researcher/Project Leader
Microsoft Research Asia
Min has led the text-to-speech (TTS) efforts in the speech group at
Microsoft Research Asia since she joined in March 2000. The group built a top
Mandarin TTS system during the first year and recently extended it to English.
Their latest system, Microsoft Mulan, is a bilingual TTS system that can switch
between Mandarin and English smoothly in the same engine.
Research Interests:
Speech synthesis, analysis and perception: concatenative/LPC/formant
synthesis, prosody modeling, signal processing, stress in Mandarin, formant
tracking, emotions in speech, speech perception.
Natural language processing: word/prosodic word segmentation, phrase/prosodic
phrase parsing, letter/character-to-sound conversion, named entity recognition.
Background:
Before joining Microsoft, Min worked at the Intel China Research Center for a
year, leading the TTS group there. Min received her Ph.D. from the Institute of
Acoustics, Chinese Academy of Sciences, in 1995. During the Ph.D. program, she
won the presidential graduate student fellowship from the Chinese Academy of
Sciences in 1994 for remarkable research contributions. (The Mandarin
text-to-speech system she developed then was rated the best in the competition
sponsored by the national 863 program of China.) After that, Min stayed at the
same institute as an associate research professor until 1999. During that
period, she was the technical leader of several research projects sponsored by
the national 863 program of China. In 1997, Min visited the Chinese University
of Hong Kong for one year, where she worked on Cantonese TTS and a hybrid speech
synthesis scheme combining the sinusoidal model and TD-PSOLA.
Min received her B.S. degree from Northwestern Polytechnical University in
1990 and her M.S. degree from Harbin Shipbuilding Engineering Institute in 1992.
Publications:
In English:
[1]
Min Chu and Mingzhen Bao, Comparison of Sentential-Stress Allocation within
Base Phrase among Different Reading Styles, Proc. of International
Conference on Speech Prosody, Nara, 2004, pp. 111-114.
[2]
LiJuan Wang, Yong Zhao, Min Chu, Jianlai Zhou and Zhigang Cao, Refining
Segmental Boundaries for TTS Database Using Fine Contextual-Dependent Boundary
Models, Proc. of ICASSP 2004, Montreal, pp. I-641~I-644.
[3]
Chao Huang, Yu Shi, Jianlai Zhou, Min Chu, Terry Wang and Eric Chang,
Segmental Tonal Modeling for Phone Set Design in Mandarin LVCSR, Proc. of
ICASSP 2004, Montreal, pp. I-901~I-904.
[4]
Ye Tian, Jianlai Zhou, Min Chu and Eric Chang, Tone Recognition with
Fractionized Models and Outlined Features, Proc. of ICASSP 2004, Montreal,
pp. I-105~I-108.
[5]
Min Chu, Yunjia Wang and Lin He, Labeling Stress in Continuous Mandarin
Speech Perceptually, Proc. of the 15th International Congress of Phonetic
Sciences, Barcelona, 2003.
[6]
Yunjia Wang, Min Chu and Lin He, Location of Sentence Stresses within
Disyllabic Words in Mandarin, Proc. of the 15th International Congress of
Phonetic Sciences, Barcelona, 2003.
[7]
Yong Zhao, Min Chu, Hu Peng and Eric Chang, Custom-Tailoring TTS Voice Font –
Keeping the Naturalness When Reducing Database Size, Proc. of the 8th
European Conference on Speech Communication and Technology (Eurospeech 2003),
Geneva, 2003.
[8]
Yining Chen, Min Chu, Eric Chang, Jia Liu and Runsheng Liu, Voice Conversion
with Smoothed GMM and MAP Adaptation, Proc. of the 8th European Conference on
Speech Communication and Technology (Eurospeech 2003), Geneva, 2003.
[9]
Min Chu, Hu Peng, Yong Zhao, Zhengyu Niu and Eric Chang, Microsoft Mulan – a
Bilingual TTS System, Proc. of ICASSP 2003, Hong Kong, 2003.
[10]
Yunjia Wang, Min Chu, Lin He and Yongqiang Feng, Stress Perception of
Chinese Disyllabic Words in Utterance, Chinese Journal of Acoustics,
Vol.22, No.1, 2003.
[11]
Min Chu,
Chun Li, Hu Peng and Eric Chang, Domain Adaptation for
TTS systems, Proc. of ICASSP 2002, Orlando, 2002.
[12]
Hu Peng, Yong Zhao and Min Chu, Perceptually optimizing the cost function for
unit selection in a TTS system with one single run of MOS evaluation, Proc. of
ICSLP 2002, Denver, 2002.
[13]
Yu Shi, Eric Chang, Hu Peng and Min Chu, Power Spectral Density Based Channel
Equalization of Large Speech Database for Concatenative TTS system, Proc. of
ICSLP 2002, Denver, 2002.
[14]
Zi-Rong Zhang, Min Chu and Eric Chang, An Efficient Way to Learn Rules for
Grapheme-to-Phoneme Conversion in Chinese, Proc. of ISCSLP 2002, Taipei, 2002.
[15]
Min Chu and Yao Qian, Locating Boundaries for Prosodic Constituents in
Unrestricted Mandarin Texts, Journal of Computational Linguistics and Chinese
Language Processing, Vol. 6, No. 1, Feb. 2001, pp. 61-82.
[16]
Min Chu and Hu Peng, An objective measure for estimating MOS of synthesized
speech, Proc. of Eurospeech 2001, Aalborg, 2001, pp. 2087-2090. (won the
COCOSDA best paper award)
[17]
Min Chu and Yong-Qiang Feng, Study on Factors Influencing Durations of
Syllables in Mandarin, Proc. of Eurospeech 2001, Aalborg, 2001, pp. 927-930.
[18]
Min Chu, Hu Peng and Eric Chang, A concatenative Mandarin TTS system without
prosody model and prosody modification, Proc. of the 4th ISCA Workshop on
Speech Synthesis, Scotland, 2001.
[19]
Min Chu, Hu Peng, Hong-Yun Yang and Eric Chang, Selecting non-uniform units
from a very large corpus for concatenative speech synthesizer, Proc. of
ICASSP 2001, Salt Lake City, 2001.
[20]
Yao Qian, Min Chu and Hu Peng, Segmenting Unrestricted Chinese Text into
Prosodic Words Instead of Lexical Words, Proc. of ICASSP 2001, Salt Lake City,
2001.
[21]
Jieping Xu, Min Chu, Lin
He and Shinan Lv, The Influence of Chinese Sentence Stress in Pitch and
Duration, Chinese Journal of Acoustics, Vol. 19, No. 3, 2000, pp.270-277.
[22]
Min Chu,
Difei Tang, Hongyan Si, Xuqing Tian and Shinan Lu, Research on Perception of
Juncture Between Syllables in Chinese, Chinese Journal of Acoustics, Vol.17,
No.2, 1998, pp. 143-152.
[23]
Min Chu and P. C. Ching, A Hybrid Approach to Synthesize High Quality
Cantonese Speech, Proc. of ICASSP98, Seattle, 1998.
[24]
Dinghua Guan, Min Chu, Quan Zhang, Jian Liu and Xiangdong Zhang, The Research
Project of Man-Computer Dialogue System in Chinese, Proc. of ICSLP98, Sydney,
1998.
[25]
Min Chu, Lin He, Jieping Xu and Shinan
Lu, Voice Conversion Between Female and Male in a TD-PSOLA Based Chinese TTS
system, Proc. of ISCSLP98, Singapore, 1998.
[26]
LU Shinan, CHU Min and SI Hongyan,
Study on Chinese Text-to-Speech System, Proc.
of ISSPR'98, Hong Kong, 1998.
[27]
Min Chu and
Shinan Lu, Building up a Cantonese Prosody Model by Using Neural Network, Proc.
of Conference on Phonetics of the Languages in China, Hong Kong, 1998.
[28]
Hongyan Si, Min Chu, and
Shinan Lu, The perceptual properties of stable portion of consonants in
Chinese, Proc. of the Conference on Phonetics of the Languages in China, Hong
Kong, 1998.
[29]
Min Chu,
Hongyan Si, Xuqing Tian, Shinan Lu and P.C. Ching, Research on Perception of
Formant Transition Between Syllables in Chinese, Proc. of The Sixth Western
Pacific Regional Acoustics Conference, Hong Kong,
1997, pp. 94-99.
[30]
Hongyan Si, Min Chu and Shinan Lu, A novel waveform concatenation algorithm
for Chinese PSOLA-based synthesizer, Proc. of the Sixth Western Pacific
Regional Acoustics Conference, Hong Kong, 1997.
[31]
Min Chu, Difei Tang, Shinan Lu and Dinghua Guan, The prosody model for the
Chinese TTS system Legend Voice, Proc. of the First China-Japan Workshop on
Spoken Language Processing, 1997, pp. 111-116.
[32]
Hongyan Si, Min Chu and
ShiNan Lu, A novel waveform concatenation algorithm for Chinese PSOLA-based
synthesizer, Proc. of the First China-Japan Workshop on Spoken Language
Processing, 1997, pp. 135-140
[33]
Difei Tang, Min Chu, Shinan Lu and Lin He, Word segmentation for Chinese TTS
system Legend Voice, Proc. of the First China-Japan Workshop on Spoken
Language Processing, 1997, pp. 153-156.
[34]
Min Chu and Shinan Lu, A Text-to-Speech System with High Intelligibility
and High Naturalness for Chinese, Chinese Journal of Acoustics, Vol. 15, No. 1,
1996, pp. 81-90.
[35]
Shinan Lu, Min Chu, Lin He, Yamin Lu, Xiaoguang Li and Jie Ma,
The Design and Realization of a Spoken Chinese Output System, Proc. of ICMI'96.
[36]
Min Chu, Shinan Lu, Hongyan Si, Lin He and Dinghua Guan, The control of
Juncture and prosody in Chinese TTS system, Proc. of ICSP'96, Beijing, China,
1996.
[37]
Min Chu and Shinan Lu, High Intelligibility and Naturalness Chinese TTS System
and Prosodic Rules, Proc. of XIII International Congress of Phonetic Sciences,
Stockholm, 1995, pp. 2:334-2:337.
[38]
Dinghua Guan, Min Chu and Shinan Lu, A Chinese Text-to-speech System with
High Intelligibility and High Naturalness, Proc. of International Conference
on Acoustics, Trondheim, Norway, pp. 31-34.
In Chinese:
[39]
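Min Chu, Uncertainty in the Prosodic Organization of Natural Speech and Its Application in Speech Synthesis, Journal of Chinese Information Processing, Vol. 18, No. 4, 2004, pp. 66-71.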
[40]
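Min Chu, Yunjia Wang and Mingzhen Bao, Local Syntactic Constraints and Length Constraints in the Rhythmic Organization of Mandarin, Essays on Linguistics, Vol. 30, 2004, to appear.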
[41]
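Min Chu, Uncertainty in the Prosodic Organization of Natural Speech and Its Application in Speech Synthesis, Proc. of the 7th National Conference on Speech Communication and Signal Processing, Xiamen, 2003.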
[42]
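Min Chu, Yunjia Wang and Mingzhen Bao, Local Syntactic Constraints and Length Constraints in the Rhythmic Organization of Mandarin, Proc. of the 6th National Conference on Modern Phonetics, Tianjin, 2003.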
[43]
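Yunjia Wang, Min Chu and Lin He, A Preliminary Study on the Distribution of Semantic Stress, Proc. of the 6th National Conference on Modern Phonetics, Tianjin, 2003.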
[44]
Yunjia Wang, Min Chu and Lin He, Classification and Distribution of Sentence Stress in Chinese, Acta Psychologica Sinica, Vol. 35, No. 6, 2003, pp. 734-742.
[45]
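Yunjia Wang, Min Chu, Lin He and Yongqiang Feng, Stress Perception of Disyllabic Prosodic Words in Continuous Utterances, Acta Acustica, Vol. 28, No. 6, 2003, pp. 534-539.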
[46]
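Zirong Zhang and Min Chu, A Statistical Learning Method for Grapheme-to-Phoneme Conversion of Polyphonic Characters, Journal of Chinese Information Processing, Vol. 16, No. 3, 2002, pp. 39-45.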
[47]
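Min Chu, Prosody Research and the Naturalness of Synthesized Speech, Proc. of the 5th National Conference on Modern Phonetics, Beijing, 2001, pp. 295-301.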
[48]
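Yongqiang Feng, Min Chu, Lin He and Shinan Lu, Statistical Analysis of Syllable Duration in Chinese Utterances, Proc. of the 5th National Conference on Modern Phonetics, Beijing, 2001, pp. 66-69.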
[49]
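Yao Qian, Min Chu and Wuyun Pan, Acoustic Analysis of Prosodic Unit Boundaries in Mandarin, Proc. of the 5th National Conference on Modern Phonetics, Beijing, 2001, pp. 70-74.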
[50]
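Yunjia Wang, Min Chu, Lin He and Yongqiang Feng, A Preliminary Study on Stress Perception of Disyllabic Prosodic Words in Utterances, Proc. of the 5th National Conference on Modern Phonetics, Beijing, 2001, pp. 166-170.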
[51]
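Lin He, Min Chu, Shinan Lu, Yao Qian and Yongqiang Feng, Research on Prosodic Hierarchy Labeling of a Chinese Speech Synthesis Corpus, Proc. of the 5th National Conference on Modern Phonetics, Beijing, 2001, pp. 323-326.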
[52]
Jieping Xu, Min Chu, Lin He and Shinan Lu, The Influence of Chinese Sentence Stress on Pitch and Duration, Acta Acustica, Vol. 25, No. 4, 2000, pp. 335-339.
[53]
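Min Chu and Shinan Lu, A Synthesis Method Combining the PSOLA Algorithm with the Sinusoidal Model of Speech, Proc. of the 5th National Conference on Man-Machine Speech Communication, Harbin, 1998, pp. 296-299.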
[54]
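Jieping Xu, Lin He, Yamin Lu and Shinan Lu, A Preliminary Study of Syllable Duration Variation in Chinese Broadcast Speech, Proc. of the 5th National Conference on Man-Machine Speech Communication, Harbin, 1998, pp. 42-45.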
[55]
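Min Chu, Difei Tang, Hongyan Si, Xuqing Tian and Shinan Lu, Research on Perception of Juncture Between Syllables in Chinese, Acta Acustica, Vol. 22, No. 2, 1997, pp. 104-110.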
[56]
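Difei Tang, Lin He, Min Chu and Shinan Lu, Applications of the Chinese Text-to-Speech System "Legend Voice", Proc. of the 8th Conference on Speech, Image, Communication and Signal Processing, Zhengzhou, 1997.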
[57]
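Min Chu and Shinan Lu, A Text-to-Speech System with High Intelligibility and High Naturalness for Chinese, Acta Acustica, Vol. 21, No. 4 Supplement, 1996, pp. 639-647.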
[58]
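Shinan Lu, Min Chu, Lin He, Yamin Lu and Xiaoguang Li, The Design and Realization of a Spoken Chinese Output System, Journal of Software, 1996, 863 Special Issue, pp. 53-59.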
[59]
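Shinan Lu, Min Chu, Yamin Lu, Guangnan Ni and Xiaoguang Li, A Speech System for the Chinese DOS Platform, Proc. of the 3rd National Conference on Computer Applications, Beijing, 1995, pp. 1558-1561.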
[60]
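Yamin Lu, Shinan Lu, Min Chu, Lin He and Tongchun Zhou, Research on an Intonation Model for Interrogative Sentences, Proc. of the 3rd National Conference on Speech Communication and Signal Processing, Xi'an, 1995, pp. 154-157.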
[61]
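Min Chu, Hongyan Si, Xuqing Tian, Shinan Lu and Jiangping Kong, The Role of Coarticulation Between Chinese Syllables in Auditory Perception, Proc. of the 7th National Conference on Speech, Image, Communication and Signal Processing, Xi'an, 1995, pp. 349-353.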
[62]
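Shinan Lu, Min Chu and Xiaoguang Li, An Overview of the Development of Speech Synthesis Technology in China and Abroad, Proc. of the 7th National Conference on Speech, Image, Communication and Signal Processing, Xi'an, 1995, pp. 133-141.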
[63]
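Min Chu, Shinan Lu and Yamin Lu, Research on Synthesizing Chinese Using the Pitch-Synchronous Overlap-Add Technique, Proc. of the 3rd National Conference on Man-Machine Speech Communication, Chongqing, pp. 394-397.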
[64]
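Shinan Lu, Tongchun Zhou, Min Chu and Yamin Lu, Research on Pitch and Duration Rules in a Chinese Synthesis System, Proc. of the 3rd National Conference on Man-Machine Speech Communication, Chongqing, 1994, pp. 407-410.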
[65]
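Min Chu, Shinan Lu and Tongchun Zhou, Research on Synthesis Rules for Neutral-Tone Syllables in Chinese, Proc. of the 6th National Conference on Speech, Image, Communication and Signal Processing, Sichuan, 1993, pp. B9.107-B9.109.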