Speech Recognition, Synthesis, and Dialog Systems

Teaching computers how to both speak and listen


Speech remains the most natural and easiest way for people to exchange ideas and thoughts. The challenge becomes greater, though, when people communicate with computers, or with other people using computing as an intermediary. We are working to develop spoken-language technologies that enable human-computer voice interaction and enrich human-to-human voice communications.

Our current focus includes four areas: automatic speech recognition, which lets computers facilitate access to data, help create content, and perform tasks; speech synthesis, which lets computers speak with a human-sounding voice to respond, provide information, and read text aloud; spoken-document retrieval and processing, which enriches communication between people, for example by converting voicemail into text; and signal processing, which improves the conditioning of signals and seamlessly changes speech-signal parameters such as pitch, speaking rate, and voice characteristics. We are pursuing several projects to help us reach our long-term vision of a fully speech-enabled computer.

In dialog systems, we pull these component technologies together and augment them with algorithms that reason about a user’s intentions so that the system can act in a helpful way. Our dialog systems have applications in cellphone-based personal assistants, gaming systems, and technical support.

To learn more about this area of research, including related projects, videos and publications, please visit the Speech and Dialogue Research Group and the Beijing-based Speech Research Group.