Research in audio processing, speech recognition, language modeling, language understanding, spoken language systems, dialog systems
Overview
Microsoft Research has a group in Redmond and another in Beijing working together to improve spoken language technologies. Our main goal is to build applications that make computers available everywhere, and we work with Microsoft Tellme to make this vision a reality. We are interested not only in creating state-of-the-art spoken language components, but also in how these disparate components can come together with other modes of human-computer interaction to form a unified, consistent computing environment. We are pursuing several projects to help us reach our vision of a fully speech-enabled computer. See a Flash overview of speech recognition at MSR (click on "Microsoft Research" and then "Speech technology").
You can see a few videos that illustrate our technology and a list of our publications and downloads.
The speech group is managed by Alex Acero.
Projects
- Audio Processing: sound capture, speech enhancement, acoustic echo cancellation, de-reverberation, microphone array processing, loudspeaker arrays, spatial sound.
- Noise Robustness: How do we make the system work when background noise is present?
- Acoustic Modeling: How do we model phones and acoustic variations?
- Language Models using Recurrent Neural Networks (RNNs)
- Language Understanding: Not just recognizing the words the user says, but understanding what they mean.
- Voice Search: Users can search for information, such as a business, from their phone.
In the past, the speech technology group has worked on other projects, including:
- Automatic Grammar Induction: How do we create grammars to ease the development of spoken language systems?
- (MiPad) Multimodal Interactive Pad. Our first multimodal prototype.
- SALT (Speech Enabled Language Tags): A markup language for the multimodal web
- Intent Understanding. Not just recognizing the words the user says, but understanding what they mean.
- Multimodal Conversational User Interface
- Personalized Language Model for improved accuracy
- (Whisper) Speech Recognition. Our previous dictation-oriented speech recognition project is a state-of-the-art general-purpose speech recognizer.
- (Whistler) Speech Synthesis (Text-to-Speech). We have produced a speech synthesizer so that your computer can talk to you.
- (WhisperID) Speaker Identification. Who is doing the talking?
- Speech Application Programming Interface (SAPI) Development Toolkit. Developers can use the Whisper speech recognizer to build applications with speech recognition.
Former members of the speech group at Microsoft
| Asela Gunawardana | aselag | Redmond |
| Kuansan Wang | kuansanw | Redmond |
| Hsiao-Wuen Hon | hon | Asia |
| XD Huang | xdh | Redmond |
| Mei-Yuh Hwang | mehwang | Redmond |
| Fil Alleva | fil | Redmond |
| Li Jiang | lij | Redmond |
| Mike Plumpe | mplumpe | Redmond |

The speech group in the press
- T. Bishop. Show and tell at Microsoft's annual research fest (Seattle PI, 2004).
- D. Barker. Microsoft Research Spawns a New Era in Speech Technology (PC AI Magazine, 2003).
- M. Kanellos. Talking Computers Nearing Reality (CNET News.com, 2003).
- M. Brooks. No one understands me as well as my PC (New Scientist, 2003).
