Research in speech recognition, language modeling, language understanding, spoken language systems, and dialog systems
Our goal is to fundamentally advance the state-of-the-art in speech and dialog technology. To achieve this, we are working in all aspects of machine learning, neural network modeling, signal processing, and dialog modeling. Recently, to support our work, we have developed the Computational Network Toolkit (CNTK), which makes it easy to define complex neural network structures, and train them across multiple GPUs with unprecedented efficiency. You can find out more about this work by exploring the projects and individual home pages listed below.
In addition to advancing our basic scientific understanding of natural language processing, our work finds an outlet in Microsoft products such as Cortana, Xbox, and the Project Oxford web services suite. We have developed two of the suite's key services. LUIS (Language Understanding Intelligent Service) makes it easy for a developer to add language understanding to applications. From a small number of examples, LUIS is able to determine a user's intent when they speak or type. CRIS (Custom Recognition Intelligent Service) lets companies deploy customized speech recognition. The developer uploads sample audio files and transcriptions, and the recognizer is adapted to those specific conditions. This can dramatically improve accuracy in unusual acoustic environments, such as a factory floor or the outdoors. At runtime, both LUIS and CRIS are accessed via web APIs.
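At its simplest, querying such a service amounts to a single HTTP request carrying the user's utterance. The sketch below only assembles such a request URL; the endpoint, application ID, and key are illustrative placeholders, not the actual LUIS or CRIS endpoints.

```python
from urllib.parse import urlencode

def build_intent_query(base_url, app_id, key, utterance):
    """Assemble an intent-recognition request URL (illustrative only)."""
    params = urlencode({"id": app_id, "subscription-key": key, "q": utterance})
    return f"{base_url}?{params}"

# Hypothetical values -- not a real endpoint, app ID, or key.
url = build_intent_query(
    "https://example.invalid/luis/v1/application",
    "my-app-id",
    "my-key",
    "turn off the bedroom lights",
)
print(url)
```

The service's JSON response would then name the detected intent (e.g. turning off a light) along with its arguments, so the application never has to parse the raw words itself.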
The Speech & Dialog Group is managed by Geoffrey Zweig.
- Acoustic Modeling: How do we model phones and acoustic variations?
- Dialog and Conversational Systems: How do we model the interaction between systems and users?
- Language Modeling using Recurrent Neural Networks (RNNs)
- Language Understanding: Don't just recognize the words a user spoke, but understand what they mean.
- Meeting Recognition and Understanding: Make meetings more useful using speech recognition and understanding technology.
- Noise Robustness: How do we make the system work when background noise is present?
- Voice Search: Users can search for information, such as a local business, from their phones.
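The RNN language modeling listed above can be sketched in a few lines: an Elman-style recurrent network keeps a hidden state summarizing the word history and emits a probability distribution over the next word. This pure-Python toy (the vocabulary, dimensions, and random weights are all illustrative; real models are trained on large corpora) shows one step of that recurrence.

```python
import math
import random

random.seed(0)

VOCAB = ["<s>", "call", "mom", "search", "pizza"]
V, H = len(VOCAB), 4  # vocabulary size, hidden size

# Tiny random weights (illustrative only; a real model learns these).
Wxh = [[random.uniform(-0.1, 0.1) for _ in range(V)] for _ in range(H)]
Whh = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(H)]
Who = [[random.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(V)]

def rnn_step(word_idx, h):
    """One Elman step: return the new hidden state and next-word distribution."""
    x = [1.0 if i == word_idx else 0.0 for i in range(V)]  # one-hot input word
    h_new = [math.tanh(sum(Wxh[j][i] * x[i] for i in range(V)) +
                       sum(Whh[j][k] * h[k] for k in range(H)))
             for j in range(H)]
    logits = [sum(Who[v][j] * h_new[j] for j in range(H)) for v in range(V)]
    z = [math.exp(lg) for lg in logits]
    probs = [p / sum(z) for p in z]  # softmax over the vocabulary
    return h_new, probs

h = [0.0] * H
h, probs = rnn_step(VOCAB.index("<s>"), h)  # P(next word | "<s>")
```

Because the hidden state is carried forward at every step, the model can, in principle, condition on unboundedly long histories rather than the fixed window of an n-gram model.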
In the past, the speech technology group has worked on other projects, including:
- Automatic Grammar Induction: How do we create grammars to ease the development of spoken language systems?
- (MiPad) Multimodal Interactive Pad. Our first multimodal prototype.
- SALT (Speech Application Language Tags): A markup language for the multimodal web
- Intent Understanding. Not just recognizing the words the user says, but understanding what they mean.
- Multimodal Conversational User Interface
- Personalized Language Model for improved accuracy
- (Whisper) Speech Recognition. Our previous dictation-oriented speech recognition project was a state-of-the-art general-purpose speech recognizer.
- (Whistler) Speech Synthesis (Text-to-Speech). We have produced a speech synthesizer so that your computer can talk to you.
- (WhisperID) Speaker Identification. Who is doing the talking?
- Speech Application Programming Interface (SAPI) Development Toolkit. Developers can use the Whisper speech recognizer to build speech-enabled applications.