The Conversational Architectures project aims to build natural, human-like systems and interfaces for conversing with a computer. This involves exploring both theoretical frameworks for supporting human-computer dialog and computational architectures that apply those frameworks. To better equip computers for conversation, we have endowed systems with inferential and sensory capabilities that let them see what the user is paying attention to, recognize words, parse utterances grammatically, and even recover in a natural, human-like manner from misunderstandings when they occur. To implement these capabilities, we have integrated leading-edge components in vision technology, speech recognition, text-to-speech generation, and natural language parsing, along with advanced machine learning techniques such as Bayesian inference for optimizing decisions over time.
Our efforts include research on:
Further developments are underway. We have applied these capabilities to handling receptionist tasks, navigating Microsoft PowerPoint, and directing clarification dialog in command-and-control speech recognition. For more information, see our Systems web page.
Latest News and Development
Lately, we have been refurbishing the Bayesian Receptionist, a system designed to perform tasks typically handled by front desk receptionists at the Microsoft corporate campus. The Bayesian Receptionist serves two purposes: first, to provide receptionist services to users, and second, to provide a testing environment for new research developments. A prior version of the Bayesian Receptionist required users to manually input contextual observations wherever they were available. The new version attempts to automate the evidence-gathering process and to integrate multimodal information through the Quartet system.
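As a rough illustration of what combining automatically gathered evidence can look like, the sketch below updates a belief over a visitor's goal from several observations using Bayes' rule. It is not the Bayesian Receptionist's actual model: the goal names, observation names, and probabilities are all hypothetical, and the observations are assumed to be conditionally independent given the goal (a naive Bayes simplification).

```python
def posterior(prior, likelihoods, observations):
    """Return P(goal | observations) for each goal.

    prior:        dict mapping goal -> P(goal)
    likelihoods:  dict mapping goal -> {observation: P(observation | goal)}
    observations: list of observed evidence labels

    Assumes observations are conditionally independent given the goal.
    """
    unnormalized = {}
    for goal, p in prior.items():
        for obs in observations:
            p *= likelihoods[goal][obs]
        unnormalized[goal] = p

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observations have zero probability under every goal")
    return {goal: p / total for goal, p in unnormalized.items()}


# Hypothetical goals and evidence, chosen only for illustration.
PRIOR = {"request_shuttle": 0.5, "visit_employee": 0.5}
LIKELIHOODS = {
    "request_shuttle": {"wearing_badge": 0.8, "heard_word_shuttle": 0.7},
    "visit_employee": {"wearing_badge": 0.2, "heard_word_shuttle": 0.1},
}

# One visual cue and one speech cue are combined into a single belief.
belief = posterior(PRIOR, LIKELIHOODS, ["wearing_badge", "heard_word_shuttle"])
```

In this toy setup, each additional observation, whether visual or spoken, simply multiplies into the same belief, which is one simple way evidence from several modalities can be integrated without asking the user to enter it by hand.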
To get an idea of how users interact with an automated system, and to build up a corpus of training data for computational modeling, we have been collecting spoken dialogs between users and a mockup system. Using a synthetic voice to maintain the illusion of communicating with a computer, a human "wizard" assists users in accomplishing various personal information management (PIM) tasks.