Commute UX is an interactive dialog system for in-car infotainment. With natural-language input and a multimodal user interface, selecting a desired song or person is easy and efficient. To mitigate the noisy conditions in the car, we designed a state-of-the-art speech-enhancement and sound-capture system built around a microphone array.
In-the-cloud dialog system
Our initial step in this project was to create a telephone-based dialog system on top of Microsoft Speech Server. The system was deployed in January 2007 and provided a completely automated service for traffic, gas prices, gas-station locations, weather, and stocks.
This initial deployment showed us the importance of understanding and conveying location and time. We analyzed data from the calls and from specially designed user studies and improved the task-completion rate.
Another important factor for increasing the usability of such dialog systems is extensive use of any prior information. Each telephone-system user could register on the project web site and provide friendly names for their most frequently visited locations (home, work, Julie's school, etc.). This reduced the number of dialog turns and let users complete their tasks more easily.
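As a minimal sketch of how such personalization can work (an illustration only, not the deployed system; the phone number, names, and addresses below are hypothetical), a per-user profile maps friendly names to stored locations so that a single utterance replaces several dialog turns:

```python
# Hypothetical per-user profiles: each registered caller maps friendly
# names to full addresses, collected via the project web site.
USER_PROFILES = {
    "+14255550123": {
        "home": "1012 Main St, Bellevue, WA",
        "work": "One Microsoft Way, Redmond, WA",
        "julie's school": "305 Oak Ave, Kirkland, WA",
    }
}

def resolve_location(caller_id, phrase):
    """Return the stored address for a friendly name, if the caller registered one."""
    profile = USER_PROFILES.get(caller_id, {})
    return profile.get(phrase.strip().lower())
```

With a profile in place, a query like "What's the traffic to work?" resolves in one turn instead of prompting the caller to describe a full address.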
An audio recording of a telephone call illustrating these features (click the icon to listen):
On-board dialog system
In November 2007 Ford Motor Company announced the availability of the Sync system, built on top of the Microsoft Auto platform, in many Ford models. This voice-enabled dialog system allows selecting a song title or calling a person from the mobile phone's address book. Such on-board systems are in transition from being a "cool gadget" to being an integral part of the modern automobile. Driving is an eyes-busy and hands-busy activity, so the only wideband communication channel left for interacting with the on-board system is speech.
In the second phase of our project we developed a prototype of such a dialog system. It allows (almost) free-phrase speech commands and queries. The dialog system is flat and does not have a multilayered, complex menu structure. An additional challenge is handling inexact queries for song/album selection: when unsure of a song's name, people add extra information such as the album or artist. Handling multiple fields in a single query increased the song-title recognition rate. An important part is the tight integration of the speech user interface with the other user-interface components, such as graphics, touch screen, and buttons, into a unified multimodal human-machine interface. These components are complementary: speech queries are very strong for browsing large lists, while the touch screen, for example, is more efficient for selection from short lists (menus, disambiguation).
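The benefit of accepting multiple fields in one query can be illustrated with a toy matcher (a sketch, not the actual Commute UX implementation; the catalog and scoring are invented for illustration): extra fields such as artist or album disambiguate songs that share a title.

```python
# Toy media catalog; two songs share the title "one".
CATALOG = [
    {"title": "one", "artist": "u2", "album": "achtung baby"},
    {"title": "one", "artist": "metallica", "album": "and justice for all"},
    {"title": "yellow", "artist": "coldplay", "album": "parachutes"},
]

def score(song, query):
    """Count query words that appear as tokens in any field of the song record."""
    words = set(query.lower().split())
    fields = set(" ".join(song.values()).split())
    return len(words & fields)

def best_match(query):
    """Return the catalog entry with the most field matches for the query."""
    return max(CATALOG, key=lambda s: score(s, query))
```

A title-only query like "play one" is ambiguous, but "play one by metallica" matches both the title and the artist fields and selects the right track.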
The on-board computer usually has limited resources (CPU power and available memory), which poses additional challenges for the designed algorithms.
We believe that in-car dialog systems for infotainment and communications have the potential to further increase user satisfaction. Such systems should be built on the following principles:
- Speech enabled - speech is the only interaction channel left in an eyes-busy/hands-busy activity such as driving.
- Multimodal - speech is strong for browsing large lists (music, address book); the touch screen and buttons are better for selection from short lists (disambiguation). The system should transition smoothly between speech-only and GUI/touch-only interfaces based on the conditions and the user.
- Situation aware - mimic the behavior of a passenger and stay silent during passing, lane changes, etc. Gradually change behavior based on speed, weather, and driving conditions; for example, turn the GUI off under heavy driving conditions to minimize driver distraction.
- Context and person aware - increase usability by filling in defaults derived from the driver's context and habits. These are easy to obtain, monitor, and store, since the car has a limited number of users (drivers).
- Seamless integration of in-car and in-cloud services - when connected to the cloud, the dialog system simply gets smarter and provides more information and services in a smooth, unobtrusive way.
Commute UX at TechFest 2009
During TechFest 2009 we showed the on-board system prototype to Microsoft employees and invited journalists. The overall reaction was quite positive; our demo booth was the fourth most popular of more than 150 demos, and it generated a lot of interest in the local, US, and world press. Here is a small selection of publications about Commute UX in print and online media:
Microsoft.com - February 24, 2009
At Microsoft Research TechFest 2009, Researcher Ivan Tashev, right, gives Craig Mundie, center, Microsoft’s Chief Research and Strategy Officer, and Rick Rashid, left, Senior Vice President of Microsoft Research, a glimpse of the future of driving with Commute UX, a research project designed to allow drivers to more easily interact with devices and information in their cars, using technologies such as speech and touch. Picture.
Microsoft.com - February 23, 2009
Microsoft Research Principal Architect Ivan Tashev demonstrates the in-car dialog system Commute UX, at TechFest 2009 in Redmond, Wash. Video.
Community.Research blog - February 25, 2009
Feature article about Commute UX. Article and picture.
Channel 10 - February 24, 2009
After deploying Blue&Me for Fiat and Sync for Ford, in-car dialog systems are morphing from cool gadgets that amaze people and sell more cars into integral parts of in-car infotainment. Commute UX raises the bar for the functionality, usability, and reliability of these systems. Microsoft Research presents novel technologies that enable natural-language input, expose a multimodal user interface including speech, a GUI, touch, and buttons, and use state-of-the-art sound-capture and processing technologies for improved speech recognition and sound quality. Video.
The New York Times - February 25, 2009
Mike Seltzer of Microsoft Research demonstrates technology that allows drivers to interact with devices using speech and touch. Article and picture.
YAHOO.com - February 24, 2009
Ivan Tashev, a Microsoft Corp. researcher, uses a car simulator to demonstrate Microsoft's experimental "Commute UX" system, at TechFest 2009, Microsoft's annual showcase of research and future technology projects, Tuesday, Feb. 24, 2009 in Redmond, Wash. Commute UX uses advanced voice-recognition technology to allow drivers to make phone calls, dictate spoken text-messages, and control music and other entertainment options all while driving. Picture.
The driving simulator
One of the important requirements for on-board speech-enabled and multimodal dialog systems is to minimize driver distraction. A safe way to measure the level of driver distraction is in a simulated environment, so we conduct user studies in our driving simulator. It is fully programmable and provides a 180° field of view and seven-channel surround sound. The software records multiple parameters, which allows objective evaluation of driving quality.
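One widely used objective driving-quality measure is the standard deviation of lane position (SDLP); the sketch below illustrates the idea (the simulator's actual logged parameters and metrics may differ, and the sample values are invented):

```python
import math

def sdlp(lane_positions):
    """Standard deviation of lateral lane position (meters) over a drive segment."""
    n = len(lane_positions)
    mean = sum(lane_positions) / n
    return math.sqrt(sum((x - mean) ** 2 for x in lane_positions) / n)

# Higher SDLP while performing a secondary task (e.g., operating the
# dialog system) indicates more lane weaving, i.e., more distraction,
# relative to a baseline drive with no secondary task.
baseline = sdlp([0.02, -0.03, 0.01, 0.00, -0.02])
with_task = sdlp([0.10, -0.12, 0.08, -0.05, 0.11])
```

Comparing such metrics across conditions is what makes the simulator logs useful for objective, repeatable evaluation.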
A short video of our driving simulator in action was recorded by a student of Professor Andrew Kun from the University of New Hampshire during their visit to Microsoft Research.
- Shamsi T. Iqbal, Eric Horvitz, Yun-Cheng Ju, and Ella Mathews, Hang on a Sec! Effects of Proactive Mediation of Phone Conversations while Driving, in ACM Conference on Human Factors in Computing Systems (CHI), Association for Computing Machinery, Inc., May 2011.
- Yun-Cheng Ju and Tim Paek, Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study, Association for Computational Linguistics, 11 July 2010.
- Ivan Tashev, Andrew Lovitt, and Alex Acero, Dual stage probabilistic voice activity detector, in NOISE-CON 2010 and 159th Meeting of the Acoustical Society of America, Acoustical Society of America, 20 April 2010.
- Yun-Cheng Ju and Tim Paek, How to Safely Respond to SMS Messages in Automobiles, in 2nd Multimodal Interfaces for Automobile Applications (MIAA), Association for Computing Machinery, Inc., 7 February 2010.
- Shamsi T. Iqbal, Yun-Cheng Ju, and Eric Horvitz, Cars, Calls and Cognition: Investigating Driving and Divided Attention, in ACM Conference on Human Factors in Computing Systems (CHI), Association for Computing Machinery, Inc., 2010.
- Ivan Tashev, Michael L. Seltzer, and Yun-Cheng Ju, Speech and sound for in-car infotainment systems, in Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI 2009), Association for Computing Machinery, Inc., Essen, Germany, 22 September 2009.
- Ivan Tashev, Michael Seltzer, Yun-Cheng Ju, Ye-Yi Wang, and Alex Acero, Commute UX: Voice Enabled In-car Infotainment System, in Mobile HCI '09: Workshop on Speech in Mobile and Pervasive Environments (SiMPE), Association for Computing Machinery, Inc., Bonn, Germany, 15 September 2009.
- Yun-Cheng Ju, Michael Seltzer, and Ivan Tashev, Improving Perceived Accuracy for In-Car Media Search, International Speech Communication Association, September 2009.
- Yun-Cheng Ju and Tim Paek, A Voice Search Approach to Replying to SMS Messages in Automobiles, International Speech Communication Association, September 2009.