Commute UX

Commute UX is an interactive dialog system for in-car infotainment. Using natural language input and a multimodal user interface, selecting the desired song or person is easy and efficient. To mitigate the noisy conditions in the car, we designed a state-of-the-art speech enhancement and sound capture system with a microphone array.
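
As a minimal illustration of the microphone-array part of such a sound capture front end, the sketch below implements a classic delay-and-sum beamformer. This is not the enhancement pipeline used in Commute UX; the function names, constants, and array geometry are assumptions for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at room temperature

def delay_and_sum(frames: np.ndarray, mic_positions: np.ndarray,
                  direction: np.ndarray, sample_rate: int) -> np.ndarray:
    """Steer a microphone array toward a plane wave from `direction`.

    frames:        (num_mics, num_samples) time-domain channel signals
    mic_positions: (num_mics, 3) microphone coordinates in meters
    direction:     unit vector pointing from the array toward the talker
    """
    num_mics, num_samples = frames.shape
    # Relative arrival delay of the plane wave at each microphone, in samples.
    delays = mic_positions @ direction / SPEED_OF_SOUND * sample_rate
    delays -= delays.min()                # make all delays non-negative
    output = np.zeros(num_samples)
    for m in range(num_mics):
        shift = int(round(delays[m]))     # integer-sample approximation
        # Advance each channel so the target direction adds coherently
        # while noise from other directions adds incoherently.
        output[:num_samples - shift] += frames[m, shift:]
    return output / num_mics

# Example: a 4-microphone linear array with 4 cm spacing, steered broadside.
mics = np.array([[i * 0.04, 0.0, 0.0] for i in range(4)])
enhanced = delay_and_sum(np.random.randn(4, 16000), mics,
                         np.array([0.0, 1.0, 0.0]), sample_rate=16000)
```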

In-the-cloud dialog system

Our initial step in this project was to create a telephone-based dialog system on top of Microsoft Speech Server. The system was deployed in January 2007 and provided a completely automated service for traffic, gas prices, gas station locations, weather, and stocks.

This initial deployment showed us the importance of understanding and conveying location and time. We analyzed the data from the calls and from specially designed user studies and improved the task completion rate.

Another important factor in increasing the usability of such dialog systems is extensive use of any prior information. Each telephone system user could register on the project web site and provide friendly names for the most frequently visited locations (home, work, Julie's school, etc.). This led to a reduction in the number of dialog turns and allowed users to complete the desired tasks more easily.
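
As a hypothetical illustration of how registered friendly names shorten the dialog, the sketch below resolves a spoken name to a stored location in a single turn. The class and coordinates are invented for the example; the actual registration service is the project web site described above.

```python
from typing import Optional

class LocationBook:
    """Per-user map from friendly names to geocoded locations."""

    def __init__(self) -> None:
        self._places: dict[str, tuple[float, float]] = {}

    def register(self, name: str, lat: float, lon: float) -> None:
        self._places[name.lower()] = (lat, lon)

    def resolve(self, spoken_name: str) -> Optional[tuple[float, float]]:
        # One dialog turn: "traffic to Julie's school" resolves immediately
        # instead of prompting for city, street, and number.
        return self._places.get(spoken_name.lower())

book = LocationBook()
book.register("Julie's school", 47.6205, -122.3493)  # made-up coordinates
assert book.resolve("julie's school") == (47.6205, -122.3493)
```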

An audio recording of a telephone call illustrates these features.

On-board dialog system

In November 2007 Ford announced the availability of the Sync system, built on top of the Microsoft Auto platform, in many Ford models. This voice-enabled dialog system allows selecting a song title or calling a person from the mobile phone address book. Such on-board systems are in transition from being a "cool gadget" to being an integral part of the modern automobile. Driving is an eyes-busy and hands-busy activity; the only wideband communication channel left for interaction with the on-board system is speech.

In the second phase of our project we developed a prototype of such a dialog system. It allows (almost) free-phrase speech commands and queries. The dialog system is flat and does not have a multilayered, complex menu structure. An additional challenge is handling non-exact queries for song/album selection. When unsure of a song name, humans add extra information such as the album or artist. Handling multiple fields in a single query increased the song title recognition rate. An important part is the tight integration of the speech user interface with other user interface components, such as graphics, touch screen, and buttons, into a unified multimodal human-machine interface. These components are complementary: speech queries are very strong for browsing large lists, while the touch screen, for example, is more efficient for selection from short lists (menus, disambiguation).
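
The sketch below illustrates the multi-field idea with a simple fuzzy score over title, artist, and album fields: an artist or album hint alongside an uncertain title disambiguates the query. The scoring scheme and all names here are hypothetical; the production system's recognition and ranking are more sophisticated.

```python
from difflib import SequenceMatcher

def field_score(query: str, value: str) -> float:
    # Case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, query.lower(), value.lower()).ratio()

def rank_songs(query: dict, catalog: list[dict]) -> list[dict]:
    """Rank catalog entries by combined similarity over all supplied fields."""
    def combined(song: dict) -> float:
        scores = [field_score(query[f], song[f]) for f in query if f in song]
        return sum(scores) / len(scores) if scores else 0.0
    return sorted(catalog, key=combined, reverse=True)

catalog = [
    {"title": "Yesterday", "artist": "The Beatles", "album": "Help!"},
    {"title": "Yellow", "artist": "Coldplay", "album": "Parachutes"},
]
# The artist field rescues a noisy title hypothesis from the recognizer.
best = rank_songs({"title": "yester day", "artist": "beatles"}, catalog)[0]
assert best["artist"] == "The Beatles"
```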

The on-board computer usually has limited resources (CPU power and available memory), which poses additional challenges for the designed algorithms.

Our vision

We believe that in-car dialog systems for infotainment and communications have the potential to further increase user satisfaction. Such systems should be built on the following principles:

  • Speech enabled - speech is the only interaction channel left in an eyes-busy/hands-busy activity such as driving.
  • Multimodal - speech is strong for browsing large lists (music, address book); the touch screen and buttons are better for selection from short lists (disambiguation). The interface should transition smoothly from speech-only to GUI/touch-only based on the conditions and the user.
  • Situation aware - mimic the behavior of a considerate passenger and do not speak during passing, lane changes, etc. Gradually change the behavior based on speed, weather, and driving conditions - for example, turn the GUI off under heavy driving conditions to minimize the driver's distraction (a minimal sketch of such a policy follows this list).
  • Context and person aware - increase the usability of the system by filling in defaults derived from the context and the habits of the driver. These are easy to obtain, monitor, and store, as the car has a limited number of users (drivers).
  • Seamlessly integrated with in-cloud services - when connected to the cloud, the dialog system simply gets smarter and provides more information and services in a smooth way.
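
The following sketch illustrates the situation-aware principle with an invented output-mode policy: the system throttles its own modalities as the driving load grows. The signals and thresholds are placeholders for the example, not the rules used in Commute UX.

```python
from dataclasses import dataclass
from enum import Enum

class OutputMode(Enum):
    SPEECH_AND_GUI = 1   # light load: full multimodal interface
    SPEECH_ONLY = 2      # heavy load: GUI off to reduce visual distraction
    SILENT = 3           # maneuver in progress: hold all prompts

@dataclass
class DrivingState:
    speed_kmh: float
    turn_signal_on: bool   # crude proxy for passing / lane change
    wipers_on: bool        # crude proxy for bad weather

def choose_output_mode(state: DrivingState) -> OutputMode:
    if state.turn_signal_on:
        return OutputMode.SILENT          # do not speak during a maneuver
    if state.speed_kmh > 100 or state.wipers_on:
        return OutputMode.SPEECH_ONLY     # heavy conditions: speech only
    return OutputMode.SPEECH_AND_GUI

assert choose_output_mode(DrivingState(120, False, False)) is OutputMode.SPEECH_ONLY
```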

Commute UX at TechFest 2009

During TechFest 2009 we showed the on-board system prototype to Microsoft employees and invited journalists. The overall reaction was quite positive: our demo booth was the fourth most popular of more than 150 demos. It generated a lot of interest in the local, US, and world press. Here is a small selection of publications about Commute UX in print and online media:

Craig Mundie and Rick Rashid driving with Commute UX

Microsoft.com – February 24, 2009

At Microsoft Research TechFest 2009, Researcher Ivan Tashev, right, gives Craig Mundie, center, Microsoft’s Chief Research and Strategy Officer, and Rick Rashid, left, Senior Vice President of Microsoft Research, a glimpse of the future of driving with Commute UX, a research project designed to allow drivers to more easily interact with devices and information in their cars, using technologies such as speech and touch. Picture.

Riding Along With Commute UX

Microsoft.com – February 23, 2009

Microsoft Research Principal Architect Ivan Tashev demonstrates the in-car dialog system Commute UX at TechFest 2009 in Redmond, Wash. Video.

Managing Audio in Wheel Time

Community.Research blog – February 25, 2009

Feature article about Commute UX. Article and picture.

TechFest: Commute UX: The next level of in-car infotainment

Channel 10 – February 24, 2009

After deploying Blue&Me for Fiat and Sync for Ford, in-car dialog systems are morphing from cool gadgets that amaze people and sell more cars into integral parts of in-car infotainment. Commute UX raises the bar for the functionality, usability, and reliability of these systems. Microsoft Research presents novel technologies that enable natural-language input, expose a multimodal user interface including speech, a GUI, touch, and buttons, and use state-of-the-art sound-capture and processing technologies for improved speech recognition and sound quality. Video.

Microsoft’s Fight Against Fat Fingers

The New York Times – February 25, 2009

Mike Seltzer of Microsoft Research demonstrates technology that allows drivers to interact with devices using speech and touch. Article and picture.

Yahoo! News

Yahoo.com – February 24, 2009

Ivan Tashev, a Microsoft Corp. researcher, uses a car simulator to demonstrate Microsoft's experimental "Commute UX" system at TechFest 2009, Microsoft's annual showcase of research and future technology projects, Tuesday, Feb. 24, 2009 in Redmond, Wash. Commute UX uses advanced voice-recognition technology to allow drivers to make phone calls, dictate spoken text messages, and control music and other entertainment options, all while driving. Picture.

The driving simulator

One of the important requirements for on-board speech-enabled and multimodal dialog systems is to minimize the driver's distraction. The safe way to measure the level of the driver's distraction is in a simulated environment, so we conduct user studies in our driving simulator. It is fully programmable and provides a 180° field of view and seven-channel surround sound. The program records multiple parameters, which allows objective evaluation of the driving quality.
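
As one example of an objective driving-quality measure that can be computed from such logs, the sketch below computes the standard deviation of lateral lane position (SDLP), a common distraction metric in simulator studies. The data here is invented, and the actual parameters our simulator records are not enumerated on this page.

```python
import math

def lane_keeping_sdlp(lane_positions_m: list[float]) -> float:
    """Standard deviation of lateral lane position (SDLP), in meters.

    Higher SDLP under a secondary task suggests greater driver distraction.
    """
    mean = sum(lane_positions_m) / len(lane_positions_m)
    var = sum((x - mean) ** 2 for x in lane_positions_m) / len(lane_positions_m)
    return math.sqrt(var)

# Made-up samples: lateral offsets logged while driving with and without a task.
baseline = lane_keeping_sdlp([0.02, -0.05, 0.04, -0.03, 0.01])
with_task = lane_keeping_sdlp([0.15, -0.22, 0.30, -0.18, 0.25])
assert with_task > baseline  # more weaving while interacting with the system
```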

A short video of our driving simulator in action was recorded by a student of Professor Andrew Kun from the University of New Hampshire during their visit to Microsoft Research.

Publications