April 28, 2011 10:00 AM PT
Fat fingers and compact smartphone screens just aren’t a good match. Combine a few key misses with a well-meaning auto-correction feature and it’s no wonder that “I ate the hummus” turns into “I ate the humans.” But Windows Phone 7 users find that their keyboards seem to have a better knack for knowing what they intend to type—or even what they intend to say.
That’s because of a collaboration between a Microsoft Research team and the Windows Phone 7 product group. They worked together to apply the principles of machine learning, in which technology works on a person’s behalf, to improving finger input on Windows Phone 7’s “soft,” or virtual, keyboard.
Tim Paek, a researcher in the Mobile Computing Research Center, and Asela Gunawardana, a researcher in the Machine Learning and Applied Statistics group, led the effort for Microsoft Research. Their counterparts on the Windows Phone 7 team included Eric Badger, developer lead, and Itai Almog, program manager. The goal was simple, Badger says: Design a better mobile-phone keyboard.
“That was our starting premise,” he says. “We wanted to have the best text-input solution in the world. When it comes to digital communication, the task of getting an idea from your head to the keyboard is really important.”
But there were plenty of hurdles to reaching that “best keyboard” solution. Fortunately, Badger had worked on speech products, and he believed that the keypad puzzle was essentially a decoding problem that could be approached using machine learning. He had collaborated with Paek in the past, and the two had become friends. Badger also knew about Gunawardana’s research.
“I felt comfortable,” Badger says, “approaching Tim and seeing what he thought of the problem.”
The problem with smartphone keyboards, Almog says, is obvious.
“Their fingers are big, the keys are tiny, and the sensors aren’t always reliable,” he explains, “so what you end up with is a bunch of ‘noisy’ XY coordinates.”
The Microsoft Research and Windows Phone 7 teams tackled the problem by first conducting extensive user experiments on a Windows Phone 7 prototype. They collected data on how people touch soft keyboard keys, what it means when people employ different holding positions—using only their thumbs, for instance—and the strengths and weaknesses of competitors’ keypads.
Paek and Gunawardana decided to use their expertise in machine learning and linguistics to develop an approach that mathematically takes into account both the geometry of the keyboard space and what people seem to want to say. Badger and a colleague, Drew Linerud, created a prototype incorporating the approach to demonstrate the proof of concept. The prototype was instrumental in getting the business group to embrace the project.
The phone first uses statistical models of language patterns to determine what a user is trying to type.
“We know they are typing in English, usually,” Gunawardana says, “and we tend to know what the text is supposed to look like, even if what they’re writing is not exactly the same language you’d find in The Wall Street Journal. You usually can guess what people intend to say.”
Then the phone uses statistical models of which touch points are likely to be seen given that the user is trying to hit certain letters.
“That’s what we collected in our user experiments,” Paek says, “data about how far off users can go in trying to hit certain keys.”
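In rough terms, the two models described above can be combined the way a noisy-channel decoder would combine them. The expression below is a sketch for illustration, not the team's published formulation, and all symbols are assumptions: k is a candidate key, c is the text typed so far, and (x, y) is the touch point reported by the sensor.

```latex
\hat{k} \;=\; \arg\max_{k} \; \underbrace{P(k \mid c)}_{\text{language model}} \;\cdot\; \underbrace{P(x, y \mid k)}_{\text{touch model}}
```

The first factor says how plausible the key is given what has been typed; the second says how plausible the observed touch point is if the user was aiming at that key.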
For crowdsourcing, the data-collection tool eventually was turned into a game called Text Text Revolution! by Dmitry Rudchenko, a developer on Badger’s team. The game is rated at four out of five stars on the Windows Phone 7 Marketplace, and it not only helps users grow accustomed to typing quickly and accurately on their Windows Phone devices, but also generates ideal data for training statistical models as a side effect of playing the game. Since its launch, the game has collected more than 20 million touch points for training.
By combining statistical models of language patterns and touch points, the keyboard dynamically changes the virtual size of the likely next letter, so that it has a larger target area—the area where tapping the keypad results in a particular letter, symbol, or number.
“We don’t show that visually,” Paek says. “It all happens behind the scenes.”
The keypad software analyzes what a user is typing, decides which letter is most likely to be typed next, and enlarges the virtual key area, so that hitting a “T” results in a T, not a Y or an R.
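A minimal sketch of how this kind of invisible resizing could work, assuming a one-dimensional keyboard row and a Gaussian touch model. The key positions, the spread value, and the function names are hypothetical and are not taken from the Windows Phone 7 implementation.

```python
import math

# Hypothetical key centers along one keyboard row, in arbitrary units.
KEY_CENTERS = {"r": 3.5, "t": 4.5, "y": 5.5}
SIGMA = 0.5  # assumed spread of touches around a key center


def touch_likelihood(x, key):
    """Unnormalized Gaussian likelihood of touch position x given the intended key."""
    d = x - KEY_CENTERS[key]
    return math.exp(-(d * d) / (2 * SIGMA * SIGMA))


def decode(x, prior):
    """Return the key that maximizes prior[key] * likelihood(x | key)."""
    return max(KEY_CENTERS, key=lambda k: prior[k] * touch_likelihood(x, k))


uniform = {"r": 1 / 3, "t": 1 / 3, "y": 1 / 3}
favors_t = {"r": 0.05, "t": 0.90, "y": 0.05}

touch_x = 5.1  # slightly on the "y" side of the visual t/y boundary at 5.0
print(decode(touch_x, uniform))   # y
print(decode(touch_x, favors_t))  # t
```

With no language information, the touch at 5.1 lands on "y" because it is closer to that key. Once the language model strongly expects "t", the same touch resolves to "t": the key's effective target area has grown even though nothing on screen has changed.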
In upcoming releases, the keypad will even take into account the speed at which a person is attempting to type.
“When you’re typing really fast with two thumbs, the touch patterns are sloppy,” Paek says, “so you have to make the target area even bigger.”
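One way to picture the effect of speed, under the same assumed Gaussian touch model: for two neighboring keys with equal touch spread, the point where they are equally likely shifts away from the more probable key by an amount proportional to the variance. The linear speed rule and every constant below are hypothetical, chosen only to illustrate the idea.

```python
import math


def boundary(center_a, center_b, prior_a, prior_b, sigma):
    """Position where keys a and b are equally likely (equal-variance Gaussians)."""
    midpoint = (center_a + center_b) / 2
    return midpoint + sigma ** 2 * math.log(prior_a / prior_b) / (center_b - center_a)


def sigma_for_speed(chars_per_second, base_sigma=0.3, growth=0.05):
    """Hypothetical rule: touch spread grows linearly with typing speed."""
    return base_sigma + growth * chars_per_second


# Key "t" at 4.5 is expected (prior 0.9); its neighbor "y" at 5.5 is not (prior 0.1).
slow = boundary(4.5, 5.5, 0.9, 0.1, sigma_for_speed(1))
fast = boundary(4.5, 5.5, 0.9, 0.1, sigma_for_speed(6))
print(slow, fast)
```

Both boundaries sit to the right of the visual midpoint at 5.0, and the fast-typing boundary sits farther right than the slow one, which gives the expected key a bigger target when touches are sloppier.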
Most smartphones now take a stab at guessing the user’s intent. For Windows Phone 7, the trick was to make that guesswork better—without getting in the way of the user. A phone’s predictive software can be so powerful that it overrides what the user really wants.
“If you hit right smack in the middle of the ‘b’ key, I want the system to respect that,” Gunawardana says. “We came up with a model that allows the statistics to vary the key footprint, while still respecting the user’s intent.”
By modulating the certainty of its suggestions in that way, the software remains strongly predictive but doesn’t become so sure of itself that it thwarts the user.
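A sketch of one way such a guarantee could be expressed: a small protected zone around each key center always yields that key, and the statistics decide only outside those zones. The zone radius, key positions, and function names are assumptions for illustration, not details confirmed by the article.

```python
import math

KEY_CENTERS = {"v": 4.0, "b": 5.0, "n": 6.0}  # hypothetical row positions
SIGMA = 0.5          # assumed touch spread
ANCHOR_RADIUS = 0.2  # assumed protected zone around each key center


def decode_with_anchor(x, prior):
    """Respect dead-center touches; otherwise use prior * touch likelihood."""
    # A touch inside a key's protected zone always yields that key.
    for key, center in KEY_CENTERS.items():
        if abs(x - center) <= ANCHOR_RADIUS:
            return key

    # Outside every protected zone, fall back to the statistical decision.
    def score(key):
        d = x - KEY_CENTERS[key]
        return prior[key] * math.exp(-(d * d) / (2 * SIGMA ** 2))

    return max(KEY_CENTERS, key=score)


# The language model is almost certain the next letter is "n".
prior = {"v": 0.01, "b": 0.01, "n": 0.98}
print(decode_with_anchor(5.0, prior))  # b
print(decode_with_anchor(5.4, prior))  # n
```

A touch in the middle of "b" stays a "b" even though the prior overwhelmingly favors "n", while an ambiguous touch between the two keys goes to the letter the language model expects.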
The researchers also had to contend with a fact of life when working with smartphones: their relative lack of power.
“That’s what I spend a lot of time on,” Gunawardana says, “coming up with models and then squeezing the algorithms down so they had very small footprints and would run blazingly fast on these very small devices. You don’t want the phone to ‘hang’ every time it uses the keyboard.”
He used linguistic tricks, in part by using what he calls “back off,” in which the system reacts with varying levels of confidence about the user’s intent. The keyboard software, for instance, can confidently predict that what comes at the end of “keyboar” is almost certainly a “d.” After all, it’s seen that pattern thousands of times. But when it comes across a word that’s less familiar, it “backs off” from making firm assumptions. It might first suggest three or four letters as possible next steps. Or it might simply determine, “This letter is a consonant, so a vowel is most likely to come next.”
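The back-off idea can be sketched with a toy character model: use the longest recent context that has been seen often enough, and fall back to shorter contexts otherwise. This is a simplified illustration, not the model that shipped; the tiny word list, the thresholds, and the function names are all assumptions.

```python
from collections import Counter, defaultdict


def train(words, max_order=3):
    """Count next-letter frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for word in words:
        for i, letter in enumerate(word):
            for n in range(max_order + 1):
                if i - n >= 0:
                    counts[word[i - n:i]][letter] += 1
    return counts


def predict_next(counts, prefix, max_order=3, min_count=2):
    """Use the longest context seen at least min_count times; otherwise back off."""
    for n in range(min(max_order, len(prefix)), -1, -1):
        context = prefix[len(prefix) - n:]
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            # Return the context used and up to three candidate letters.
            return context, [letter for letter, _ in seen.most_common(3)]
    return "", []


counts = train(["keyboard", "keyboards", "board", "boat", "zebra"])
print(predict_next(counts, "keyboar"))  # ('oar', ['d'])
print(predict_next(counts, "xq"))
```

For "keyboar" the three-letter context "oar" has been seen repeatedly, so the model commits to "d". For the unfamiliar prefix "xq" no context matches, so it backs off all the way to overall letter frequencies and offers several candidates instead of one firm guess.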
Reception of the keyboard has been great, Almog and Badger say.
“Not only have we received kudos from the press,” says Almog, pointing to consistently positive reviews, “but it’s also very satisfying to have individual people tell us how much they like the keyboard.”
Adds Badger: “The quality of the keyboard has been amazing, thanks in large part to those who designed the Splash user-interface framework and those who tuned and optimized the touch pipeline. We’ve gotten a lot of really positive feedback.”
The researchers have taken their approach and extended it to other languages. Paek and Gunawardana, along with Microsoft Research colleague Chris Meek, published a portion of their keyboard work in a paper titled “Usability Guided Key-Target Resizing for Soft Keyboards,” which was presented in 2010 during the International Conference on Intelligent User Interfaces. Paek, Badger, and Rudchenko also have published “Text Text Revolution: A Game That Improves Text Entry on Mobile Touchscreen Keyboards,” which will be presented during the 2011 International Conference on Pervasive Computing and Communications.