By Rob Knies
March 3, 2010 9:00 AM PT
Computers these days are remarkably responsive. Users can produce amazingly complex documents, store and sift through mountains of data, or communicate with vast, worldwide audiences at the push of a button. Today’s computing devices are so adept at performing a myriad of useful tasks that, for some, it might seem the capabilities computers possess have been thoroughly explored.
That, though, is hardly the case. Computers of the future—the near future, the next five to 10 years—will make today’s wonders pale into insignificance. For one thing, computers now excel at doing what the user tells them to do. But what if they didn’t have to wait?
What if your computer could anticipate your needs and act on your behalf?
Sound far-fetched, a figment of a science-fiction imagination? Don’t be too sure. In the current environment of exploding data collections, analysis and intelligent exploitation of such information could well presage a future in which your computer works while you sleep, tend to business elsewhere, or take a well-deserved recreational break.
Now, that would be a screaming-fast computer, one that completes a task before you even get around to thinking about it. One that could work on the fly, freeing you to focus on other, higher-level priorities. One that aspires to the sort of productivity that might occur to dreamers or tech utopians.
A mere chimera, you say? An ambition beyond the bounds of technological innovation? Visitors to TechFest 2010, this year’s edition of Microsoft Research’s annual technology showcase, being held March 3 and 4, might beg to differ.
Step forward and take a peek into the future of computing:
You’re an American businesswoman, and you’ve been asked to confer with a Germany-based colleague. The most expedient way to do so would be via telephone, but you’ve been advised that the colleague doesn’t speak English, and you can’t converse in anything but English. Getting a translator is out of the question, and time is of the essence. What do you do?
Today, the options are slim. But before long, say a pair of researchers from Microsoft Research Asia, a solution could be on the way. Introducing The Translating! Telephone.
The project combines three key technologies: speech recognition, machine translation, and text-to-speech. As TechFest attendees will witness, the result makes it pretty darned hard to contain your excitement.
“The universal translator is one of those dream technologies that has always captivated minds,” says Kit Thambiratnam, a researcher in the Speech Group at Microsoft Research Asia. “It’s incredibly hard, though, requiring a combination of technologies that are far from perfect, particularly since we are trying to target free-flowing conversations, as opposed to dictation or structured speaking.”
While advances are still being made in the technologies involved, when they are deployed in tandem, they deliver an experience that is surprisingly robust.
“What we are showing,” Thambiratnam says, “is usable in the situation where there are multiple parties who have a vested interest to communicate.”
The scenario addressed by The Translating! Telephone demo is similar to the one suggested above. Thambiratnam, an English speaker, is contacting his manager, Frank Seide, lead researcher and research manager for Audio Information Management and Extraction—and a native German. They are using a voice-over-IP (VoIP) telephone connection.
Seide’s German is recognized by a speech recognizer. Machine-translation technology from Microsoft Research Redmond translates the recognized German into English, which is then synthesized using English text-to-speech. Thambiratnam’s English is converted similarly into German, and voilà: a two-way phone conversation across language barriers.
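The cascade just described can be sketched in a few lines. This is a toy illustration of the data flow only: the three stage functions are hypothetical stand-ins, not the actual Microsoft components, and the "dictionary" translation stub exists solely so the sketch runs end to end.

```python
def recognize_speech(audio, lang):
    # Stand-in: a real recognizer would decode an audio waveform into text.
    return audio["spoken_text"]

def translate_text(text, src, dst):
    # Stand-in: a real system would call a machine-translation service.
    toy_dictionary = {("de", "en"): {"hallo": "hello"}}
    return " ".join(toy_dictionary[(src, dst)].get(w, w) for w in text.split())

def synthesize_speech(text, lang):
    # Stand-in: a real system would produce an audio waveform via text-to-speech.
    return {"waveform_for": text, "lang": lang}

def translate_call_audio(audio, src_lang, dst_lang):
    """Run one chunk of call audio through the three-stage cascade.

    The transcript and translation are returned alongside the audio,
    since the demo also displays them on screen for both parties.
    """
    transcript = recognize_speech(audio, src_lang)            # speech recognition
    translated = translate_text(transcript, src_lang, dst_lang)  # machine translation
    out_audio = synthesize_speech(translated, dst_lang)       # text-to-speech
    return out_audio, transcript, translated
```

The key design point the demo relies on is that each stage's text output is kept visible, so recognition errors can be caught by the speakers rather than silently propagated into the synthesized audio.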
One of the keys to making this work is that the transcription text is displayed simultaneously to both users. If something isn’t quite right, they can straighten things out in a hurry by slowly repeating the troublesome passages.
“The challenge for translation is that spoken language is different from the written text that the Bing Translator was designed for,” Seide explains, “so we pre-translate the spoken input to more resemble written style before sending it to the translation module.”
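Seide does not detail how the pre-translation step works, but the general idea of normalizing spoken style toward written style might look something like this sketch. The filler list and rules are purely illustrative assumptions, not the actual system.

```python
# Hypothetical spoken-to-written normalization: drop hesitation fillers
# and stutter repetitions, then restore sentence casing and punctuation,
# so the text better matches what a translator trained on written
# language expects.

FILLERS = {"uh", "um", "er", "uhm"}

def normalize_spoken(transcript):
    words = transcript.lower().split()
    cleaned = []
    for w in words:
        if w in FILLERS:
            continue                   # drop hesitation fillers
        if cleaned and cleaned[-1] == w:
            continue                   # collapse stutter repetitions
        cleaned.append(w)
    sentence = " ".join(cleaned)
    # Capitalize and end with a period, as written text would.
    return sentence[:1].upper() + sentence[1:] + "."
```

For example, `normalize_spoken("um so so we we meet uh tomorrow")` would yield `"So we meet tomorrow."`, a form much closer to the written text Bing Translator was designed for.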
The system evolved from an English-to-English phone-call-transcription research prototype that provides live transcripts of VoIP telephone calls.
Such transcripts might be imperfect, but they do have the advantage of being storable, browsable, searchable, and amenable to cut-and-paste scenarios.
One advance that makes the system possible is the steadily improving text-to-speech technology researched and developed at Microsoft Research Asia by Frank Soong, principal researcher and manager of the Speech Group, and his colleagues. Their work is making text-to-speech sound much smoother and more natural than before.
But the speech-recognition work for conversational speech pursued by Seide and Thambiratnam also plays a key role.
“The old adage ‘garbage in, garbage out’ holds true here,” Thambiratnam says. “Getting great transcripts is the most important part of this. A critical part is the use of machine-learning algorithms to automatically build more accurate personalized speech models for each user, which results in very significant improvements in accuracy.
“No effort is required from the users. They just make VoIP calls and talk, and the system will learn their voices and get better and better.”
Take into account improved technologies in all three parts of The Translating! Telephone project, and the potential for such an effort to change the way we communicate seems tantalizingly near.
“We aren’t quite there,” Thambiratnam stipulates. “The technologies are still not perfect. But we feel they are good enough for two people to communicate in their native languages, as long as they are willing to speak carefully and maybe occasionally repeat themselves.”
Seide and Thambiratnam are justifiably proud of what they’ve accomplished—and the possibilities inherent in their work.
“What I find coolest,” Seide says, “is that we translate the words as they come out of the recognizer, so when making the call from a PC with a screen, a user can read along with translated partial transcripts without having to wait for the end of the sentence.”
Thambiratnam ponders the impact such efforts could have on the dream of universal communication.
“This is a first step at realizing one of those grand challenges of computing,” he says. “The thing that excites me about this is that people just ‘get it.’
“It’s universally appealing to everybody. Who wouldn’t want such a system? A system like this that worked flawlessly would go a long way in uniting the world.”
You’re attending a party, and all your best friends will be there. Amid the frivolity and the backslapping, digital cameras are certain to appear, and you’re hoping you get plenty of shots of your own.
You get home, and, yeah, you’ve got your shots—but what about all those photos others took of you and various friends? Or what about the ones others took of your best friends? Some of your pals might have shared their photos online, but if so, the relevant ones would be distributed over multiple albums in multiple accounts, mixed in with huge collections of other shots. You could sift through those, friend-by-friend, album-by-album, but who has time for that? And what if the party was a while back—will you remember everybody who was there?
Sound familiar? We’ve all been there, with little recourse. That, though, might be about to change, thanks to technology from the Israel Innovation Labs.
OneAlbum is a project that uses face-recognition and event-matching technology to automatically retrieve photos that would interest you from social networks and your friends’ online photo albums, and add them to yours.
“This,” says Eyal Krupka, principal research program manager for the Israel Innovation Labs, “includes photos of myself, my wife, and my kids, from the events I participate in, the places I like, and more. OneAlbum also goes further, by organizing ‘my album’—the photos from the same event are presented side-by-side, no matter where they come from.”
Given the popularity of digital photography, social networking, and online photo sharing, such functionality has the potential to capture the interest of millions of users—particularly given the lack of user intervention required to enhance their own growing collections.
“OneAlbum uses breakthrough technology to recognize which photos interest me by analyzing my own album,” Krupka explains. “It recognizes people that appear most frequently in my album, typically my family, and photos of events I took. Later, OneAlbum crawls shared albums in my network and finds the most interesting photos. The technology can filter top relevant photos out of hundreds of thousands in my social network.”
The technology redefines the concept of “my album” from referring to “photos I have taken” to “photos that interest me,” no matter where they might reside. Using face recognition—based in part on algorithms from Microsoft Research Asia—and event matching, OneAlbum works even if the albums or photos are not tagged.
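The two-phase idea Krupka describes—learn the frequent faces in my own album, then filter friends' shared albums by them—can be sketched as follows. Real face recognition compares facial embeddings; here each photo is simplified to a list of face identifiers, an assumption made only to keep the sketch self-contained.

```python
from collections import Counter

def frequent_faces(my_album, min_count=2):
    """Phase 1: find the faces that recur in my own album
    (typically family), with no tagging required."""
    counts = Counter(face for photo in my_album for face in photo)
    return {face for face, n in counts.items() if n >= min_count}

def interesting_photos(shared_albums, faces_of_interest):
    """Phase 2: crawl friends' shared albums and keep any photo
    containing at least one face I care about."""
    return [photo
            for album in shared_albums
            for photo in album
            if faces_of_interest & set(photo)]
```

The design choice worth noting is that relevance is learned entirely from the user's own album, which is why, as Krupka says, no manual tagging is needed from anyone.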
The photo album, enhanced in this way, also can provide a corollary to the aforementioned party scenario.
“This can trigger ‘after party’ social interaction,” Krupka says, “between people who participated in the event.”
And he offers another interesting scenario in which OneAlbum could prove invaluable.
“When OneAlbum is first used, I might be surprised to find many of my kids’ photos that were taken over many years by other people,” says Krupka, who has collaborated on the project with Israel Innovation Labs colleagues Igor Abramovski and Igor Kviatkovsky. “For example, my son was at a birthday party of his classmate a few years ago, and I could get photos of him that I was not aware of before. I also could get photos from a joint tour with friends I haven’t seen for a few years.”
Starting from technology developed by Microsoft Research Asia and Microsoft’s Live Labs, the Israel Innovation Labs built the new face-recognition technology.
“When we first thought about OneAlbum,” he says, “we knew we needed to build a new technology and that this new technology had to be built on top of current, state-of-the-art face-recognition technology. Initially, we were afraid we would first have to invest much time to achieve the current state of the art before we could even start working on our new algorithms. We were very happy to find that Microsoft Research Asia and Live Labs already had developed state-of-the-art face recognition. The researchers from Microsoft Research and Live Labs willingly shared their technology and code, and we were able to immediately start our research based on the state of the art.
“To me, as a researcher, this is one of the great things about Microsoft: You have access to state-of-the-art technology and experts, and you can start your research from there.”
Given the near universal appeal of the OneAlbum technology, Krupka is delighted that his algorithm can do the heavy lifting of finding relevant photos without relying on tagging—and that it can learn his own interests by analyzing his digital-photo album. But it’s the very redefinition of that term that brings a smile to his face.
“First,” he concludes, “‘my album’ was printed photos on a physical album on my shelf. Next, ‘my album’ referred to a collection of photos I took and stored on a photo-sharing site.
“OneAlbum goes to the next generation: Now, ‘my album’ consists of all photos that interest me, regardless of the photographer.”