For over twenty years scientists have been dreaming about creating a real P.A.D.D., the slate device that the inhabitants of Star Trek used to record and access data as they moved around the starship Enterprise.
There have been attempts to duplicate it over the years, but the Tablet PC may be the first successful incarnation. Some of this is timing - consumers are demanding more from their computers, and they want what the Tablet offers. The other reason is research - years of hard work and data gathering have made the Tablet PC possible.
Tablet PC is an evolution of the portable PC. It takes the best from a standard laptop and adds features that make retiring your laptop one of the smartest ideas you've ever had. To start with, it uses multi-modal input - you can input with keyboard, pen, or voice. While you may be committing a social faux pas by burying your head behind a computer screen as you click on a keyboard during a meeting, you will feel perfectly comfortable taking notes in your own handwriting on the Tablet PC.
Even better, the handwriting recognition in Tablet PC can take your handwriting (as long as your writing doesn't resemble chicken scratches) and transform it into digital text. You can then search your handwritten notes.
One Stroke, Two Strokes, Three Strokes and More
Handwriting recognition in the Tablet PC will be a boon for Asian consumers. Chinese and Japanese are pictorial languages with thousands of characters - it is a Herculean task to input these characters into an electronic document.
A character in Chinese can be a pictograph, or composite themes that you group together, what we would call letters. But all of the characters are made up of individual strokes, which is similar to our printing, but quite different from English cursive handwriting. In some ways, Asian writing is simpler to recognize, since the boundaries between strokes are clearer than the boundaries in cursive writing.
Microsoft researchers Patrice Simard, Chris Meek and Bo Thiesson have developed methods to improve recognition of Asian characters for the Tablet PC. Simard has worked on one and two stroke characters, while Meek and Thiesson have developed algorithms to improve three and four stroke characters.
At one time Simard held the record for the most efficient recognition of English. "There's a database that's been tried and tried by many people, using many algorithms, at many different companies, and for a short time I had the world record. And now they're all equivalent, they're within five errors on 10,000," said Simard.
He asked to try his algorithms on the database that the handwriting recognition group at Microsoft had built. He was willing to devote the resources of a full-time software developer to the project, which impressed the product team. They also assigned a software developer to work on the new algorithms.
"Despite the fact that they already had a fully working and high-performance system, they were still open to trying new ideas from other groups. This is a lot harder than it seems," said Simard. The researchers from MSR and the handwriting product group joined efforts to build a new recognizer.
Simard felt that since his method approached the recognition problem from a completely different angle, it would make different errors than the existing system, so combining the two would improve the overall results.
"The handwriting group's solution was relying on time information, such as stroke order. My method was going to throw out all the timing information, and therefore my errors would be completely different. I said, you've been working on this a very long time, and I don't hope to beat your system. But if my errors are different then we can combine the two systems," said Simard.
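The article does not say how the two recognizers were combined. As a minimal sketch of the general idea, assume each recognizer returns a score per candidate character and the scores are simply blended; the function name, the weights, and the example scores below are all hypothetical.

```python
def combine_scores(scores_a, scores_b, weight_a=0.5):
    """Blend two recognizers' per-character scores and return the best label.

    Illustrative only: a weighted average is one simple way to combine
    systems whose errors differ, not the method the Tablet PC team used.
    """
    labels = set(scores_a) | set(scores_b)
    blended = {
        label: weight_a * scores_a.get(label, 0.0)
        + (1.0 - weight_a) * scores_b.get(label, 0.0)
        for label in labels
    }
    return max(blended, key=blended.get)


# Hypothetical scores for one ambiguous character.
timing_based = {"a": 0.48, "o": 0.52}  # stroke-order recognizer, narrowly wrong
image_based = {"a": 0.80, "o": 0.20}   # timing-free recognizer, confident
best = combine_scores(timing_based, image_based)
```

When one system is narrowly wrong and the other is confidently right, the blend follows the confident one. That only helps if the two systems tend to fail on different inputs, which is the point Simard makes.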
The methods developed by Simard, Meek, and Thiesson worked so well that they will ship in the first version of Tablet PC.
Any Which Way, Even Up
One of the most exciting features of Tablet PC is Microsoft's Journal application. It allows you to capture your thoughts on the computer in rich digital ink, just as though you were handwriting on a piece of paper.
This note-taking application worked well during the first stage of development, but it had one severe problem. The application allowed users to write in neat horizontal lines across the page, from left to right, in perfect grade-school precision. However, once we're released from grade school we stop following the rules and begin writing sideways, vertically, in circles, in the margins, all over the place. The developers knew that they had to find a way to allow the user to write any which way they pleased.
Fortunately, around the time the group discovered the problem, they also discovered several resources. The first was the team at MSR Asia. They presented their work on ink processing technologies to Bill Gates and Alex Loeb, the Vice President of the Tablet PC group. One of the technologies they demonstrated was their method to support skewed writing in note-taking instead of neat horizontal writing. The method was an initial stab at discovering how to write freeform digital notes.
A few months later, Michael Shilman, a PhD student at UC Berkeley, came to work for Tablet PC as an intern. A repetitive strain injury from typing too many lines of code forced him to switch to drawing diagrams on whiteboards, which sparked his interest in sketch recognition.
The product group steered Shilman to Simard. "I had a formulation for the problem and I had a high-level idea of how I was going to solve it, but I hadn't really thought about the details," said Shilman.
Simard had been thinking about the details for several years.
"It turns out that I'd been working on parsing and layout analysis for printed text. So they came to me and said, 'could we do something with cursive?' I gave them some ideas on how to do the algorithms."
Another problem with using a computer as though it were a pad of paper is what's known as "reflow."
"People do really strange things," notes Simard. He says that they will write numbered lists by writing the numbers first, and then write text next to the numbers. They also like to scribble out lines, insert extra words, and use doodles or drawings to explain their text. All of these changes require the text to reflow to keep the integrity of the document. To do this the computer needs to know the difference between text and drawings, which is no easy task.
Though we can glance at a page of written material and know instantly what is text and what is a doodle, the machine is less competent at this. If someone draws a stick figure next to a paragraph of text, and the machine doesn't recognize it as a drawing, and then you insert a few words next to the doodle, the machine might try to reflow everything, including the stick man. "If you start reflowing the stick guy, the legs and arms will go everywhere, so you have to detect what is a picture and what is text. It's a classification and a grouping problem," explains Simard.
Shilman began to implement the ideas that he and Simard had brainstormed. "I spent the summer working on those. By the end of the summer I had made enough progress to know that it was going to work," said Shilman.
"Given how ambitious their goals were, and the timeframe, I was very skeptical that they could succeed," said Simard. "But they worked extremely hard. And they proved that it was doable, which I didn't even know. I met with Michael about once a week and followed the process, but they did all the work."
Shilman collaborated with Zile Wei from Microsoft Research Asia. While the researchers in Redmond were looking at the problem from a document layout analysis perspective, the researchers in Asia had taken a different approach to the problem.
"Their approach was to look at individual strokes to distinguish between writing and drawings, where our approach was to look at entire lines of strokes. They also used stroke timing information to group strokes into lines, where ours was purely spatial. So it was pretty much different in every way that you could slice the problem. When we combined the best aspects of our approaches everything just started to work - it was really exciting," said Shilman.
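To make the "purely spatial" idea concrete, here is a minimal sketch that groups strokes into lines using only their vertical extents, ignoring the order in which they were written. It is not the algorithm the team shipped: the function names and the overlap threshold are assumptions, and it only handles roughly horizontal writing, not the skewed or circular writing the article describes.

```python
def vertical_extent(stroke):
    """Return (top, bottom) of a stroke given as a list of (x, y) points."""
    ys = [y for _, y in stroke]
    return min(ys), max(ys)


def group_into_lines(strokes, min_overlap=0.5):
    """Greedily group strokes into lines by vertical overlap.

    Strokes are sorted by position, so the result does not depend on the
    order (timing) in which they were drawn.
    """
    lines = []  # each entry: {"top": ..., "bottom": ..., "strokes": [...]}
    for stroke in sorted(strokes, key=lambda s: vertical_extent(s)[0]):
        top, bottom = vertical_extent(stroke)
        height = max(bottom - top, 1e-6)  # avoid dividing by zero for flat strokes
        for line in lines:
            overlap = min(bottom, line["bottom"]) - max(top, line["top"])
            if overlap / height >= min_overlap:
                line["strokes"].append(stroke)
                line["top"] = min(line["top"], top)
                line["bottom"] = max(line["bottom"], bottom)
                break
        else:
            lines.append({"top": top, "bottom": bottom, "strokes": [stroke]})
    return [line["strokes"] for line in lines]


# Four strokes written in an interleaved order across two lines.
strokes = [
    [(0, 0), (5, 10)],
    [(0, 30), (4, 40)],
    [(8, 1), (12, 9)],
    [(9, 31), (14, 39)],
]
lines = group_into_lines(strokes)
```

A stroke-timing approach would instead start from the order in which strokes arrive. Deciding whether a group is writing or a drawing is a separate classification step that this sketch does not attempt.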
Shilman and Wei worked with Sashi Raghupathy to turn the research into a product. "Her rich experience in research helped her manage what was essentially an ongoing research project in the short timeframe of a seven-week development period. Looking back it is amazing," notes Shilman.
Although the team is proud of their innovative work, Shilman likes to add a caveat. "It has yet to achieve human accuracy. To really nail the problem, to solve it 100% is impossible." To have the text always reflow perfectly, no matter what the text is or how the text is written, is an artificial intelligence problem.
"There's a phrase for this type of problem - AI Complete - which basically means that in order for the computer to solve it, the computer must be intelligent and understand all the context and subtleties of the world," said Shilman.
Make It Small, Make It Very, Very Small
Tablet PC isn't just a hyped-up laptop. It does things that have never been done before. For example, one of the limitations of the computer screen is our limited ability to interact with it - to manipulate it the way we do a printed page.
Let's say you write a spec for your new product, and then you hold a meeting to review it with your team. As the comments fly fast and furious around you, you try to jot down as many of them as you can, writing in the margins or drawing circles or giant x'es on your printed document. You get the document back to your desk, and attempt to translate the doodles and text annotations back into the original document. Not fun.
The Tablet PC will allow you to mark up your digital document and keep the annotations in digital format. Raman Narayanan's team was tasked with developing this feature for the Tablet.
"We had a first working version that was pretty decent given existing technology. But the feedback from the people who used it was that rendering was too slow and that the files were too big." People needed to send the files across the network, and the size savings offered by conventional compression techniques were insufficient.
"So that's when we started looking at the things that research is doing," said Narayanan. Narayanan approached two researchers at Microsoft, Simard and Rico Malvar, the research manager for the Communication, Collaboration, and Signal Processing group.
The techniques that Malvar and Simard developed significantly reduced the file size. "We got it down to about a tenth of the size we had with the existing compression, which was amazing," said Narayanan. "We've basically come up with two compression techniques from research that will give us dramatically better results than anything that is out there."
"Once you start converting documents to images you have to watch out because you want these images to have enough resolution. So what we needed was a better file format," said Malvar. They started with the technology used to fax documents, though they had to tweak it quite a lot.
"The trick is that fax machines don't have much resolution, and more often than not you get an unreadable document - black blobs. There's not enough little pixels," explains Malvar. "It was made twenty-something years ago when people thought, I'll be sending these things through the telephone, it's going to be 9,000 bits per second and I can only compress this much, so I can't have more than 200 pixels per inch. That's the actual number. That's not much. There are screens that are getting close to that resolution. And a normal printed document has 600 pixels per inch, which is like a factor of three up from a fax image. So, you can say, let's put more pixels in. If I put in 300, 400, or 600 pixels, the image looks better but now I have too much to store."
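A quick calculation shows why tripling the resolution hurts so much: pixel count grows with the square of the resolution. The sketch below assumes an 8.5 by 11 inch page stored at one bit per pixel with no compression. The page size and bit depth are assumptions for illustration; only the 200 and 600 pixels-per-inch figures come from Malvar.

```python
PAGE_WIDTH_IN, PAGE_HEIGHT_IN = 8.5, 11.0  # assumed US Letter page


def raw_bilevel_bytes(dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel page at the given resolution."""
    pixels = round(PAGE_WIDTH_IN * dpi) * round(PAGE_HEIGHT_IN * dpi)
    return pixels // 8


fax_bytes = raw_bilevel_bytes(200)    # fax-like resolution
print_bytes = raw_bilevel_bytes(600)  # print-like resolution
ratio = print_bytes / fax_bytes       # 3x the resolution in each direction
```

Under these assumptions the 600 pixels-per-inch page needs nine times the raw storage of the 200 pixels-per-inch page, roughly 4.2 million bytes against about 470,000, before any compression is applied.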
The challenge was to be able to take a document, any document - a web page, a PowerPoint presentation, a legal brief - and allow it to be digitally annotated. To do this the document needed to be treated as an image, but high-resolution image files are huge. The researchers decided to look at it as two separate problems.
The traditional format to encode pictures is JPEG. "What we did is we came up with a form that is better than JPEG. It's about 50% better. So if a picture is a one-megabyte JPEG our format would be 600 kilobytes. The idea is that our form has a bunch of features that JPEG doesn't have. If you're using a JPEG then the system under the hood has to decode all the images, compute all the pixels that were in the original image and from that produce a smaller version that then gets displayed.
"With our format you can decode just the pixels that you want. And that's very hard to do in JPEG, you can't do it. You'd have to decode first and recompute a small version of this. And this takes a lot of time, that can be a big difference. So where we gain now is that in some scenarios the processing time or the transmission time will be much less when you only want a portion of the data. That's what we bring is the ability to access a portion of the data instead of the entire thing."
Then Malvar and Simard got together to solve the text compression problem. They knew that text characters could be recognized by the machine and encoded as a complete image, as opposed to trying to encode each individual pixel. "In faxes, all the pixels of an 'e' are encoded independently of other occurrences of 'e's. The new technology clusters identical blobs of ink together and encodes their occurrence and position separately. The ink of identical blobs is only encoded once, while the positions are compressed more effectively using layout analysis. When we did this, we got file sizes that were ten times smaller," said Simard.
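The core of the idea Simard describes, storing each distinct blob of ink once and then recording only where it occurs, can be sketched in a few lines. This is an illustration under simplifying assumptions, not the shipped codec: blobs must match exactly (real scans would need approximate matching), bitmaps are written as tuples of strings, and the layout-based compression of positions is left out. All names are hypothetical.

```python
def encode_blobs(blobs):
    """Encode a page as a dictionary of distinct bitmaps plus placements.

    blobs: list of (bitmap, (x, y)), where bitmap is a tuple of row strings.
    """
    dictionary = []   # each distinct bitmap is stored exactly once
    index_of = {}     # bitmap -> its position in the dictionary
    placements = []   # one (dictionary index, x, y) per occurrence
    for bitmap, (x, y) in blobs:
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements


def decode_blobs(dictionary, placements):
    """Rebuild the original list of (bitmap, (x, y)) pairs."""
    return [(dictionary[i], (x, y)) for i, x, y in placements]


# Two toy glyph shapes; the first one occurs three times on the page.
E = ("###", "#..", "###")
L = ("#..", "#..", "###")
page = [(E, (0, 0)), (L, (4, 0)), (E, (8, 0)), (E, (0, 6))]
dictionary, placements = encode_blobs(page)
```

Here four blobs on the page reduce to two stored bitmaps and four small placement records. On a page of text, where the same letter shapes recur hundreds of times, the saving grows accordingly.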
The combined compression schemes achieved the smaller file size that Narayanan's group needed to make digital annotation on the Tablet PC a reality.
Thinking In Ink
Charlton Lui, a development manager in the Tablet PC group, says that what is revolutionary about the Tablet is the ink technology. "Currently people take notes with pen and paper. But what we've done is take that experience and put that onto the Tablet PC and allow you to 'think in ink' and have the ability to write in your own handwriting."
He credits researchers in Microsoft Research Asia for helping make the ink technology possible, in particular Jian Wang, the manager of the Multimodal User Interface group.
Harry Shum, who manages three groups at MSRA, is enthusiastic about the work that Wang has done for the Tablet PC. "Jian is quite a special person. He has a very interesting background. By training he is a psychologist. He was one of the high profile local hires that we had in China. He was the Chairman of the Department of Psychology in a very good university in China.
"At the same time Jian also had a pretty good computer science background. He has a very good sense of doing things; it's quite unique."
Wang's passion for his work is obvious. "This is the greatest project I've done in my life," he said.
"I started thinking about digital ink when I joined Microsoft Research Asia about two and a half years ago.
"Ink has its own reason to exist because it's so different from the text. That's the basic idea. Recognition wants to change the ink into text. So I think we should have the technology that keeps the ink as ink. I had this idea right after I joined Microsoft. And I visited the Tablet PC group. We had the first meeting and I think that both my group and the Tablet group were really excited about ink technology," said Wang.
"You'll have the ability to actually manipulate your own handwritten ink," said Lui. "You will be able to select it and parse the words and know what words are. People can take that recognized text and do something with it, put it in context and send it in email or do whatever they want to do with it. One of the things that MSR Asia has brought to the table is actually helping us with that, with the ink semantics of the parsing."
"The one thing that we've done is emphasize the ability to think in ink because people draw or write, and about 95 percent of the time they'd just like to leave it in their handwriting and it's just fine, it's very useful," said Lui. "Sometimes it tries to recognize something and if they write sloppy it might not come out. If you type something, it's a typo and if you write something wrong, it really should be a writeo. If someone writes sloppy, there's nothing we can do about it. There will never be a time where if you cannot recognize your own writing on the page that the recognizer will be that intelligent.
"What's great is we have multiple input methods. You can have either a hunt and peck soft keyboard, which is 100% accurate, if you hit it wrong you think you did a pointo." Lui grins. "Or you have the single character recognizer which is more laborious than just writing it out, but it still is accurate. And we also have speech in the Tablet. And we made it to where you could switch between all of them. And the user will be able to do that so if they end up speaking and it doesn't quite work they can correct it with writing. They might even write and they can correct it with speaking. All that is built into the Tablet," said Lui.
The technology also has the ability to search your handwritten notes using a "fuzzy find." Alex Gounares, a software architect for Tablet PC, explains what fuzzy find does. "Fuzzy find is the feature of the Journal utility where a user can do a search through handwritten documents. So for example, I could search for the word 'research' in all of my Journal notes, and the find feature would show me which documents and where I had used that word, even if I had written it in ink. 'Fuzzy' refers to the set of heuristics and algorithms we employ to do that search."
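Gounares does not spell out which heuristics Journal uses. As one hedged illustration of what a "fuzzy" search over recognized handwriting could look like, the sketch below compares the query against each recognized word with Python's standard `difflib.SequenceMatcher` and keeps near matches, so a word the recognizer got slightly wrong is still found. The note data, function name, and threshold are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, notes, threshold=0.8):
    """Return (note name, word position) pairs whose word resembles the query.

    notes maps a note name to the list of words the recognizer produced.
    """
    hits = []
    for name, words in notes.items():
        for position, word in enumerate(words):
            score = SequenceMatcher(None, query.lower(), word.lower()).ratio()
            if score >= threshold:
                hits.append((name, position))
    return hits


# Hypothetical recognizer output for two handwritten notes.
notes = {
    "monday": ["reseach", "budget", "review"],  # recognizer dropped a letter
    "tuesday": ["lunch", "research", "plan"],
}
hits = fuzzy_find("research", notes)
```

An exact-match search would miss the misrecognized word in the first note. Tolerating small differences trades some false positives for fewer missed notes, and the threshold controls that balance.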
The ink technology is going to be one of the most innovative new technologies in the next few years. It's important enough that MSRA is devoting even more resources to improve upon the current version. "We just recently formed a special project mainly for the ink technology," said Bin Lin, the development manager for the Advanced Technology group. "And the goal is to make ink become the first class citizen, not just something that people may want to use. So it will be like text."
Wang has high standards for the future of the Tablet PC. "I really want to make sure that the Tablet PC will be easy to use. You never have to think about how to sit on a chair. We should have a computer device that people never have to think about how to use, they just pick it up and use it in a very natural way."