"This is fantastic to see. It does make me feel good on my last day!" wrote Bill Gates, in a personally written reply to the handwriting recognition team at Microsoft Research Asia right before officially stepping down from full-time work at Microsoft. The result that Gates was referring to was East Asian Language Recognition Technology jointly developed by the User Interface Group (currently the Software Analytics Group with new research focuses) at Microsoft Research Asia and the Windows Experience team. By the time Gates wrote his email, post-optimization of the East Asian Language Recognition Technology Project had already been accomplished at the Sigma Building in Beijing that houses Microsoft Research Asia.
Back in March 2007, when the East Asian Language Recognition Technology Project had barely celebrated its first birthday, Gates had expressed his great interest in and expectations for the project in his correspondence with Microsoft Research Asia. Copies of the two e-mails from Gates still remain in the inboxes of all members of the project team.
The two e-mails span the entire process during which the East Asian Language Recognition Technology was developed and incorporated in localized versions of Windows 7. One might easily imagine the happiness and excitement felt by Gates — a person extremely obsessed with technology. Once again, Microsoft Research Asia has injected a valuable piece of the lab’s wisdom into the company’s core products. Through the effort of Microsoft Research Asia, Windows 7 was able to make a revolutionary leap in helping speakers of East Asian languages enjoy greater simplicity and practicality when taking handwritten notes with their computers.
No other “word game” in the world could have been more challenging than the development of handwriting recognition technology at Microsoft Research Asia.
"As far as input is concerned, we cannot impose requirements on users; instead, we try to satisfy them. Users’ writing habits — including order of strokes and shapes of characters -- vary greatly. We therefore have to consider and cope with these factors as much as we can," said Shi Han of the Software Analytics Group at Microsoft Research Asia.
One of the most important research focuses of the Software Analytics Group (formerly the User Interface Group) has been applications based on data-driven machine learning and pattern recognition: simply put, teaching machines to classify massive amounts of real-world data. In essence, the handwriting recognition technology transferred to Windows 7 was designed to address a classification problem, namely letting the computer know which characters its user is entering. Unlike languages written in the Latin alphabet and other Western scripts, East Asian languages usually have very large character sets with widely varying strokes and many characters that closely resemble one another, which are significant barriers to both the speed and accuracy of recognition.
According to Han, the most difficult part of single-character handwriting recognition arises when the user writes in haste. When this project was launched, the industry had reported an average recognition rate of 95 percent for sets of written characters. Single-character recognition at that time depended mainly on spatial information about the shape of a character, and told one character from another by mining for distinctions in their partial and holistic features. Spatial information has the advantage of describing the overall structure of a character, but it ignores such details as the order and direction of the strokes within it. If information about the time sequence of strokes is also taken into account, it becomes easier for the system to precisely distinguish “味” from “昧.”
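The difference between spatial and time-sequence information can be illustrated with a small sketch. This is a toy example under stated assumptions, not the recognizer described in this article: the function names `spatial_feature` and `temporal_feature`, the 4x4 grid, and the two-stroke samples are all hypothetical. It shows two pieces of handwriting that leave identical ink on the page yet differ in stroke order.

```python
import math

def spatial_feature(strokes, grid=4):
    """Order-free view: the set of grid cells touched by ink.

    Coordinates are assumed to be normalized to the range [0, 1].
    Stroke order and direction are deliberately discarded.
    """
    cells = set()
    for stroke in strokes:
        for x, y in stroke:
            col = min(int(x * grid), grid - 1)
            row = min(int(y * grid), grid - 1)
            cells.add((row, col))
    return frozenset(cells)

def temporal_feature(strokes):
    """Time-sequence view: one quantized direction (8 bins) per stroke,
    listed in the order the strokes were written."""
    directions = []
    for stroke in strokes:
        (x0, y0), (x1, y1) = stroke[0], stroke[-1]
        angle = math.atan2(y1 - y0, x1 - x0)
        directions.append(round(angle / (math.pi / 4)) % 8)
    return tuple(directions)

# Hypothetical samples: the same cross shape written in two stroke orders.
horizontal = [(0.1, 0.5), (0.5, 0.5), (0.9, 0.5)]
vertical = [(0.5, 0.1), (0.5, 0.5), (0.5, 0.9)]
sample_a = [horizontal, vertical]
sample_b = [vertical, horizontal]

same_shape = spatial_feature(sample_a) == spatial_feature(sample_b)
same_order = temporal_feature(sample_a) == temporal_feature(sample_b)
print(same_shape, same_order)
```

In this sketch the spatial features of the two samples match while the temporal features do not, which is the kind of extra signal that stroke order and direction can add on top of shape.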
During the two solid years they spent preparing for Windows 7, the handwriting recognition team brought the recognition rate for East Asian languages to a higher level – for sets of characters in Simplified Chinese, for instance, the recognition rate was raised to above 97 percent.
Though the recognition rate has increased, the system still needs to operate more efficiently and be more compact. The handwriting recognition in Windows 7 is capable of full-sentence input, error correction and next-character prediction. An understanding of context and language models built on massive statistics of character combinations play an important role in these features. East Asian languages usually have huge character sets, which presents a tremendous challenge. The Chinese language, for example, has more than 20,000 characters in its complete character set, and the most commonly used primary and secondary subsets alone add up to nearly 7,000. One can hardly imagine the size of a model that also contains the words and phrases derived from these characters.
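A language model built on statistics of character combinations can be sketched in a few lines. The following is a minimal illustration, assuming a simple bigram (two-character) model and a three-sentence toy corpus; the names `train_bigrams`, `predict_next` and `rescore`, the shape scores, and the add-one weighting are hypothetical choices for this example, not the models shipped in Windows 7.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for prev, nxt in zip(sentence, sentence[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev, k=3):
    """Next-character prediction: the k most frequent followers of prev."""
    return [ch for ch, _ in counts[prev].most_common(k)]

def rescore(counts, prev, candidates):
    """Error correction: weight each shape score by (bigram count + 1)
    and return the best candidate. candidates is a list of
    (character, shape_score) pairs from a hypothetical shape recognizer."""
    best_char, _ = max(
        candidates,
        key=lambda item: item[1] * (counts[prev][item[0]] + 1),
    )
    return best_char

# Toy corpus; a real model would be trained on a very large text collection.
corpus = ["今天天气很好", "今天很忙", "明天天气不错"]
counts = train_bigrams(corpus)

print(predict_next(counts, "今", 1))
# The shape scores slightly favor "汽", but context after "天" favors "气".
print(rescore(counts, "天", [("汽", 0.55), ("气", 0.45)]))
```

Even this toy table stores one counter per observed character, which hints at why a model covering thousands of characters plus their words and phrases grows so large.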
“Faster and smaller” was one of the important goals in the development of Windows 7, so how was the team able to select the most effective portion of such a huge model to increase the recognition rate and improve the user experience? Han said the language models that existed before the project had already been highly optimized but were still quite large. Thanks to further effort by the handwriting recognition team, these models were reduced to half their original size, with reasonably improved recognition rates for the “full sentence input” mode.
"Almost all of our experiments and codes were made to product standards so that the entire technical transferring process would be very smooth. We considered several factors here: in order to turn a good technology into code, you need to be very familiar with the technology itself, and at the same time you need to optimize the code to make it smaller and faster. So we were in the best position to do an efficient job,” said Han. “The only requirement is that people on our side be equally competent in research and development." Han and his colleagues still take great pride in the fact that not a single bug was found in the delivery test, which could be called a miracle from the perspective of basic research if such feats weren’t taking place literally every day at Microsoft Research Asia.
This was another classic case of collaboration between teams: the qualitative breakthrough in East Asian language recognition in Windows 7 would not have been possible without contributions from the handwriting recognition team of the Windows Experience Group at Microsoft headquarters in Redmond. Close collaboration between research labs and product departments is reflected in countless cases of this kind at Microsoft.
The handwriting recognition team in the User Interface Group (currently the Software Analytics Group) finished transferring the Hidden Markov Model (HMM)-based East Asian Language Recognition Technology to the Windows Experience (WEX) team in July 2008. Major breakthroughs and technological innovations were made in HMM topological design, optimized roots set selection, HMM differentiation training, model compression based on shared state parameters, and data-driven decoding acceleration, all of which targeted East Asian languages. This transfer brought state-of-the-art handwriting recognition technology into Microsoft products and raised the accuracy of East Asian language recognition to a new level. With high recognition accuracy, compact size and excellent performance, the HMM-based East Asia Character Recognizer (code name Dolphin) developed by Microsoft Research Asia delivered a significant relative error reduction for the four East Asian languages: Simplified Chinese, Traditional Chinese, Japanese and Korean. Prior to this, language model optimization for full-sentence handwriting recognition in East Asian languages had already been transferred in the M3 stage of Windows 7. The final product, which integrates the optimized language models more tightly, achieves reasonably higher accuracy in full-sentence recognition in Simplified Chinese, Traditional Chinese and Japanese.
Handwriting recognition for characters in East Asian languages is still far from perfect, and the computer is only the beginning for Microsoft as it looks to bring the technology to mobile phones and television sets.
A new challenge for Chinese has emerged from the language habits of young people. Active online users tend to input text in a mixture of Chinese and English, and some people play around with emoticons or even so-called “Martian.” “We used to develop a separate model for each language, so this could be a major challenge from the single-model perspective. However, when it comes to products or technical applications, we can still improve them by integrating more language models,” said Han, adding that other languages, including Arabic and others that have been intensively discussed in the academic community in recent years, have their own unique features as well as their own user groups. As a leader in the software industry, Microsoft has a responsibility to provide satisfactory solutions for users of all languages. “After all, our goal is to serve and to facilitate the lives of all citizens on this planet so that more people will benefit from our products.”
It is necessary to take into account people’s experiences and habits in different environments. Any content that is not convenient to enter with a keyboard becomes an exciting challenge for Microsoft. “For example, we have looked into possible solutions for the input of mathematical and chemical formulas. Chemical formulas are still not that easy to key in, especially complex organic structures such as those one may find in medicine instructions. Pen and paper are still the most convenient solution if one has to quickly put down a design sketch or an instant inspiration. Of course, recognition technology is a must if you want to have it digitized for easier management or further processing.”
In the field of mobile telecommunications, the realization of handwriting recognition no longer relies on technology alone. One has to find ways to better apply existing recognition technologies, including support for hardware innovation and more convenient human-computer interaction designs. Handwriting recognition technology will not be limited to text, nor will it be exclusive to Windows 7.
The rapid development of information technology has brought new challenges and opportunities alike. Having successfully transferred many technologies, including handwriting recognition, to final products, the original User Interface Group at Microsoft Research Asia has also strategically shifted to uncharted and riskier areas, which resulted in the inauguration of the Software Analytics Group. There is a long journey between an initial idea and a final product. But it’s hard not to look forward to more research results from Microsoft Research Asia being transferred to marketable products and eventually into our daily lives.