By Hui Ma, China Internet Weekly
February 20, 2009 4:00 PM PT
“It is no exaggeration to say that I’ve waited a long time for Engkoo,” says Xiaohua Liu, an associate researcher at the Natural Language Computing Group of Microsoft Research Asia. “I am a Chinese person working at Microsoft Research Asia, where English is the daily language. I can get by when it comes to spoken English, but it is a totally different story when it comes to writing e-mails, preparing presentation documents, or compiling research papers. Those tasks require that I write the type of pure and colloquial English that Americans use.”
The sunny smile of Matthew Scott, who sits next to Liu, reminds me of the beaches of Hawaii. Scott is not from Hawaii, though; he’s from New York, and he’s a development lead within the Innovation Engineering Group of Microsoft Research Asia. He’s also the head of the Engkoo project. Scott doesn’t understand much of the Chinese dialogue exchanged between Liu and me, which might cause one to wonder why Microsoft would hire a Westerner with no Chinese-language ability to lead the development of a search engine designed for Chinese people. The answer, it’s clear to me, is in Scott’s smile, which reflects perfectly one of the core philosophies of his multinational employer: a sincere desire to understand and serve, through innovation, the local community.
Microsoft, a rookie in the bewildering market of online dictionaries, has taken an approach to helping Chinese people write well in English that is entirely different from those of Google, Netease Youdao, and iPowerWord.
Engkoo is essentially a vertical-search engine that helps Chinese people learn English. Similar to a number of other achievements at Microsoft Research Asia, Engkoo was born amid millions of risk-taking attempts by researchers. “It was not meant for commercial use at first,” Scott says, “but instead was intended to be a popular and useful gadget within Microsoft Research Asia.”
Engkoo is a product of the collective wisdom of, among other groups, the Innovation Engineering Group, the Speech Group, the User Interface Group and the Machine Learning Group. Researchers of Microsoft Research Asia were its first users—and its first critics.
Bringing together more than a dozen authorized editions of professional dictionaries, including the Microsoft Office Dictionary and the electronic edition of the encyclopedia, Engkoo analyzes pages from across the Internet and then extracts and refines an increasing number of bilingual sentences and phrases. It then performs automatic classification, quality assessment, relevance sorting, and syntax analysis on these bilingual materials, using technologies such as natural-language computing and statistical machine translation to build high-dimensional indexing based on linguistic characteristics. The result is a completely innovative user experience that goes beyond a keyword search.
Thanks to numerous improvements, Engkoo withstood “fault-finding” attempts by some of “the most intelligent” people in the Sigma Building. Engkoo was an eye-catcher when it made its debut during Microsoft Research Asia’s “Innovation Day” last November, alongside more than 40 other innovations.
“The search functions of other online dictionaries could be perfect, but Engkoo goes the extra mile for its users,” said Weikun Wang, a student of the Beijing Institute of Technology’s Microsoft Technology Club, after watching a presentation of the technology. “It not only offers a complete thesaurus, but it also puts a higher priority on user experience. I believe Engkoo will become one of the most popular online dictionaries in the future.”
The competitiveness of a software product often relies on how user-friendly its interface is and how well it handles details. In this sense, Engkoo might be considered a breakthrough when compared with most online dictionaries, which remain more or less translation assistants.
“When writing, you often need to find one word to match another to make a sentence read better. Engkoo’s Part-of-Speech Matching function can definitely help you out here,” Liu explains. “Just enter the trunk of a sentence, with undetermined words replaced by initials of their respective parts of speech. Engkoo will automatically search for sample sentences that meet such requirements. It has been my savior when working on a technical report in English.”
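The kind of query Liu describes can be pictured with a small sketch. Everything in it is an assumption made for illustration: the article does not give Engkoo's actual query syntax, so the placeholder tokens (`n.`, `v.`, `adj.`, `adv.`), the tag names, and the three hand-tagged sentences are all hypothetical, and a real system would tag its corpus automatically rather than by hand.

```python
# Hypothetical placeholder tokens mapped to part-of-speech tags.
# The article does not document Engkoo's real query syntax.
PLACEHOLDERS = {"n.": "NOUN", "v.": "VERB", "adj.": "ADJ", "adv.": "ADV"}

# Tiny hand-tagged corpus: each sentence is a list of (word, tag) pairs.
CORPUS = [
    [("we", "PRON"), ("pay", "VERB"), ("close", "ADJ"),
     ("attention", "NOUN"), ("to", "ADP"), ("details", "NOUN")],
    [("they", "PRON"), ("draw", "VERB"), ("public", "ADJ"),
     ("attention", "NOUN"), ("to", "ADP"), ("the", "DET"), ("issue", "NOUN")],
    [("attention", "NOUN"), ("is", "VERB"), ("limited", "ADJ")],
]


def token_matches(query_token, word, tag):
    """A placeholder matches by tag; any other token matches the literal word."""
    wanted_tag = PLACEHOLDERS.get(query_token)
    if wanted_tag is not None:
        return tag == wanted_tag
    return query_token.lower() == word.lower()


def find_matches(query, corpus):
    """Return sentences containing a contiguous span that fits the query."""
    tokens = query.split()
    if not tokens:
        return []
    results = []
    for sentence in corpus:
        # Slide a window of the query's length across the sentence.
        for start in range(len(sentence) - len(tokens) + 1):
            window = sentence[start:start + len(tokens)]
            if all(token_matches(q, w, t) for q, (w, t) in zip(tokens, window)):
                results.append(" ".join(w for w, _ in sentence))
                break
    return results


if __name__ == "__main__":
    for hit in find_matches("v. adj. attention to", CORPUS):
        print(hit)
```

Here the writer knows the phrase ends in "attention to" but not which verb and adjective should precede it, so those two slots are left as part-of-speech placeholders and the sample sentences supply the candidates.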
Enter a word in Engkoo’s search box, and it will display all relevant information on a single page. At the top of the page, you’ll find commonly used meanings, along with the word’s pronunciation, part of speech, Chinese/English translation, and different forms. These word explanations are based on authoritative, 10-million-entry dictionaries and the latest Internet search results, which means, Scott says, they are constantly being updated and improved.
The section next to the keyword’s explanation displays sample English and Chinese sentences found online that contain the keyword. These samples are selected from vast amounts of Internet data through complex machine-language analysis and algorithms that filter out sentences that have spelling or grammar mistakes, are particularly long, contain unintelligible symbols, or come with lousy Chinese translations. “The search engine is now able to present 10 samples sorted in descending order of language quality,” says Scott, “and more and more samples will be given going forward, based on Engkoo’s analysis of the user’s choices.”
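The filter-then-rank behavior described above can be sketched roughly as follows. This is not Engkoo's implementation: the `SamplePair` type, the `quality` score (assumed to come from some upstream model), the thresholds, and the allowed-character pattern are all hypothetical, and the spelling and grammar checks the article mentions are left out because they would need a real language model.

```python
import re
from dataclasses import dataclass


@dataclass
class SamplePair:
    english: str
    chinese: str
    quality: float  # hypothetical score in [0, 1] from an assumed upstream model


MAX_WORDS = 30      # hypothetical cutoff for "particularly long" sentences
MIN_QUALITY = 0.5   # hypothetical cutoff for poor translations
# Hypothetical whitelist: letters, digits, spaces, and common punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 ,.;:'!?()-]+")


def is_acceptable(pair):
    """Reject pairs that are too long, contain odd symbols, or score poorly."""
    if len(pair.english.split()) > MAX_WORDS:
        return False
    if not ALLOWED.fullmatch(pair.english):
        return False
    if pair.quality < MIN_QUALITY:
        return False
    return True


def top_samples(pairs, limit=10):
    """Keep acceptable pairs and return the best ones, highest quality first."""
    kept = [p for p in pairs if is_acceptable(p)]
    return sorted(kept, key=lambda p: p.quality, reverse=True)[:limit]
```

The default `limit=10` mirrors the 10 samples Scott mentions; adjusting the ranking based on users' choices, which he describes as future work, is outside this sketch.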
Engkoo’s sample-sentence-search function goes even further. It analyzes found sample sentences and then presents them to users in categories: spoken language, written language, or technical language. Users are then able to sort the sample sentences based on their level of difficulty. So whether you are a primary-school student doing English homework, a professional researcher composing an academic paper, or a business executive compiling a report for your boss, there will always be sample sentences that fit your style and level of sophistication.
As for entries with similar meanings, you can drag and drop them onto one page for comparison—not only in their original forms, but also in their plural forms and various tenses and degrees of strength, and as different parts of speech. Clicking on any word on the page partially reloads relevant content. The text-to-speech module can read sentences using authentic pronunciation.
"Engkoo is only the starting point,” says Scott. “We expect it to serve as a platform from which we can launch the latest achievements from future research efforts. It isn’t just a search engine.”
By helping Chinese people write proper English, Engkoo is positioned one step further than online dictionaries, which often fall short of a writer’s requirements. Users will be able to turn to Engkoo to verify the results they receive from other translation aids.
A glimmer appeared in Scott’s eyes when he began to talk about Engkoo’s future. “More useful functions will be added,” he says. “For example, the technology might be able to immediately translate a highlighted English sentence into Chinese. Translations of English sentences and paragraphs are all under research.”
Language is the means by which humans communicate, and technology shortens the distance in the exchanges between people. Microsoft is doing its part to bridge these two critical areas of human development.
From the beginning, Microsoft Research Asia’s product development has been guided by two principles: basing products on Chinese culture and striving to satisfy the needs of Chinese users. From Renlifang, to the Microsoft Couplet System, to today’s Engkoo, Microsoft Research Asia is poised to take vertical search by storm.