By Suzanne Ross
September 7, 2005 12:00 AM PT
Searching for something on the Web, but can't spell it? Don't worry. It's all been done before.
Researchers at Microsoft have discovered a way to use the spelling expertise of millions to improve your spelling, and therefore, your chances of finding the information you need. Their research revealed that about 80% of the people searching for similar subjects on the Web know how to spell the topic correctly, with only about 10% being way off base.
"People are doing their best to spell things in the right way so they can find information," said Silviu Cucerzan, the lead researcher on the project. "Our idea was that the people who can't spell a term correctly can be helped by the people who are spelling it correctly."
Cucerzan and his research partner, Eric Brill, are excited about the possibilities in using query logs to learn about what people like and what people want. A lot of queries contain implicit information. At a simple level, query logs can tell us the most popular searches: for instance, are people searching on Jennifer Lopez more than they're searching on Jennifer Garner? MSN has even produced a fun Web page to let people see the most popular searches, and to vote on their favs.
But spelling is a bit of a sticky problem. You can't use the traditional spell check methods that are useful in word processing programs. They compare your input with a lexicon of known words and usages. But topics on the Web don't fall into neat dictionary piles. They're much messier.
For instance, if you type 'swarzenegger' into a word processing program, the program is going to tell you that this isn't a word, never has been a word, that it's so far off from a word that it can't even begin to suggest a different way to spell that combination of letters. But the Web knows better. It knows that the Internet isn't a neat, English-only set of facts. It will happily find search results that link to the governor of California. If you spell it correctly.
If you spell your search query incorrectly, most search engines will try to find a close match. Or, the search results might show you sites where others have misspelled it in the same way. This will either 1) send you to a site that has nothing to do with the governor or 2) keep you forever in ignorance of the right way to spell Arnold's last name.
The researchers decided to use an iterative approach to finding the correct spelling for almost any term typed into a search engine. It doesn't even matter what language the search string is in; their methods will still find the correct spelling.
The iterative process that Cucerzan and Brill developed matches your misspelled word to a closely related word, and from there to the next best match, and so on until a reasonable search string is found. Here's an example:
Misspelled query:  govnor arnol scwartegger
First iteration:  governor arnold schwartnegger
Second iteration:  governor arnold schwarznegger
Third iteration:  governor arnold schwarzenegger
Fourth iteration:  no further correction needed
Each iteration is based on people's tries at spelling the words in this query correctly, with the last iteration containing the word forms used by the majority of Web searchers.
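To make the idea concrete, here is a minimal Python sketch of iterative correction against query-log counts. It is an illustration only, not the researchers' implementation: the toy log `QUERY_COUNTS`, the helper names, plain Levenshtein distance, and the distance limit of 2 are all assumptions. Each pass replaces a word with the most frequent logged spelling within that distance, and the loop stops when a pass changes nothing. With these made-up counts the query settles in two passes rather than the three shown above; the intermediate steps depend on what is in the log and on the distance limit.

```python
# Hypothetical word counts from a query log (made-up numbers).
QUERY_COUNTS = {
    "governor": 900, "govnor": 3,
    "arnold": 800, "arnol": 5,
    "schwarzenegger": 700, "schwarznegger": 40,
    "schwartnegger": 12, "scwartegger": 1,
}


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_variant(word, counts, max_dist=2):
    """Most frequent logged spelling within max_dist edits of word."""
    best, best_count = word, counts.get(word, 0)
    for cand, count in counts.items():
        if count > best_count and edit_distance(word, cand) <= max_dist:
            best, best_count = cand, count
    return best


def correct_iteratively(query, counts, max_dist=2, max_iters=10):
    """Apply word-by-word correction until the query stops changing."""
    history = [query]
    for _ in range(max_iters):
        words = history[-1].split()
        nxt = " ".join(best_variant(w, counts, max_dist) for w in words)
        if nxt == history[-1]:
            break  # no further correction needed
        history.append(nxt)
    return history


history = correct_iteratively("govnor arnol scwartegger", QUERY_COUNTS)
for step in history:
    print(step)
```

The point of iterating is that 'scwartegger' is too far from the majority spelling to reach in one small step, but it is close to a less common misspelling that in turn is close to the right one.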
Because these spell checking methods are based on query substring statistics, they can easily handle misspelled queries such as 'britnet spear inconcert'. The program would break the string into the substrings 'britnet spear' and 'inconcert' and then correct them to 'britney spears' and 'in concert.'
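The run-together part of that example ('inconcert') can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published method: the toy counts in `WORD_COUNTS` and the function names are hypothetical, and the scoring rule (split an unknown token where both halves are known, preferring the split whose rarer half is most frequent) is a stand-in for real substring statistics. It only handles the splitting step; correcting 'britnet spear' would be a separate step.

```python
# Hypothetical word counts from a query log (made-up numbers).
WORD_COUNTS = {
    "in": 5000, "concert": 300,
    "britney": 900, "spears": 850,
    "con": 50, "cert": 20,
}


def split_run_together(token, counts):
    """Split an unknown token into two known words, if any split fits."""
    if token in counts:
        return [token]
    best, best_score = [token], 0
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        # Score a split by its rarer half; 0 if either half is unknown.
        score = min(counts.get(left, 0), counts.get(right, 0))
        if score > best_score:
            best, best_score = [left, right], score
    return best


def segment_query(query, counts):
    """Apply the split to every whitespace-separated token."""
    out = []
    for token in query.split():
        out.extend(split_run_together(token, counts))
    return " ".join(out)


segmented = segment_query("britnet spear inconcert", WORD_COUNTS)
print(segmented)
```

Tokens with no split into two known words, such as 'britnet', pass through unchanged so that a later correction step can deal with them.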
Cucerzan and Brill's spell checking methods will help you and others find the right information without having to struggle to come up with the proper spelling.