Making Arabic More Accessible
June 9, 2011 1:00 AM PT

February’s workdays were hardly commonplace for the 40 researchers and engineers who work at the Cairo Microsoft Innovation Center (CMIC). A popular uprising against the Egyptian government had begun in late January, and, a few weeks later, tens of thousands of people were jamming Tahrir Square, just 20 minutes from CMIC’s offices in Cairo’s Basatin district.

“We had to shut down for a week,” says Hussein Salama, director of CMIC, a division of Microsoft Research. “First of all, there was a curfew for long hours. And then the Internet was shut down for a week. Plus, half the lab members were down at Tahrir Square protesting. There was quite a revolutionary spirit in the place.”

Tarek Elabbady (foreground, left), director of EMEA Microsoft Innovation Labs, and Mona Habib, CMIC senior research program manager, join the throngs at Tahrir Square.

Things are calmer now, as Salama and his fellow CMIC employees wait to see how the vast political changes in Egypt play out. But the Egyptian revolution underscored the power of digital communications, with social media tools such as Facebook and Twitter often cited as driving forces behind the upheaval.

Despite the impact of the digital world in Egypt, though, Arabic remains an underserved language. Although it is the fifth-most-spoken language in the world and 100 million Arabic speakers are web-connected, they have access to relatively little digital content. Even with the great demand for the Internet in the Arabic-speaking world, only 1.4 percent of online digital content is in Arabic.

That, Salama says, represents a tremendous opportunity for Microsoft to bring technology to Arabic speakers by giving them new ways to communicate and to become more integrated with today’s wired world. At the same time, he adds, expertise gained from working within the Arabic community creates new skills that his facility can employ using other languages.

CMIC was created in 2006. The center’s focus is on exploring technology that has high potential for transfer to Microsoft products and services, particularly technology that advances technological outreach to Arabic speakers. Today, the facility employs four researchers, with the balance of the staff consisting of software engineers.

“Our initial charter, which we still follow, has us pay particular attention to technology that has an impact in Arabic-speaking countries in the Middle East,” Salama says. “At the same time, we’re a very applied lab, and we’re heavy on the development side, so, in the past five years, we’ve worked on many different ideas. People in our lab come up with ideas all the time, ideas that have a ‘wow’ factor, and we want to get those ideas out to local users.”

One of those ideas now is available as Microsoft Afkar, currently in beta form. “Afkar” is an Arabic word for “ideas,” and the goal of Microsoft Afkar is to give Arabic speakers tools to devise their own new ideas.

Arabic poses particular challenges for translators, Salama says.

“Basically, Arabic does not have any vowels,” he says. “In Arabic, there are these miniature letters that go on top of letters, and, in many cases, we write without these mini-letters in the first place.” Arabic also is read from right to left, rather than left to right, and all Arabic letters have “conditional” forms, based on whether they will be connected to a preceding or following letter.
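
Both properties Salama describes are visible in how Unicode encodes Arabic. The following is a minimal Python sketch using only the standard `unicodedata` module; it illustrates the script, not how any Maren tool works internally. The sample word and the variable names are chosen here for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks, such as the short-vowel signs written above or below letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# The three letters kaf, teh, beh, each followed by the short-vowel mark fatha (U+064E).
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_diacritics(vocalized)  # the same word as it is commonly written, without marks

# Positional ("conditional") forms of the letter beh, as encoded in the
# Unicode Arabic Presentation Forms-B block.
beh_forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]
for form in beh_forms:
    # Compatibility normalization folds each positional form back to the base letter.
    base = unicodedata.normalize("NFKC", form)
    print(unicodedata.name(form), "->", unicodedata.name(base))
```

In practice, text is stored using the base letters, and the rendering engine picks the positional form from the neighboring letters; the presentation-form code points are used here only to make the four shapes visible by name.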

Microsoft Afkar consists of several tools, all designed to make Arabic more Web-friendly and usable. Its four main components all include the word Maren, an abbreviation for “Microsoft Arabic English”:

  • Microsoft Maren Transliteration: As Salama notes, many Arabic speakers have never learned to type in Arabic, instead learning in English. “I type much faster in English than I do in Arabic, so if I want to chat with someone, I still use the English letters,” he says. Maren Transliteration enables the use of an English keyboard to spell out text in what is called “romanized” Arabic. Maren Transliteration then shows the user options in Arabic script. It’s available in versions for both the web and desktop PCs.
  • Microsoft Maren Autocomplete: Another challenge in Arabic is typing quickly, Salama notes, so CMIC created Maren Autocomplete, a smart tool that automatically completes words typed in Arabic. For each word typed, Maren suggests possible completions, based on letters used and context.
  • Microsoft Maren Multilingual: A third key piece of the Afkar project is Maren Multilingual. “Sometimes, I have a problem remembering the right English word when I am typing,” Salama says. “Let’s say I am trying to write the word ‘election,’ and I don’t remember what it is. I can write that word in Arabic, and it is translated into English so that the final document is fully English.”
  • Microsoft Maren Morph: A plug-in for Internet Explorer, Maren Morph helps Arabic speakers enrich their language experience by enabling them to analyze the morphology (language pattern) of a particular word. “It gives users a dictionary analysis of the root of a word and all the stems,” Salama explains.
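
The general idea behind the first two tools, offering Arabic-script options for romanized input and suggesting completions for a partly typed word, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Maren's actual algorithm: the letter mapping, the three-word lexicon, its frequency counts, and the function names are all hypothetical, and context, which the article says Maren Autocomplete uses, is not modeled.

```python
from itertools import product
from typing import List

# Hypothetical, deliberately tiny mapping from Latin letters to Arabic options.
# An empty string models a short vowel that is usually left unwritten.
LATIN_TO_ARABIC = {
    "k": ["\u0643"],            # kaf
    "t": ["\u062A", "\u0637"],  # teh or tah
    "b": ["\u0628"],            # beh
    "s": ["\u0633", "\u0635"],  # seen or sad
    "l": ["\u0644"],            # lam
    "m": ["\u0645"],            # meem
    "a": ["\u0627", ""],        # alef or unwritten
    "i": ["\u064A", ""],        # yeh or unwritten
    "u": ["\u0648", ""],        # waw or unwritten
}

KITAB = "\u0643\u062A\u0627\u0628"  # "book"
KUTUB = "\u0643\u062A\u0628"        # "books"
SALAM = "\u0633\u0644\u0627\u0645"  # "peace"

# Hypothetical word counts standing in for a real lexicon.
LEXICON = {KITAB: 120, KUTUB: 80, SALAM: 200}


def transliterate(romanized: str) -> List[str]:
    """Return known Arabic words matching the romanized input, most frequent first."""
    options = []
    for ch in romanized.lower():
        if ch not in LATIN_TO_ARABIC:
            return []  # this toy mapping does not cover the letter
        options.append(LATIN_TO_ARABIC[ch])
    # Expand every combination of per-letter options, then keep only real words.
    candidates = {"".join(parts) for parts in product(*options)}
    known = [word for word in candidates if word in LEXICON]
    return sorted(known, key=lambda word: -LEXICON[word])


def autocomplete(prefix: str, limit: int = 3) -> List[str]:
    """Suggest lexicon words that start with the typed prefix, most frequent first."""
    matches = [word for word in LEXICON if word.startswith(prefix)]
    return sorted(matches, key=lambda word: -LEXICON[word])[:limit]


print(transliterate("kitab"))  # two candidates, because short vowels may be unwritten
print(autocomplete("\u0643\u062A"))
```

Filtering the expanded candidates against a lexicon is what keeps the option list short: most letter combinations are not words, so only a few plausible spellings reach the user.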

Included in the Afkar project are two other useful tools: Bing Answers and Bing Translator. Bing Answers gives users “instant” answers to questions such as Muslim prayer times in specific cities. Bing Translator translates text and web pages for 32 languages, including Arabic.

In addition, the Afkar collection includes WikiBhasha, a content-creation tool for Wikipedia that was developed by Microsoft Research India. With WikiBhasha, users can take Wikipedia content written in English, translate it into a variety of other languages, and add new content or information to the Wikipedia post.

Along with its work in making Arabic more web-friendly and accessible, CMIC is pursuing two other main development areas. In one, CMIC manages Bing for the Arabic-speaking market.

Hussein Salama

“Until May of 2010, if you selected Bing for Arab countries, all you got was a grey screen—none of the nice images that Bing displays,” Salama says. “We enabled Bing images for that area. We also were trailing in search quality, and our work over the past year and a half has greatly improved Arabic search in Bing.”

A third goal targets mobile multimedia. Countries such as Egypt have far more mobile-phone users than Internet users, Salama says: among an estimated population of 80 million, 60 million use mobile phones, compared with 15 million who use the Internet.

“That means,” he says, “if you want to have impact, you can have a much bigger impact with mobile users.”

One CMIC project focusing on mobile technology is a tool for collecting photos from a variety of users and stitching them together. Salama uses the example of a soccer game, where many mobile-phone users may be taking pictures. CMIC has created a demo that shows how those photos might be collected and assembled, in a manner similar to that of Microsoft Photosynth, creating an image that contains many points of view. To achieve that, the CMIC team had to solve problems such as how to correct for instances where photos are tilted a few degrees because the phone was held at an angle.
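
The tilt problem can be illustrated with basic geometry. The sketch below is not CMIC's method, which the article does not describe. It assumes that some earlier step has already found two points on a line that should be horizontal, such as a horizon or a field marking; the coordinates and function names are hypothetical.

```python
import math


def tilt_degrees(p1, p2):
    """Angle, in degrees, of the line through p1 and p2 relative to the x-axis."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(point, center, degrees):
    """Rotate a point about a center by the given angle."""
    theta = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (
        center[0] + dx * math.cos(theta) - dy * math.sin(theta),
        center[1] + dx * math.sin(theta) + dy * math.cos(theta),
    )


# Two points on a line that should be level but drifts 5 pixels over 100.
left, right = (0.0, 0.0), (100.0, 5.0)
tilt = tilt_degrees(left, right)

# Rotating by the opposite angle about the midpoint levels the line.
center = (50.0, 2.5)
leveled = [rotate_point(p, center, -tilt) for p in (left, right)]
print(round(tilt, 2), leveled)
```

A real correction would apply the same rotation to every pixel of the image and then crop the empty corners; only the angle estimate and the rotation itself are shown here.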

The work on that project caught the attention of the Windows Phone 7 team. That team asked CMIC to develop a tool to “fix” crooked photos; the tool will appear in an upcoming Windows Phone update. CMIC also contributed photo filters to improve the brightness and sharpness of photos taken with a mobile phone.

Salama credits the successes of his small team to two things: the quality of the team and its location in Cairo.

“Our team consists of leads and researchers who are Egyptian and who have lived and worked abroad for many years before deciding to return to their home country, Egypt,” he says. “A few of them have worked for Microsoft in Redmond. These leaders and researchers are fully committed to CMIC’s success. Also, because CMIC is a relatively young organization, all members of the leadership team share the ownership of the place and the responsibility for shaping its culture.”

Because CMIC is backed by Microsoft and has the feel of a multinational company, it is able to draw on top talent from Egypt.

“We hire the people with the best problem-solving skills and critical-thinking abilities,” Salama says. “These junior software-development engineers and research assistants are very excited about the opportunity to work for Microsoft and deliver top-quality output under the leadership of our Redmond veterans.”

Salama says the political tumult in Egypt has made CMIC’s work even more relevant by creating greater awareness of the need for Arabic to become more widely used on the Internet and in everyday life.

“The amount of Arabic I use on the Internet has tripled since the revolution,” he says. “On Facebook, for example, we communicate much more in Arabic now than we did previously.” Salama says that increased interest in regional news also is encouraging greater use of Arabic on the Internet.

Meanwhile, there is more in the CMIC pipeline: tools and products still under wraps to give Arabic speakers and others new, better ways to communicate.