Hot on the Trail of Web Troublemakers
June 15, 2011 12:00 PM PT

By Rob Knies
Microsoft Research

At the Microsoft Innovation & Policy Center in Washington, D.C., today, Microsoft Research is hosting the D.C. TechFair 2011, an event where thought leaders get a chance to see intriguing research projects that point toward a more productive, prosperous tomorrow.

The demos on display run the gamut from healthcare and the environment to natural user interfaces and cloud computing. But certain to catch visitors’ eyes is one called Social Graphs for Online Service Security, presented by Yinglian Xie and Fang Yu of Microsoft Research Silicon Valley.

Both are veterans of years of groundbreaking research designed to thwart the malicious few who regularly imperil the tremendous value the web has delivered to hundreds of millions worldwide. Those bad eggs, though, are encountering intrepid opposition, as a brief chat with Xie and Yu made obvious.

“The problem we are solving,” Xie explains, “is how to differentiate attacker-created email accounts on large-scale online service properties, such as Hotmail, from legitimate user accounts.”

These days, perhaps the most popular tactic to defeat this scourge is the CAPTCHA. You’ve seen them, no doubt: collections of distorted letters that are relatively easy for humans to decipher but resistant to computer recognition. The bad guys are making progress, though. As Xie notes, attackers have found ways to bypass CAPTCHAs at low cost. What can be done?

“If you look at individual malicious-user-created accounts,” Xie says, “it can be very difficult to tell them from legitimate user accounts. One thing we want to look at is whether we have a way of looking across a large number of users, looking at their connectivity among each other, to be able to differentiate the legitimate user community from the attacker part.

“The intuition here is very simple: If we define connectivity as mutual email exchange, a normal user will talk to other people—send email and receive email. But attackers will mostly send malicious content. They do not receive messages back from legitimate users. Essentially, all the legitimate users are going to be connected in some way into communities. Attackers are more isolated users on the connectivity graph.”
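To make that intuition concrete, here is a minimal Python sketch. It assumes email logs are available as (sender, recipient) pairs. It is not the researchers’ actual system: the function names and the `min_community_size` cutoff are hypothetical. It links two accounts only when mail flows in both directions, then flags accounts that end up in very small connected components of that graph.

```python
from collections import deque


def mutual_exchange_graph(emails):
    """Build an undirected graph linking two accounts only if each has sent
    the other at least one message (connectivity as mutual email exchange)."""
    sent = set(emails)
    graph = {}
    for sender, recipient in sent:
        # Every account that appears in the log becomes a node, even if isolated.
        graph.setdefault(sender, set())
        graph.setdefault(recipient, set())
        if sender != recipient and (recipient, sender) in sent:
            graph[sender].add(recipient)
            graph[recipient].add(sender)
    return graph


def community_sizes(graph):
    """Map each account to the size of its connected component (breadth-first search)."""
    size_of = {}
    for start in graph:
        if start in size_of:
            continue
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        for node in seen:
            size_of[node] = len(seen)
    return size_of


def flag_isolated(emails, min_community_size=3):
    """Return accounts whose mutual-exchange component has fewer than
    min_community_size members (a hypothetical cutoff, chosen for illustration)."""
    sizes = community_sizes(mutual_exchange_graph(emails))
    return sorted(account for account, size in sizes.items() if size < min_community_size)


# Toy log: three users who correspond with one another, plus one account
# that only sends and never receives a reply.
emails = [
    ("alice", "bob"), ("bob", "alice"),
    ("bob", "carol"), ("carol", "bob"),
    ("alice", "carol"), ("carol", "alice"),
    ("spammer", "alice"), ("spammer", "bob"), ("spammer", "carol"),
]
print(flag_isolated(emails))  # ['spammer']
```

A rule this blunt would also flag a brand-new legitimate user who has not yet received any replies, which is one reason a real system would weigh many more signals than component size alone.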

The researchers examine anonymized data in developing a view of the graph.

A small portion of a graph of an online service: The well-connected dots represent communities of normal users, while the outlying, minimally connected dots could indicate malicious users.

“We can leverage the total graph structure, where users are positioned on the graph, as a new direction in fighting these large-scale service abusers. We explore graph properties by mining large-scale graphs, looking at previous graph-theory studies, and exploring, in this new context, dynamic graph properties we can explore over time, the attackers’ counter-strategies, and how to be more robust against them.”

Such insight comes naturally to researchers who have been working to block malicious web behavior for almost five years.

“At the beginning,” Yu recalls, “we were only looking at email spam, a rather isolated problem. Later, we expanded to multiple areas. We worked on search-engine spam, ad issues, and online-service properties. We learned that many of these attackers have common behaviors. The work we are presenting here is not only common among online services, but also is common in other areas, as well. There are a lot of existing technologies to analyze using graph properties.

“We also look at the history of how the graph evolves, because you can have malicious attackers try to mimic normal users and try to build a graph of their own. But we find that normal communities take time to grow into healthy communities. If you suddenly have a big community, that can be very suspicious.”
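The growth signal Yu describes can be illustrated with a similarly hedged sketch. It assumes a community’s member count has been recorded at regular snapshots; the function name and the `max_growth_ratio` and `min_size` thresholds are invented for illustration, not values from the research.

```python
def sudden_growth(sizes, max_growth_ratio=2.0, min_size=50):
    """Return snapshot indices where a community jumps to at least min_size
    members while growing by more than max_growth_ratio since the previous
    snapshot. Both thresholds are hypothetical."""
    flagged = []
    for i in range(1, len(sizes)):
        previous, current = sizes[i - 1], sizes[i]
        # Treat an empty previous snapshot as size 1 so the ratio test stays meaningful.
        if current >= min_size and current > max_growth_ratio * max(previous, 1):
            flagged.append(i)
    return flagged


organic = [40, 48, 57, 66, 75]   # steady, gradual growth
burst = [0, 0, 5, 400, 410]      # a large community appearing almost overnight
print(sudden_growth(organic))  # []
print(sudden_growth(burst))    # [3]
```

A single ratio test is crude; it only conveys the idea that legitimate communities tend to accumulate members gradually, while a community that appears fully formed deserves a closer look.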

It takes constant vigilance to stay a step ahead of these bad actors.

“In cyberspace, these kinds of attacks are very sophisticated, occur in large scale, and take many different forms,” Xie cautions. “Defending against these highly motivated attackers is a sustained effort. We require continued study of more and more dimensions, more data that we can leverage. But we think, over time, as technology such as cloud computing advances, we will have more power to mine large-scale data. In the graph, we have hundreds of millions of users, and previously, it was difficult to analyze such graphs, but now, with the cloud-computing structure within Microsoft, we have data centers that make it feasible to do these kinds of things.

“For us, it’s an interesting area. We need to leverage up-to-date cloud-computing technology and graph theory to get an upper hand in the arms race.”

But that concerted effort doesn’t mean that the good guys will always be coming from behind.

“We are not focused only on detecting malicious users,” Yu says. “We are also focused on analyzing normal users. Those properties are more stable and more robust. You can see their communities. The normal users don’t evolve rapidly, don’t change rapidly, but attackers could. Focusing on normal users helps us to better distinguish normal users, rather than chasing the others.”

That approach, the researchers feel confident, eventually will win out.

“It’s our vision,” Xie concludes, “that, eventually, we’re going to have the upper hand.”