Overview:

We explore host network-level properties, in particular host IP address properties, to derive the rich context of a communication between a client and a service. Examples of such properties include whether the host uses a dynamically allocated IP address, whether it is a large proxy server with many users behind it, and whether it has been associated with previously identified malicious activities. Such information can be used to improve service security and to help service providers better understand user requirements. We derive these properties automatically from large service logs.
Projects:
- Travel IP identification
- Network-level clustering for spam detection
- Exploring ISP port-blocking policies for detecting triangular spamming
Introduction:
UDmap: Usage-based Dynamic IP-address Map
We developed a novel method, called UDmap, to identify dynamically assigned IP addresses and analyze their dynamics. UDmap is fully automatic and relies only on application-level server logs that are already available today.
We applied UDmap to a month-long Hotmail user-login trace and identified more than 102 million dynamic IP addresses. By correlating the inferred dynamic IP addresses with three consecutive months of Hotmail’s email server logs, we established that 97% of mail servers set up on dynamic IPs sent out solely spam emails, likely controlled by zombies. Moreover, these mail servers sent a large volume of spam, accounting for over 42% of all spam emails to Hotmail. These results highlight the importance of accurately identifying dynamic IP addresses for spam filtering, and we expect similar benefits for phishing site identification and botnet detection.
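To give a feel for how login logs alone can hint at dynamic addressing, here is a minimal sketch. It is not UDmap's actual algorithm; it assumes a much simpler heuristic in which a /24 block is flagged when most of its users are seen on more than one address inside the block. The function name, the `(user_id, ip)` record format, and both thresholds are hypothetical choices made for illustration.

```python
from collections import defaultdict
import ipaddress


def likely_dynamic_blocks(logins, min_users=3, roaming_threshold=0.5):
    """Flag /24 blocks whose users tend to appear on several addresses.

    logins: iterable of (user_id, ip_string) pairs (assumed log format).
    Returns a set of block strings such as "10.0.0.0/24".
    """
    # block -> user -> set of addresses that user logged in from
    ips_by_user_in_block = defaultdict(lambda: defaultdict(set))
    for user, ip in logins:
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        ips_by_user_in_block[block][user].add(ip)

    flagged = set()
    for block, users in ips_by_user_in_block.items():
        if len(users) < min_users:
            continue  # too little evidence for this block
        # A "roaming" user was seen on more than one address in the block.
        roaming = sum(1 for ips in users.values() if len(ips) > 1)
        if roaming / len(users) >= roaming_threshold:
            flagged.add(str(block))
    return flagged
```

A real system would need to account for time, proxies, and users who share accounts; this sketch only illustrates the idea that usage patterns in existing logs carry a signal about address assignment.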

Top 10 ASes with the largest number of dynamic IP addresses
HostTracker: De-anonymizing the Internet Using Unreliable IDs
Today’s Internet is open and anonymous. While it permits free traffic from any host, attackers that generate malicious traffic cannot typically be held accountable. We develop a system called HostTracker that tracks dynamic bindings between hosts and IP addresses by leveraging application-level data with unreliable IDs. Using a month-long user login trace from a large email provider, we show that HostTracker can attribute most of the activities reliably to the responsible hosts, despite the existence of dynamic IP addresses, proxies, and NATs. With this information, we are able to analyze the host population, to conduct forensic analysis, and also to blacklist malicious hosts dynamically.
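The binding idea can be illustrated with a small sketch. It is a simplification made under stated assumptions, not HostTracker's method: each user ID is treated as a stand-in for a host, and a user is considered bound to an IP address from their first to their last login on it. The function names and the `(user_id, ip, timestamp)` event format are hypothetical.

```python
from collections import defaultdict


def build_bindings(events):
    """Build per-IP binding windows from login events.

    events: iterable of (user_id, ip, timestamp) tuples (assumed format).
    Returns {ip: [(start, end, user_id), ...]} sorted by start time.
    """
    spans = {}
    for user, ip, ts in events:
        lo, hi = spans.get((ip, user), (ts, ts))
        spans[(ip, user)] = (min(lo, ts), max(hi, ts))

    bindings = defaultdict(list)
    for (ip, user), (lo, hi) in spans.items():
        bindings[ip].append((lo, hi, user))
    for windows in bindings.values():
        windows.sort()
    return dict(bindings)


def attribute(bindings, ip, ts):
    """Return user IDs whose window on `ip` covers time `ts`.

    An empty list means no binding is known; several IDs suggest a
    proxy, a NAT, or overlapping windows that need further resolution.
    """
    return [user for lo, hi, user in bindings.get(ip, []) if lo <= ts <= hi]
```

With such a table, an event observed at a given IP address and time can be traced back to a candidate host, which is the kind of lookup that forensic analysis and dynamic blacklisting rely on.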
Automatically inferring host-IP bindings
AutoRE: Signature-based Spamming Botnet Detection
We developed AutoRE, a spam signature generation framework that detects and characterizes spamming botnets by leveraging both spam payload and spam server traffic properties. AutoRE does not require pre-classified training data or white lists. Moreover, it outputs high quality regular expression signatures that can detect botnet spam with a low false positive rate.
Our in-depth analysis of the identified botnets revealed several interesting findings regarding the degree of email obfuscation, properties of botnet IP addresses, sending patterns, and their correlation with network scanning traffic. We believe these observations are useful for designing botnet detection schemes.
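As a rough illustration of combining payload and sender properties, the sketch below derives a regular expression from a group of URLs and keeps it only when the URLs were sent from enough distinct IP addresses. It is not AutoRE's signature generation procedure. The generalization rule (literal common prefix plus a character class for the differing tails), the function names, and the `min_senders` threshold are all assumptions made for the example.

```python
import os
import re
import string

DIGITS = set(string.digits)
ALNUM = set(string.ascii_letters + string.digits)


def generalize(urls):
    """Return a regex: literal common prefix, then a class for the tails."""
    prefix = os.path.commonprefix(urls)
    tails = [u[len(prefix):] for u in urls]
    if not any(tails):
        return re.escape(prefix)  # all URLs identical
    chars = set("".join(tails))
    if chars <= DIGITS:
        cls = "[0-9]"
    elif chars <= ALNUM:
        cls = "[A-Za-z0-9]"
    else:
        cls = r"\S"
    lo, hi = min(map(len, tails)), max(map(len, tails))
    return f"{re.escape(prefix)}{cls}{{{lo},{hi}}}"


def signature_if_distributed(emails, min_senders=3):
    """emails: list of (sender_ip, url) pairs (assumed format).

    Returns a regex only if the URLs came from enough distinct senders,
    a stand-in for the sender traffic properties mentioned above.
    """
    senders = {ip for ip, _ in emails}
    if len(senders) < min_senders:
        return None
    return generalize([url for _, url in emails])
```

The length bounds keep the expression specific, which is one simple way to limit false positives compared with an unbounded wildcard.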

Distribution of the botnet hosts around the globe
BotGraph: Large-scale Spamming Botnet Detection
Network security applications often require analyzing huge volumes of data to identify abnormal patterns or activities. The emergence of cloud-computing models opens up new opportunities to address this challenge by leveraging the power of parallel computing.
We design and implement a novel system, called BotGraph, to detect a new type of botnet spamming attacks targeting major Web email providers. BotGraph uncovers the correlations among botnet activities by constructing large user-user graphs and looking for tightly connected subgraph components. This enables us to identify stealthy botnet users that are hard to detect when viewed in isolation. To deal with the huge data volume, we implement BotGraph as a distributed application on a computer cluster. We believe both our graph-based approach and our implementations are generally applicable to a wide class of security applications for analyzing large datasets.
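The graph idea can be sketched on a single machine as follows. This is a minimal illustration under assumptions, not BotGraph's implementation: two users are linked when they share at least `min_shared` login IP addresses, and linked users are merged into components with a union-find structure. The function name, the `(user_id, ip)` record format, and both thresholds are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def suspicious_groups(logins, min_shared=2, min_group=3):
    """Group users that share many login IP addresses.

    logins: iterable of (user_id, ip) pairs (assumed format).
    Returns a list of user-ID sets, one per sufficiently large component.
    """
    ips_by_user = defaultdict(set)
    for user, ip in logins:
        ips_by_user[user].add(ip)

    users_by_ip = defaultdict(set)
    for user, ips in ips_by_user.items():
        for ip in ips:
            users_by_ip[ip].add(user)

    # Edge weight = number of distinct IP addresses two users share.
    shared = defaultdict(int)
    for users in users_by_ip.values():
        for a, b in combinations(sorted(users), 2):
            shared[(a, b)] += 1

    parent = {u: u for u in ips_by_user}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for (a, b), weight in shared.items():
        if weight >= min_shared:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for u in parent:
        groups[find(u)].add(u)
    return [g for g in groups.values() if len(g) >= min_group]
```

Enumerating user pairs per IP address is quadratic in the number of users behind that address, which is one reason a computation like this has to be distributed at the data volumes described above.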

An example login graph
Publications:
- Andreas Pitsillidis, Yinglian Xie, Fang Yu, Martin Abadi, Geoffrey M. Voelker, and Stefan Savage, How to Tell an Airport from a Home: Techniques and Applications, in HotNets 2010, Association for Computing Machinery, Inc., October 2010
- Zhiyun Qian, Zhuoqing Morley Mao, Yinglian Xie, and Fang Yu, Investigation of Triangular Spamming: a Stealthy and Efficient Spamming Technique, in IEEE Symposium on Security and Privacy (Oakland) 2010, May 2010
- Fang Yu, Yinglian Xie, and Qifa Ke, SBotMiner: Large Scale Search Bot Detection, in ACM International Conference on Web Search and Data Mining (WSDM), February 2010
- Zhiyun Qian, Zhuoqing Mao, Yinglian Xie, and Fang Yu, On Network-level Clusters for Spam Detection, in The 17th Annual Network and Distributed System Security Symposium (NDSS) 2010, February 2010
- Yinglian Xie, Fang Yu, and Martin Abadi, De-anonymizing the Internet Using Unreliable IDs, in ACM SIGCOMM, August 2009
- Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum, BotGraph: Large Scale Spamming Botnet Detection, in The 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09), USENIX, April 2009
- Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, and Ivan Osipkov, Spamming Botnet: Signatures and Characteristics, in ACM SIGCOMM 2008, Seattle, WA, August 2008
- Yinglian Xie, Fang Yu, Kannan Achan, Eliot Gillum, Moisés Goldszmidt, and Ted Wobber, How Dynamic are IP Addresses, in Proceedings of the ACM SIGCOMM Conference, Association for Computing Machinery, Inc., Kyoto, Japan, August 2007
