Previous Research


I am in interested in large scale networking systems. I have focused this interest in the realm of making the Internet more secure, in particular, by exploiting new hardware capabilities and developing new algorithms for better classification of network traffic. This has many applications, but a key one is detecting suspect traffic in order to filter out worms and viruses as well as other forms of bad traffic. In my dissertation, I have designed efficient packet classification and scanning algorithms that have significant improvements over current solutions.


PhD Dissertation Summary

In today’s edge networks, many new services are emerging such as intrusion detection, high speed firewalls, NAT, HTTP load balancing, XML processing, TCP offloading, encryption/decryption, etc. Traditional packet handling techniques, such as next hop forwarding can’t support these new services. Instead, we need deep packet inspection techniques that can function on the entire packet contents rather than just the header. For example, Network Intrusion Detection Systems (NIDS) inspect the packet payload at line rates to detect and filter packets containing worm signatures. These signature sets are large and complex, which make software-only implementations unable to meet typical performance goals. Therefore, we need more efficient algorithms and hardware solutions to achieve desirable performance.

I propose deep packet inspection schemes using emerging hardware technologies. First, I propose a multi-match classification solution which intelligently process packet header information using a special memory -- Ternary Content Addressable Memories (TCAMs) due to their ability to perform fast parallel matching. Second, I develop scheme to identify within packet payload also using TCAM. Lastly, I implement an efficient regular expression rewriting scheme and a fast searching engine for multi-core processor environments. With all these techniques, we can perform multi-match packet classification based on multiple fields in the packet header and perform gigabit rate packet payload scanning against thousands of complex patterns. The proposed schemes can efficiently handle long patterns, correlated patterns, patterns with negation, and also general regular expressions [thesis].


Multi-match packet classification

I adopt Ternary Content Addressable Memories (TCAMs) to solve the multi-match classification problem due to their ability to perform fast parallel matching. However, TCAM is expensive and consumes large amounts of power. None of the previously published multi-match classification schemes is both memory and power efficient. Hence, I developed a novel scheme that meets both requirements by using a new Set Splitting Algorithm (SSA). The main idea of SSA is that it splits filters into multiple groups and performs separate TCAM lookups into these groups. It guarantees the removal of at least half the intersections when a filter set is split into two sets, thus resulting in low TCAM memory usage. SSA also accesses filters in the TCAM only once per packet, leading to low power consumption. Compared to previously published schemes, SSA yields a 75% to 95% reduction in power consumption. [paper1] [slides], [paper2] [slides]


Deep packet pattern matching with TCAM

Because of their intrinsic parallel search capability, TCAMs can also be used effectively for the pattern matching functions needed in intrusion detection systems. However, TCAMs impose limitations on the pattern length that can be directly matched. Also, there is no direct method to handle correlated patterns such as the example patterns shown in Figure 1. In this paper, we develop algorithms that use TCAMs to achieve high speeds while not being restricted to these limitations. Our work is applicable to other layer 7 pattern matching problems. For example, applications like HTTP load balancing, email SPAM filtering, etc., require packet payload scanning. In this paper, we develop and test algorithms for intrusion detection and anti-virus signature sets because they are more complex than the signatures of the applications listed above. This work won a best paper award at the International Conference on Network Protocols (ICNP). [paper] [slides]


Fast Regular Expression Matching

Regular expressions are replacing explicit string patterns as the pattern matching language of choice in the above networking applications. For example, in the Linux Application Protocol Classifier (L7-filter), all protocol identifiers are expressed as regular expressions. Memory requirements using traditional methods for fast regular expression scanning are prohibitively high for many patterns used in networking applications. Hence, I propose regular expression rewrite techniques that reduce memory usage. Further, I develop a scheme based on compiling regular expressions into several engines, which dramatically increases the regular expression matching speed without significantly increasing memory usage. We implement the DFA-based packet scanners. Our experiment results using real-world traffic and patterns have shown that our implementation achieves one to three orders of magnitude of speedup compared to the best NFA-based implementation. [paper] [slides], [related paper]


Research on Optical Networks

In mesh restoration in optical network, the reasons for blocking of restoration path can either be different paths competing for the same channel, which we call glare, or there is insufficient bandwidth available. We proposed two mechanisms to reduce these two sources of blocking. First, channel selection methods that considerably reduce glare, while at the same time maintaining large groups of contiguous channels (for high speed connections) across multiple, parallel links. Then a hybrid solution that utilizes a centralized restoration “path server” to optimize the restoration path selection, yet utilizes distributed control to compute service paths and set up service/restoration paths. The objective is to achieve very fast provisioning, fast restoration upon network failure, and efficient use of capacity. To support this, we also present an efficient restoration path selection algorithm.

  • Rakesh Sinha, Fang Yu, and Bob Doverspike , “Performance Study of Mesh Restoration,” Optical Fiber Communication. Conference, March 2005.
  • Fang Yu, Rakesh Sinha, Dongmei Wang, Guangzhi Li, John Strand, Robert Doverspike, Charles Kalmanek, and Bruce Cortez, “Improving Restoration Success in Mesh Optical Networks,” Journal of Optical Networks, Dec 2003.
  • Fang Yu, Rakesh K. Sinha, Robert Doverspike, Bruce Cortez, John Strand, “Link Selection Schemes for Avoiding Channel Contention,” Fourth International Workshop on the Design of Reliable Communication Networks (DRCN), Oct 2003.
  • Fang Yu, Dongmei Wang, Rakesh Sinha, Guangzhi Li, Bob Doverspike and Chuck Kalmanek, “Hybrid Centralized/Distributed Approach to Optical Network Restoration,” Optical Fiber Communication. Conference, March 2003.

Various Previous Research

  • Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan, Li Yin, and Fang Yu, “Data-centric storage in sensornets with GHT, a geographic hash table,” ACM Mobile Networks and Applications(MONET), Volume 8 Issue 4, August 2003.
  • Sylvia Ratnasamy, Brad Karp, Li Yin, Fang Yu, Deborah Estrin, Ramesh Govindan, Scott Shenker, “GHT: A Geographic Hash Table for Data-Centric Storage,” in First ACM International Workshop on Wireless Sensor Networks and Applications (WSNA) 2002.
  • Fang Yu, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, “Qos adptive Proxy Caching for Multimedia,” IEEE transactions on Circuits and Systems on Video Technology , 2003.
  • Fang Yu, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, “Network-Adaptive Cache Management Schemes for Mixed Media,” The first IEEE Pacific Rim Conference on Multimedia (IEEEPCM 2001), Nov 2001.
  • Fang Yu, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, “Qos-Adpative Proxy Caching for Multimedia Streaming Over Internet,” The first IEEE Pacific Rim Conference on Multimedia (IEEEPCM 2000), Dec 2000.
  • Fang Yu and Wen Jin, “An Effective Approach to Mining Exception Class Association Rules (ECAR),” International Conference on Web-Age Information Management (in Cooperation with ACM SIGMOD) 2000, June 2000.
  • Shuigeng Zhou, Ye Fan, Jiangtao Hu, Fang Yu, and Yunfa Hu, “Hierarchically Classifying Chinese Web Documents Without Dictionary Support And Segmentation Procedure,” International Conference on Web-Age Information Management (in Cooperation with ACM SIGMOD) 2000, June 2000.