File-sharing in the Internet: A characterization of P2P traffic in the backbone

Since the outbreak of peer-to-peer (P2P) networking with Napster during the late ’90s, P2P applications have multiplied, become more sophisticated, and emerged as a significant fraction of Internet traffic. At first, P2P traffic was easily recognizable because P2P protocols used application-specific TCP or UDP port numbers. However, current P2P applications can use arbitrary ports to “camouflage” their existence. Thus, only a portion of P2P traffic is clearly identifiable, and estimates and statistics regarding P2P traffic are unreliable. In this paper we present a characterization of P2P traffic in the Internet. We develop several heuristics that allow us to recognize P2P traffic at nonstandard ports. We find that, depending on the protocol and metric used, approximately 30%-70% of traffic related to P2P applications cannot be identified using well-known ports. In addition, we present several characteristics of various P2P networks, such as eDonkey2000, FastTrack, Gnutella, BitTorrent, Napster, and Direct Connect, as seen in traffic samples from two Tier-1 commercial backbones in 2002 and 2003.