Computer networking research has long faced a tension between relying on real network traces for evaluation and the privacy and security risks of releasing those traces to researchers. While traces are critical for conducting and assessing research on computer networks, they also contain a substantial amount of information about the participants in the trace. This information can be explicit (the contents of packets) or implicit (the fact that two known computers communicated with each other). The challenge of conducting research while preventing access to such sensitive information is important for the future of this, and other, research fields.
Our observation, though we are hardly the first to make it, is that much of the information required by analysts is "aggregate": it concerns large-scale statistics about the trace as a whole, rather than specific properties of individual packets. While this intuition is appealing, formal articulations of the distinction have only recently begun to emerge; one appealing example is differential privacy. Differential privacy formally requires that a computation depend only slightly on any one individual input record, while permitting substantial variation as a result of changes in large numbers of input records.
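To make the requirement concrete, here is a minimal sketch of one standard way to satisfy it for a counting query: add Laplace noise scaled to how much a single record can change the answer. This is an illustration only and is not the toolkit's code or the PINQ API; the function names, the toy trace, and the value of epsilon are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=random):
    """Count the records matching predicate, plus Laplace noise of scale 1/epsilon.

    Adding or removing one record changes the true count by at most 1,
    so noise of this scale masks the presence of any single record.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy trace: (source, destination, destination port) tuples.
trace = [("10.0.0.1", "10.0.0.2", 80)] * 5000 + [("10.0.0.3", "10.0.0.4", 22)] * 40

# Aggregate question: roughly how many packets were sent to port 80?
print(noisy_count(trace, lambda packet: packet[2] == 80, epsilon=0.1))
```

With epsilon = 0.1 the noise has scale 10, which is negligible next to a count in the thousands but large enough to hide whether any one packet is present. This matches the intuition above: aggregate statistics survive, while individual records have little influence on the output. A production implementation would also need to track the cumulative privacy cost across queries and take care with floating-point noise sampling, both of which this sketch ignores.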
While differential privacy is appealing, confirming that a computation provides it is challenging; this is most commonly done by hand, by experts, and for small programs. Most network trace analyses are not so simple. As part of our SIGCOMM 2010 paper, "Differentially-Private Network Trace Analysis", we produced a toolkit and a collection of analyses useful for network trace analysis. The tools were built on top of the Privacy Integrated Queries (PINQ) platform for differentially-private data analysis, and were selected to reflect several important types of network trace analyses.
Note: the toolkit contains an updated version of PINQ which is not officially released. The official release of PINQ should be refreshed soon, but until then the version in the toolkit should be used.
- Frank McSherry and Ratul Mahajan, Differentially-Private Network Trace Analysis, in Proceedings of SIGCOMM 2010, Association for Computing Machinery, Inc., 30 August 2010.