Database Privacy

Research related to privacy issues in data analysis.

Overview

The problem of statistical disclosure control, that is, revealing accurate statistics about a population while preserving the privacy of individuals, has a venerable history. An extensive literature spans multiple disciplines: statistics, theoretical computer science, security, and databases. Nevertheless, "privacy breaches" are common, both in the literature and in practice, even when security and data integrity are not compromised.

This project revisits private data analysis from the perspective of modern cryptography. We address many previous difficulties by obtaining a strong, yet realizable, definition of privacy, called differential privacy. Intuitively, differential privacy ensures that the system behaves essentially the same way, independent of whether any individual, or small group of individuals, opts in to or opts out of the database. More precisely, for every possible output of the system, the probability of this output is almost unchanged by the addition or removal of any individual, where the probabilities are taken over the coin flips of the mechanism (and not the data set). Moreover, this holds even in the face of arbitrary existing or future knowledge available to a "privacy adversary," completely solving the problem of database linkage attacks.
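As a rough illustration of the definition above, the Python sketch below adds random noise to a count so that the probability of any output changes by at most a fixed factor when one person's record is added or removed. The privacy parameter epsilon, the choice of Laplace noise (a standard technique that this page does not name), and the function names noisy_count and worst_case_ratio are assumptions made for this example, not part of the project's own description.

```python
import math
import random


def laplace_density(x, mean, scale):
    """Density of a Laplace distribution with the given mean and scale at x."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


def noisy_count(records, predicate, epsilon, rng=random):
    """Count the records matching predicate, then add Laplace noise of scale 1/epsilon.

    Adding or removing one record changes the true count by at most 1,
    which is why scale 1/epsilon is enough for a counting query.
    """
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent exponential draws with rate epsilon
    # follows a Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def worst_case_ratio(count_a, count_b, epsilon, outputs):
    """Largest ratio between the output densities of two databases whose
    true counts are count_a and count_b, checked over the given outputs."""
    scale = 1.0 / epsilon
    return max(
        laplace_density(y, count_a, scale) / laplace_density(y, count_b, scale)
        for y in outputs
    )


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy parameter
    everyone = [{"smoker": True}, {"smoker": True}, {"smoker": False}]
    one_opted_out = everyone[1:]  # the same data without the first person

    def is_smoker(record):
        return record["smoker"]

    print("released count:", noisy_count(everyone, is_smoker, epsilon))

    # Compare the two output distributions on a grid of possible outputs.
    count_in = sum(1 for r in everyone if is_smoker(r))
    count_out = sum(1 for r in one_opted_out if is_smoker(r))
    grid = [i / 10 for i in range(-200, 201)]
    ratio = worst_case_ratio(count_in, count_out, epsilon, grid)
    print("worst ratio on grid:", ratio, "bound exp(epsilon):", math.exp(epsilon))
```

In this sketch the randomness comes only from the noise draw, matching the "coin flips of the mechanism" in the text. A smaller epsilon tightens the bound on the ratio at the cost of noisier answers.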

Databases can serve many social goals, such as fair allocation of resources and identifying genetic markers for disease. Better participation means better information, and the "in vs. out" aspect of differential privacy encourages participation.

For a general overview of differential privacy (the problems to be solved, the definition, the formal impossibility results that lead to the definition, general techniques for achieving differential privacy, and some recent directions), see "A firm foundation for private data analysis" (to appear in Communications of the ACM).

For selected publications, organized by topic and in chronological order, scroll down or follow the links:

A comprehensive list of papers related to the project appears here:

Selected Surveys and Invited Talks
Definitions
  • Cynthia Dwork, Differential Privacy, in 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Springer Verlag, Venice, Italy, July 2006
  • Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan, Computational Differential Privacy, in Advances in Cryptology—CRYPTO 2009, Springer, August 2009
Mechanisms: Foundations
Mechanisms: Other Methods
Programming
  • Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, Practical Privacy: The SuLQ Framework, in 24th ACM SIGMOD International Conference on Management of Data / Principles of Database Systems (PODS 2005), Baltimore, Maryland, USA, June 2005
  • Frank McSherry, Privacy Integrated Queries, in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), Association for Computing Machinery, Inc., June 2009
Lower Bounds and Attacks
Applications
Pan-privacy and Continual Observations
  • Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin, Pan-Private Streaming Algorithms, in Proceedings of The First Symposium on Innovations in Computer Science (ICS 2010), Tsinghua University Press, January 2010
  • Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum, Differential Privacy Under Continual Observation, in STOC '10: Proceedings of the 42nd ACM symposium on Theory of computing, Association for Computing Machinery, Inc., June 2010
Public Policy

Events

        In the media

        Related Project: PINQ
        • Privacy Integrated Queries (PINQ)
          Privacy Integrated Queries is a LINQ-like API for computing on privacy-sensitive data sets, while providing guarantees of differential privacy for the underlying records. The research project aims to produce a simple yet expressive language whose differential privacy properties can be reasoned about efficiently and in which a rich collection of analyses can be programmed.