Database Privacy

Research related to privacy issues in data analysis.

Overview

The problem of statistical disclosure control—revealing accurate statistics about a population while preserving the privacy of individuals—has a venerable history, with an extensive literature spanning statistics, theoretical computer science, security, and databases. Nevertheless, "privacy breaches" are common, both in the literature and in practice, even when security and data integrity are not compromised.

This project revisits private data analysis from the perspective of modern cryptography.  We address many previous difficulties by obtaining a strong, yet realizable, definition of privacy. Intuitively, differential privacy ensures that the system behaves in essentially the same way, independent of whether any individual, or small group of individuals, opts in to or opts out of the database.  More precisely, for every possible output of the system, the probability of this output is almost unchanged by the addition or removal of any individual, where the probabilities are taken over the coin flips of the mechanism (and not the data set). Moreover, this holds even in the face of arbitrary existing or future knowledge available to a "privacy adversary," completely solving the problem of database linkage attacks.
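Concretely, the most common way to achieve this guarantee for low-sensitivity queries is the Laplace mechanism: add noise calibrated to how much one individual can change the answer. The Python sketch below is illustrative only (function names and structure are not taken from any particular system); it shows an epsilon-differentially private count, where sensitivity 1 means Laplace noise of scale 1/epsilon suffices.

```python
import random

def laplace_sample(scale):
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    is Laplace-distributed with that scale.
    """
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def noisy_count(records, predicate, epsilon):
    """epsilon-differentially private count.

    A counting query has sensitivity 1: adding or removing one
    individual changes the true count by at most 1, so Laplace noise
    with scale 1/epsilon gives epsilon-differential privacy.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)
```

Smaller epsilon means stronger privacy but noisier answers; the noise depends only on the query's sensitivity and epsilon, not on the size of the database, so relative error shrinks as the population grows.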

Databases can serve many social goals, such as fair allocation of resources and identification of genetic markers for disease.  Better participation means better information, and the "in vs. out" aspect of differential privacy encourages participation.

For a general overview of differential privacy—the problems to be solved, the definition, the formal impossibility results that led to the definition, general techniques for achieving differential privacy, and some recent directions—see "A Firm Foundation for Private Data Analysis" (to appear in Communications of the ACM).

For selected publications, organized by topic and in chronological order, scroll down or follow the links:

A comprehensive list of papers related to the project appears here:


Postdoctoral position announcement


An interdisciplinary group of researchers based at Stanford University, UC Berkeley, and Microsoft Research, Silicon Valley, invites applications for two-year postdoctoral fellowships under the Sloan Foundation-sponsored project "Towards Practicing Privacy". The project involves the study, development, and implementation of mathematically rigorous privacy protection technology that can be used to facilitate data access for economists and medical researchers, and to accommodate statistical data analyses performed by empirical researchers in these fields. The ultimate goal of the project is to understand and, it is hoped, to overcome the obstacles to the practical implementation of differentially private data analysis of large datasets used by social scientists and medical researchers.


The project is led by Dr. Cynthia Dwork, distinguished scientist at Microsoft Research, Prof. John Mitchell, professor of Computer Science at Stanford University, and Prof. Denis Nekipelov, assistant professor of Economics at UC Berkeley. The advisory members of the project include Prof. Steven Brenner, professor of Plant and Microbial Biology at UC Berkeley and Prof. David Card, professor of Economics at UC Berkeley.

A broad range of applicants will be considered: computer scientists with expertise in private data analysis or privacy-preserving systems, researchers in theoretical and applied econometrics, as well as applied data-driven fields including industrial organization and labor economics, and researchers in bioinformatics. Salary will be competitive with academic postdoctoral salaries in Computer Science, commensurate with experience.


Completed applications received by April 1, 2012, will receive full consideration. Applications received after the deadline will be considered until the positions are filled. The application should include the following: a cover letter, CV, up to three sample research papers, and three letters of recommendation. Materials should be sent to lynda@cs.stanford.edu.

Selected Surveys and Invited Talks
Definitions
  • Cynthia Dwork, Differential Privacy, in 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Springer Verlag, Venice, Italy, July 2006
  • Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan, Computational Differential Privacy, in Advances in Cryptology—CRYPTO 2009, Springer, August 2009
Mechanisms: Foundations
Mechanisms: Other Methods
Programming
  • Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, Practical Privacy: The SuLQ Framework, in 24th ACM SIGMOD International Conference on Management of Data / Principles of Database Systems (PODS 2005), Baltimore, Maryland, USA, June 2005
  • Frank McSherry, Privacy Integrated Queries, in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), Association for Computing Machinery, Inc., June 2009
Lower Bounds and Attacks
Applications
Pan-privacy and Continual Observations
  • Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N. Rothblum, and Sergey Yekhanin, Pan-Private Streaming Algorithms, in Proceedings of The First Symposium on Innovations in Computer Science (ICS 2010), Tsinghua University Press, January 2010
  • Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum, Differential Privacy Under Continual Observation, in STOC '10: Proceedings of the 42nd ACM symposium on Theory of computing, Association for Computing Machinery, Inc., June 2010
Public Policy

Events

In the media

Related Project: PINQ
  • Privacy Integrated Queries (PINQ)
    Privacy Integrated Queries is a LINQ-like API for computing on privacy-sensitive data sets, while providing guarantees of differential privacy for the underlying records. The research project is aimed at producing a simple yet expressive language about which differential privacy properties can be efficiently reasoned, and in which a rich collection of analyses can be programmed.
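PINQ itself is a C# API built on LINQ; as a rough illustration of its central idea—every aggregate query is charged against a finite privacy budget, so total privacy loss can be reasoned about mechanically—here is a minimal Python sketch. The class and method names are hypothetical and do not reflect the actual PINQ interface.

```python
import random

class PrivateDataset:
    """Hypothetical PINQ-style wrapper: analysts see only noisy aggregates,
    and each query's epsilon is deducted from a fixed privacy budget."""

    def __init__(self, records, epsilon_budget):
        self._records = list(records)
        self._budget = epsilon_budget

    def noisy_count(self, predicate, epsilon):
        """epsilon-DP count (sensitivity 1, Laplace noise of scale 1/epsilon)."""
        if epsilon > self._budget:
            raise ValueError("privacy budget exhausted")
        # Sequential composition: privacy costs of successive queries add up.
        self._budget -= epsilon
        # Laplace(0, 1/epsilon) as a difference of two exponentials.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return sum(1 for r in self._records if predicate(r)) + noise
```

The budget check is what lets the platform, rather than the analyst, enforce the differential-privacy guarantee: once the budget is spent, no further queries are answered, regardless of what the analyst asks.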