Research related to privacy issues in data analysis.
Overview
Statistical databases such as those produced by the US Census contain a large volume of illuminating and potentially useful data. They also run the risk of revealing a great deal of specific information about the participants, which participants generally dislike.
Additionally, each individual exists in a myriad of databases around the world, from purchases at online booksellers to medical records at hospitals to records on file with the government. Such databases individually may reveal little about an individual, though when combined they may be quite incriminating.
There is an inherent tradeoff between the utility that databases can offer and the privacy they afford their constituents. We are studying this tradeoff formally, attempting to understand the relationship between privacy and utility, and thereby find a comfortable position between the extremes of fully disclosed and completely withheld data.
- What is the right formal characterization of privacy in a public database?
- What sanitization measures should be employed to preserve this privacy?
- Which data analyses can be performed on this sanitized data?
Publications
- Committee on Technical and Privacy Dimensions of Information for Terrorism Prevention and Other National Goals and National Research Council, Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment, National Academies Press, 26 September 2008
- Cynthia Dwork, The Differential Privacy Frontier, in 6th Theory of Cryptography Conference, TCC 2009, Springer Verlag, San Francisco, CA, March 2009
- Cynthia Dwork, Differential Privacy: A Survey of Results, in Theory and Applications of Models of Computation—TAMC, Springer, April 2008
- Cynthia Dwork, An Ad Omnia Approach to Defining and Achieving Private Data Analysis, in Privacy, Security, and Trust in KDD—PinKDD 2007, Springer Verlag, August 2007
- Cynthia Dwork, Ask a Better Question, Get a Better Answer: A New Approach to Private Data Analysis, in 11th International Conference on Database Theory (ICDT 2007), Springer, Barcelona, Spain, January 2007
- Cynthia Dwork, Differential Privacy, in 33rd International Colloquium on Automata, Languages and Programming, part II (ICALP 2006), Springer, Venice, Italy, July 2006
- Cynthia Dwork, Sub-linear Queries Statistical Databases: Privacy with Power, in Cryptographers' Track, RSA Conference (CT-RSA '05), Springer, San Francisco, CA, USA, February 2005
- Ilya Mironov, Omkant Pandey, Omer Reingold, and Salil Vadhan, Computational Differential Privacy, in Advances in Cryptology—CRYPTO 2009, Springer, August 2009
- Frank McSherry and Ilya Mironov, Differentially Private Recommender Systems: Building Privacy into the Netflix Prize Contenders, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Association for Computing Machinery, Inc., June 2009
- Frank McSherry, Privacy Integrated Queries, in Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD), Association for Computing Machinery, Inc., June 2009
- Cynthia Dwork and Jing Lei, Differential Privacy and Robust Statistics, in Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC), Association for Computing Machinery, Inc., Bethesda, Maryland, May 2009
- Cynthia Dwork, Moni Naor, Omer Reingold, Guy Rothblum, and Salil Vadhan, On the Complexity of Differentially Private Data Release, in Proceedings of the 41st Annual ACM Symposium on Theory of Computing (STOC), Association for Computing Machinery, Inc., Bethesda, Maryland, May 2009
- Cynthia Dwork and Sergey Yekhanin, New Efficient Attacks on Statistical Disclosure Control Mechanisms, in Advances in Cryptology—CRYPTO 2008, Springer, August 2008
- Frank McSherry and Kunal Talwar, Mechanism Design via Differential Privacy, in Annual IEEE Symposium on Foundations of Computer Science (FOCS), IEEE, Providence, RI, October 2007
- Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, and Kunal Talwar, Privacy, accuracy, and consistency too: a holistic solution to contingency table release, in Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Association for Computing Machinery, Inc., Beijing, China, June 2007
- Cynthia Dwork, Frank McSherry, and Kunal Talwar, The price of privacy and the limits of LP decoding, in Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC), Association for Computing Machinery, Inc., San Diego, California, USA, June 2007
- Lars Backstrom, Cynthia Dwork, and Jon M. Kleinberg, Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography, in International conference on World Wide Web (WWW), ACM, Banff, Alberta, Canada, May 2007
- Philippe Golle, Frank McSherry, and Ilya Mironov, Data Collection With Self-Enforcing Privacy, in ACM Conference on Computer and Communications Security (CCS 2006), ACM, October 2006
- Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor, Our Data, Ourselves: Privacy Via Distributed Noise Generation, in Advances in Cryptology (EUROCRYPT 2006), Springer Verlag, Saint Petersburg, Russia, May 2006
- Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, Calibrating Noise to Sensitivity in Private Data Analysis, in Third Theory of Cryptography Conference (TCC 2006), Springer, New York, NY, USA, March 2006
- Shuchi Chawla, Cynthia Dwork, Frank McSherry, and Kunal Talwar, On Privacy-Preserving Histograms, in Uncertainty in Artificial Intelligence (UAI), Association for Uncertainty in Artificial Intelligence, Edinburgh, Scotland, July 2005
- Avrim Blum, Cynthia Dwork, Frank McSherry, and Kobbi Nissim, Practical Privacy: The SuLQ Framework, in 24th ACM SIGMOD International Conference on Management of Data / Principles of Database Systems (PODS 2005), Baltimore, Maryland, USA, June 2005
- Shuchi Chawla, Cynthia Dwork, Frank McSherry, Adam Smith, and Hoeteck Wee, Toward Privacy in Public Databases, in Second Theory of Cryptography Conference, (TCC 2005), Springer Verlag, Cambridge, MA, USA, February 2005
- Cynthia Dwork and Kobbi Nissim, Privacy-Preserving Datamining on Vertically Partitioned Databases, in 24th Annual International Cryptology Conference (CRYPTO 2004), Springer Verlag, Santa Barbara, California, USA, August 2004
Events
- Upcoming: Statistical and Learning-Theoretic Challenges in Data Privacy, February 22–26, 2010. Institute for Pure and Applied Mathematics (IPAM), Los Angeles, CA
- MindSwap on Privacy Technology, October 19–20, 2007. Center for Computational Thinking, Carnegie Mellon, Pittsburgh, PA
- Workshop on Data Confidentiality, September 6–7, 2007, Arlington, VA
- CS-Statistics Workshop On Privacy and Confidentiality, July 9–15, 2005, Bertinoro, Italy
- DIMACS/PORTIA Workshop on Privacy-Preserving Data Mining, March 15–16, 2004, DIMACS Center, Rutgers University, Piscataway, NJ
In the media
- Thomas Claburn, Counterterrorist Data Mining Needs Privacy Protection, InformationWeek, Oct. 7, 2008
- Samuel Greengard, Privacy Matters, Communications of the ACM, vol. 51(9), Sep. 2008
- Dean Takahashi, Let's not give up on our privacy, San Jose Mercury News, Dec. 21, 2006
- Privacy Integrated Queries (PINQ): Privacy Integrated Queries is a LINQ-like API for computing on privacy-sensitive data sets while providing guarantees of differential privacy for the underlying records. The research project aims to produce a simple yet expressive language about which differential privacy properties can be efficiently reasoned and in which a rich collection of analyses can be programmed.
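To make the idea of a privacy-preserving query interface concrete, here is a minimal Python sketch of one common way to answer a counting query under differential privacy: add Laplace noise scaled to the query's sensitivity and charge each query against a fixed privacy budget. This is not PINQ's API. The names `PrivateTable`, `noisy_count`, and `BudgetExceeded` are hypothetical and exist only for this illustration, and the sketch makes no claim about how PINQ is implemented.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query asks for more privacy budget than remains."""


class PrivateTable:
    """Hypothetical wrapper (not PINQ): raw records stay hidden and only
    noisy aggregates are released, each charged against a total budget."""

    def __init__(self, records, budget, seed=None):
        self._records = list(records)
        self._remaining = float(budget)
        self._rng = random.Random(seed)

    @property
    def remaining(self):
        """Privacy budget (total epsilon) still available."""
        return self._remaining

    def noisy_count(self, predicate, epsilon):
        """Return the number of matching records plus Laplace noise."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise BudgetExceeded("not enough privacy budget left")
        self._remaining -= epsilon

        true_count = sum(1 for r in self._records if predicate(r))

        # Adding or removing one record changes a count by at most 1, so the
        # sensitivity is 1 and the Laplace scale is 1 / epsilon. The difference
        # of two independent exponentials with rate epsilon is Laplace
        # distributed with exactly that scale.
        noise = self._rng.expovariate(epsilon) - self._rng.expovariate(epsilon)
        return true_count + noise


if __name__ == "__main__":
    # Made-up records, purely for illustration.
    table = PrivateTable([{"age": a} for a in range(100)], budget=1.0)
    print(table.noisy_count(lambda r: r["age"] >= 50, epsilon=0.5))
    print("budget left:", table.remaining)
```

Smaller values of `epsilon` give stronger privacy but noisier answers, which is one simple instance of the privacy versus utility tradeoff described in the overview.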



