Database Privacy
Statistical databases, such as those produced by the US Census, contain a large volume of illuminating and potentially useful data. They also run the risk of revealing a great deal of specific information about the participants, which participants generally dislike.
Additionally, each individual appears in myriad databases around the world, from purchases at online booksellers to medical records at hospitals to records on file with the government. Individually, such databases may reveal little about a person, but when combined they may be quite incriminating.
There is an inherent tradeoff between the utility that databases can offer and the privacy they afford their constituents. We are studying this tradeoff formally, attempting to understand the relationship between privacy and utility, and thereby to find a comfortable position between the extremes of fully disclosed and completely withheld data.
- What is the right formal characterization of privacy in a public database?
- What sanitization measures should be employed to preserve this privacy?
- Which data analyses can be performed on this sanitized data?
We are exploring two different computational models for statistical databases. In the "census" model, the data are sanitized and the results are published; the adversary has arbitrary access to the published data. In the "output perturbation" model, the adversary may make a limited number of queries to the database. For each query, the true answer is computed and then perturbed by adding random noise, and only the perturbed value is released. So far, the latter model has proved more tractable, because the adversary's access to the data is limited. In this model, provided the adversary is restricted to a number of queries that is sublinear in the number of database rows (a reasonable assumption if the database is very large), privacy can be achieved at virtually no loss in statistical accuracy. The census model, on the other hand, seems to capture more profound questions.
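As a rough illustration of the output perturbation model, the following Python sketch answers counting queries with added noise and refuses to answer once a query budget is used up. The class name `PerturbedDatabase`, its parameters, the choice of Laplace-distributed noise, and the square-root budget in the usage lines are all assumptions made for this example; the text above specifies only that random noise is added and that the number of queries is sublinear in the number of rows.

```python
import math
import random


class PerturbedDatabase:
    """Hypothetical output-perturbation interface (illustrative only)."""

    def __init__(self, rows, max_queries, noise_scale, seed=None):
        self._rows = list(rows)
        self._remaining = max_queries    # query budget granted to the analyst
        self._noise_scale = noise_scale  # assumed Laplace scale parameter b
        self._rng = random.Random(seed)

    def _laplace_noise(self):
        # The difference of two independent exponential samples with rate 1/b
        # follows a Laplace distribution with mean 0 and scale b.
        rate = 1.0 / self._noise_scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def count(self, predicate):
        """Return a noisy count of the rows satisfying `predicate`."""
        if self._remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self._remaining -= 1
        true_answer = sum(1 for row in self._rows if predicate(row))
        # Only the perturbed value leaves the database.
        return true_answer + self._laplace_noise()


# Example: a budget of about sqrt(n) queries, one possible sublinear choice.
ages = [20 + (i % 60) for i in range(10_000)]
db = PerturbedDatabase(ages, max_queries=math.isqrt(len(ages)), noise_scale=2.0)
noisy_over_65 = db.count(lambda age: age > 65)
```

With noise of this magnitude, the error in a count over ten thousand rows is small relative to the count itself, which is the sense in which accuracy is nearly preserved when the database is large.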
- Boaz Barak
- Avrim Blum
- Kamalika Chaudhuri
- Shuchi Chawla
- Petros Drineas
- Satyen Kale
- Krishnaram Kenthapadi
- Moni Naor
- Kobbi Nissim
- Adam Smith
- Madhu Sudan
- Hoeteck Wee
- Mechanism Design via Differential Privacy, Frank McSherry, Kunal Talwar,
48th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2007, pp. 94–103, 2007.
- The Price of Privacy and the Limits of LP Decoding, Cynthia Dwork, Frank McSherry, Kunal Talwar,
39th Annual ACM Symposium on Theory of Computing—STOC 2007, pp. 85–94, 2007.
- Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release, Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, Kunal Talwar,
26th Symposium on Principles of Database Systems—PODS 2007, pp. 273–282, 2007.
- Wherefore Art Thou r3579x? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, Lars Backstrom, Cynthia Dwork, Jon M. Kleinberg,
16th International Conference on World Wide Web—WWW 2007, pp. 181–190, 2007. Best paper award.
- [Invited paper] Ask a Better Question, Get a Better Answer: A New Approach to Private Data Analysis, Cynthia Dwork,
Database Theory—ICDT 2007, pp. 18–27, 2007.
- Data Collection With Self-Enforcing Privacy, Philippe Golle, Frank McSherry, Ilya Mironov,
ACM Conference on Computer and Communications Security—ACM CCS 2006, pp. 69–78, 2006. Full version.
- [Invited paper] Differential Privacy, Cynthia Dwork,
33rd International Colloquium on Automata, Languages and Programming—ICALP 2006, Part II, pp. 1–12, 2006.
- Our Data, Ourselves: Privacy via Distributed Noise Generation, Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, Moni Naor,
Advances in Cryptology—Eurocrypt 2006, pp. 486–503, 2006.
- Calibrating Noise to Sensitivity in Private Data Analysis, Cynthia Dwork, Frank McSherry, Kobbi Nissim, Adam Smith,
3rd Theory of Cryptography Conference—TCC 2006, pp. 265–284, 2006.
- [Invited paper] Sub-linear Queries Statistical Databases: Privacy with Power, Cynthia Dwork,
Topics in Cryptology—CT-RSA 2005, pp. 1–6, 2005.
- Toward Privacy in Public Databases, Shuchi Chawla, Cynthia Dwork, Frank McSherry, Adam Smith, Hoeteck Wee,
2nd Theory of Cryptography Conference—TCC 2005, pp. 363–385, 2005.
- On the Utility of Privacy-Preserving Histograms, Shuchi Chawla, Cynthia Dwork, Frank McSherry, Kunal Talwar,
21st Conference on Uncertainty in Artificial Intelligence—UAI 2005.
- Practical Privacy: The SuLQ Framework, Avrim Blum, Cynthia Dwork, Frank McSherry, Kobbi Nissim,
24th Symposium on Principles of Database Systems—PODS 2005, pp. 128–138, 2005.
- Privacy-Preserving Data Mining in Vertically Partitioned Databases, Cynthia Dwork, Kobbi Nissim,
Advances in Cryptology—CRYPTO 2004, pp. 528–544, 2004.
- MindSwap on Privacy Technology. October 19–20, 2007. Center for Computational Thinking, Carnegie Mellon, Pittsburgh, PA.
- Workshop on Data Confidentiality. September 6–7, 2007. Arlington, VA.
- CS-Statistics Workshop On Privacy and Confidentiality. July 9–15, 2005. Bertinoro, Italy.
- DIMACS/PORTIA Workshop on Privacy-Preserving Data Mining. March 15–16, 2004. DIMACS Center, Rutgers University, Piscataway, NJ.