Benjamin I. P. Rubinstein, Peter L. Bartlett, Ling Huang, and Nina Taft
The ubiquitous need for analyzing privacy-sensitive information---including health records, personal communications, product ratings and social network data - is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner-product uniformly approximates the desired feature space inner-product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility - when the private classifier is pointwise close to the non-private classifier with high probability - is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
2009 preprint available at http://arxiv.org/abs/0911.5708
In Journal of Privacy and Confidentiality