Privacy, Stability and High-dimensional Sparse Regression

5 Nov
Theory of Computation Seminar
Adam Smith, Penn State and Boston University
Tuesday, November 5, 2013 -
1:30pm to 2:30pm
Pierce Hall 320

I will discuss recent results on how different notions of stability of learning algorithms can be used to design differentially private algorithms. We focus on designing algorithms for statistical modelselection. Given a data set and a discrete collection of models, each of which is a family of probability distributions, the goal is to determine the model that best ``fits'' the data. This is a basic problem in many areas of statistics and machine learning.

Let $f$ be a *nonprivate* model selection procedure. We design corresponding differentially private algorithms that output the correct value $f(\D)$ whenever $f$ satisfies either of two notions ofstability, perturbation stability and subsampling stability.

 We give two classes of results: generic ones, that apply to any function with discrete output set; and specific algorithms for the problem of sparse linear regression. The algorithms we describe areefficient and in some cases match the optimal nonprivate asymptotic sample complexity.

 Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to smallchanges in the data set, and show that these conditions hold with high probability under essentially the same assumptions that are used in the literature to analyze convergence of the LASSO. 


Carol Harlow