The Langevin algorithm is a popular variant of gradient descent in which properly scaled isotropic Gaussian noise is added to the gradient at each iteration. This modest change allows the algorithm to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives. In this talk, based on joint work with Belinda Tzen and Tengyuan Liang, I will present an analysis of the path-wise behavior of the Langevin algorithm for non-convex empirical risk minimization. In particular, I will first show the following empirical metastability property of the Langevin algorithm. For a given local minimum of the empirical risk, with high probability, at least one of two events will occur: either the Langevin trajectory ends up outside an epsilon-neighborhood of this local minimum within a short recurrence time, or it enters this neighborhood by the recurrence time and stays there until an exponentially long escape time. I will then discuss the implications of this path-wise concentration result for local notions of generalization and optimality in statistical learning.
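The update rule described above can be sketched in a few lines. This is a minimal illustration, not the analysis from the talk: the double-well objective, the step size `eta`, and the inverse temperature `beta` are all hypothetical choices made for demonstration.

```python
import numpy as np

def grad_f(x):
    # Gradient of the illustrative non-convex objective
    # f(x) = (x^2 - 1)^2, a double well with local minima at x = -1, +1.
    return 4.0 * x * (x**2 - 1.0)

def langevin(x0, eta=0.01, beta=10.0, n_steps=5000, seed=0):
    """One-dimensional unadjusted Langevin iteration:
    x_{k+1} = x_k - eta * grad_f(x_k) + sqrt(2 * eta / beta) * xi_k,
    where xi_k are i.i.d. standard Gaussians."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(n_steps):
        # Gradient step plus properly scaled isotropic Gaussian noise.
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal()
    return x

# Starting between the two wells, the trajectory settles near one minimum
# and, for this small a noise level, fluctuates around it.
x_final = langevin(x0=0.5)
print(x_final)
```

With a larger noise level (smaller `beta`), the same trajectory would escape the well on a much shorter time scale; the exponential dependence of the escape time on `beta` is exactly the metastability phenomenon discussed in the abstract.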
Maxim Raginsky received the B.S. and M.S. degrees in 2000 and the Ph.D. degree in 2002 from Northwestern University, all in electrical engineering. He has held research positions with Northwestern, University of Illinois at Urbana-Champaign (where he was a Beckman Foundation Fellow from 2004 to 2007), and Duke University. In 2012, he returned to UIUC, where he is currently an Associate Professor and William L. Everitt Fellow with the Department of Electrical and Computer Engineering. Dr. Raginsky received a Faculty Early Career Development (CAREER) Award from the National Science Foundation in 2013. His research interests lie at the intersection of information theory, machine learning, and control. He is a member of the editorial boards of IEEE Transactions on Information Theory, IEEE Transactions on Network Science and Engineering, and Foundations and Trends in Communications and Information Theory.