Stochastic gradient methods (SGM) are the de facto tool of choice in machine learning. I review a few popular variants of SGM. I then discuss their success in learning convex and deep models from a stability standpoint. I next relate SGM to proximal methods, which cast each update as a trade-off between moving in the direction of the gradient and staying close to the current solution. The notion of closeness is generalized to data-dependent metrics, which yields the adaptive gradient (AdaGrad) algorithm. Lastly, I show that AdaGrad is a special case of a family of adaptive metrics which also includes the online Newton method. Empirical results that demonstrate the effectiveness and generalization power of SGM conclude the talk.
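As a minimal illustration of the data-dependent metric idea, the sketch below implements diagonal AdaGrad, which scales each coordinate's step by the inverse square root of that coordinate's accumulated squared gradients. This is a generic textbook sketch, not code from the talk; the function names, step size, and test objective are illustrative choices.

```python
import numpy as np

def adagrad(grad, x0, lr=0.5, eps=1e-8, steps=500):
    """Diagonal AdaGrad: per-coordinate steps scaled by the inverse
    root of the running sum of squared gradients."""
    x = np.asarray(x0, dtype=float)
    g2 = np.zeros_like(x)              # accumulated squared gradients
    for _ in range(steps):
        g = grad(x)
        g2 += g * g
        x -= lr * g / (np.sqrt(g2) + eps)
    return x

# Illustrative use: minimize the ill-conditioned quadratic
# f(x) = x1^2 + 10 * x2^2, whose gradient is (2*x1, 20*x2).
x_min = adagrad(lambda x: np.array([2 * x[0], 20 * x[1]]), [1.0, 1.0])
```

Because the accumulated squared gradients differ across coordinates, the effective step size adapts per coordinate, which is what makes AdaGrad robust to poorly scaled features.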
Based on joint work with John Duchi (Stanford), Moritz Hardt (Google), Elad Hazan (Princeton), Vineet Gupta (Google), Tomer Koren (Google), and Ben Recht (UC Berkeley).
Yoram Singer is a principal research scientist at Google. He is the head of the Principles of Effective Machine-learning (POEM) group in Google Brain. Before joining Google he was a professor at the Hebrew University of Jerusalem and a member of the technical staff at AT&T Research.