Learning Rate

No More Pesky Learning Rates

Tom Schaul, Sixin Zhang, Yann LeCun
Courant Institute of Mathematical Sciences, New York University
715 Broadway, New York, NY 10003, USA

schaul@cims.nyu.edu zsx@cims.nyu.edu yann@cims.nyu.edu

arXiv:1206.1106v2 [stat.ML] 18 Feb 2013

Abstract
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.

learning rates for different parameters), so as to minimize some estimate of the expectation of the loss at any one time. Starting from an idealized scenario where every sample’s contribution to the loss is quadratic and separable, we derive a formula for the optimal learning rates for SGD, based on estimates of the variance of the gradient. The formula has two components: one that captures variability across samples, and one that captures the local curvature, both of which can be estimated in practice. The method can be used to derive a single common learning rate, or local learning rates for each parameter, or each block of parameters, leading to five variations of the basic algorithm, none of which needs any parameter tuning. The performance of the methods, obtained without any manual tuning, is reported on a variety of convex and non-convex learning models and tasks. They compare favorably with an “ideal SGD”, where the best possible learning rate was obtained through systematic search, as well as
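The two-component rate described above can be illustrated with a minimal sketch. The excerpt does not state the formula itself, so the form used here, (mean gradient)^2 / (curvature * mean squared gradient) per parameter, is an assumption about its shape, and the exact sample averages stand in for whatever estimates would be used in practice. The function name `optimal_rate`, the `eps` guard, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np

def optimal_rate(grads, curvature, eps=1e-12):
    """Per-parameter rate (mean g)^2 / (h * mean g^2).

    grads: array of shape (n_samples, n_params) with per-sample gradients.
    curvature: array of shape (n_params,) with the local curvature h.
    The formula's exact form is an assumption, not taken from the excerpt.
    """
    g_bar = grads.mean(axis=0)         # estimate of the expected gradient
    v_bar = (grads ** 2).mean(axis=0)  # estimate of the expected squared gradient
    return g_bar ** 2 / (curvature * v_bar + eps)

# Toy separable quadratic: sample j contributes 0.5 * h * (theta - c_j)^2
# to the loss in each dimension, so its gradient is h * (theta - c_j).
rng = np.random.default_rng(0)
h = np.array([1.0, 10.0])
centers = rng.normal(size=(1000, 2))

theta_far = np.array([50.0, 50.0])   # far from the optimum
theta_near = centers.mean(axis=0)    # at the optimum of the average loss

eta_far = optimal_rate(h * (theta_far - centers), h)
eta_near = optimal_rate(h * (theta_near - centers), h)

print("far from optimum:", eta_far, "vs 1/h:", 1.0 / h)
print("at the optimum:  ", eta_near)
```

In this toy setting the rate approaches 1/h when the gradients of all samples agree (the curvature term dominates), and it shrinks toward zero at the optimum, where the average gradient vanishes and only sample-to-sample variability remains. Because the rate is recomputed from the current gradients, it can grow again if the data shift and the gradients become consistent once more.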
