Learning Rate

No More Pesky Learning Rates

Tom Schaul, Sixin Zhang, Yann LeCun
Courant Institute of Mathematical Sciences, New York University
715 Broadway, New York, NY 10003, USA

schaul@cims.nyu.edu zsx@cims.nyu.edu yann@cims.nyu.edu

arXiv:1206.1106v2 [stat.ML] 18 Feb 2013

Abstract
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are tuned and decreased over time. We propose a method to automatically adjust multiple learning rates so as to minimize the expected error at any one time. The method relies on local gradient variations across samples. In our approach, learning rates can increase as well as decrease, making it suitable for non-stationary problems. Using a number of convex and non-convex learning tasks, we show that the resulting algorithm matches the performance of SGD or other adaptive approaches with their best settings obtained through systematic search, and effectively removes the need for learning rate tuning.

learning rates for different parameters), so as to minimize some estimate of the expectation of the loss at any one time. Starting from an idealized scenario where every sample’s contribution to the loss is quadratic and separable, we derive a formula for the optimal learning rates for SGD, based on estimates of the variance of the gradient. The formula has two components: one that captures variability across samples, and one that captures the local curvature, both of which can be estimated in practice. The method can be used to derive a single common learning rate, or local learning rates for each parameter, or each block of parameters, leading to five variations of the basic algorithm, none of which needs any parameter tuning. The performance of the methods, obtained without any manual tuning, is reported on a variety of convex and non-convex learning models and tasks. They compare favorably with an “ideal SGD”, where the best possible learning rate was obtained through systematic search, as well as
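The two components described above can be illustrated with a minimal one-dimensional sketch. The excerpt does not state the formula itself, so the code assumes the form (1/h) * (mean gradient)^2 / (mean squared gradient), where h is the curvature; the ratio term shrinks as sample-to-sample variability starts to dominate the gradient. The function names, the Gaussian noise model, and the sample counts are all hypothetical choices made for illustration.

```python
import random


def optimal_rate(grads, curvature):
    """Assumed rate: (1 / h) * mean(g)^2 / mean(g^2).

    The 1/h factor reflects local curvature; the ratio reflects how much
    of the gradient signal survives the variability across samples.
    This form is an assumption, not quoted from the excerpt.
    """
    n = len(grads)
    g_mean = sum(grads) / n                     # estimate of E[g]
    g_sq_mean = sum(g * g for g in grads) / n   # estimate of E[g^2]
    if g_sq_mean == 0.0:
        return 0.0  # all gradients are zero: nothing to learn from
    return (g_mean ** 2) / (curvature * g_sq_mean)


def sample_gradients(theta, curvature, optimum, noise_std, n, rng):
    """Gradients of per-sample losses 0.5 * h * (theta - c)^2,
    with each sample centre c drawn from N(optimum, noise_std^2)."""
    return [curvature * (theta - rng.gauss(optimum, noise_std)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
h = 2.0

# Far from the optimum the gradients agree, so the rate approaches 1/h.
far = optimal_rate(sample_gradients(10.0, h, 0.0, 1.0, 1000, rng), h)

# Near the optimum the gradients are mostly noise, so the rate shrinks.
near = optimal_rate(sample_gradients(0.1, h, 0.0, 1.0, 1000, rng), h)

print(far, near)
```

Under these assumptions the rate never exceeds 1/h, because the squared mean of the gradients cannot exceed the mean of their squares. Applying the same estimate separately to each parameter, or to each block of parameters, would give the local variants mentioned above.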
