Shift-curvature, SGD, and generalization

Bradley, Arwen V and Gomez-Uribe, Carlos A and Vuyyuru, Manish Reddy (2022) Shift-curvature, SGD, and generalization. Machine Learning: Science and Technology, 3 (4). 045002. ISSN 2632-2153

Bradley_2022_Mach._Learn.__Sci._Technol._3_045002.pdf - Published Version

Abstract

A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that stochastic gradient descent (SGD) discourages curvature. We offer a more complete and nuanced view in support of both hypotheses. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The shift refers to the difference between train and test local minima, and the bias and covariance are those of the parameter distribution. These three curvature-mediated contributions to test performance are reparametrization-invariant even though curvature itself is not. Although the shift is unknown at training time, the shift-curvature as well as the other mechanisms can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between low-loss versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature is the dominant of the three mechanisms. Our experiments demonstrate the significant impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.

Item Type: Article
Subjects: European Scholar > Multidisciplinary
Depositing User: Managing Editor
Date Deposited: 09 Jul 2023 03:31
Last Modified: 09 Oct 2023 06:01
URI: http://article.publish4promo.com/id/eprint/2096
