We first propose a general step-size framework for the stochastic gradient descent (SGD) method: bandwidth-based step sizes that are allowed to vary within a banded region. The framework provides efficient and flexible step-size selection in optimization, including cyclical and non-monotonic step sizes (e.g., the triangular policy and cosine with restart), for which theoretical guarantees are rare. We provide state-of-the-art convergence guarantees for SGD under mild conditions, allowing a large constant step size at the beginning of training. Moreover, we investigate the error bounds of SGD under the bandwidth step size when the boundary functions are of the same order and of different orders, respectively. Finally, we propose a 1/t up-down policy and design novel non-monotonic step sizes. Numerical experiments demonstrate these bandwidth-based step sizes’ efficiency and significant potential in training regularized logistic regression and several large-scale neural network tasks.
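To make the idea concrete, here is a minimal sketch (not the authors' reference implementation) of SGD with a bandwidth-based step size: the raw step size may follow any schedule, here a hypothetical triangular policy, but is clipped into a band [m/t, M/t] defined by 1/t boundary functions. The constants m, M and all helper names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def triangular_policy(t, period=100, lo=0.01, hi=1.0):
    """Hypothetical cyclical (triangular) raw step-size proposal."""
    phase = (t % period) / period
    frac = 2 * phase if phase < 0.5 else 2 * (1 - phase)
    return lo + (hi - lo) * frac

def bandwidth_step_size(t, m=0.1, M=10.0):
    """Clip the raw proposal into the 1/t band [m/t, M/t] (assumed constants)."""
    raw = triangular_policy(t)
    return np.clip(raw, m / t, M / t)

def sgd(grad_fn, x0, n_iters=1000):
    """Plain SGD driven by the bandwidth-based step size."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_iters + 1):
        g = grad_fn(x)                      # stochastic gradient estimate
        x = x - bandwidth_step_size(t) * g  # step stays within the banded region
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2.
rng = np.random.default_rng(0)
sol = sgd(lambda x: x + 0.1 * rng.standard_normal(x.shape), x0=np.ones(5))
print(sol)
```

The clipping step is the only point where the band enters; any non-monotonic or cyclical schedule can be substituted for the triangular proposal without changing the rest of the loop.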
Publication:
Journal of Machine Learning Research 24 (2023) 1-49
https://jmlr.org/papers/volume24/19-1009/19-1009.pdf
Authors:
Xiaoyu Wang
Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, No. 19A Yuquan Road, Beijing 100049, China
Ya-xiang Yuan
State Key Laboratory of Scientific/Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
Email: yyx@lsec.cc.ac.cn