Dr. Chao Ma: Implicit biases of optimization algorithms for neural networks and their effects on generalization
Academy of Mathematics and Systems Science, CAS Colloquia & Seminars
Speaker:
Dr. Chao Ma, Stanford University
Inviter:
明平兵
Title:
Implicit biases of optimization algorithms for neural networks and their effects on generalization
Language:
Chinese
Time & Venue:
2022.12.23 09:00-10:00, Tencent Meeting: 247-520-003
Abstract:
Modern neural networks are usually over-parameterized: the number of parameters exceeds the number of training data points. In this regime the loss function tends to have many (or even infinitely many) global minima, which imposes an additional challenge on optimization algorithms beyond convergence, namely minima selection. Specifically, when training a neural network, the algorithm not only has to find a global minimum, but also needs to select a minimum with good generalization from among many bad ones. In this talk, we connect the implicit bias of optimization algorithms to generalization performance in two steps. First, through a linear stability analysis around global minima, we show that stochastic gradient descent (SGD) favors flat and uniform global minima. Second, we build a theoretical connection between flatness and generalization performance based on a special multiplicative structure of neural networks. Together, these results show that SGD tends to find global minima with good generalization. We derive bounds on the generalization error and adversarial robustness that depend on the SGD hyperparameters.
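The flatness-selection mechanism in the abstract can be illustrated with a toy linear stability computation. This is a minimal sketch, not the speaker's analysis: for the quadratic loss L(x) = (a/2) x^2 with curvature (Hessian) a, the gradient descent update multiplies x by (1 - lr * a), so a minimum is linearly stable only if |1 - lr * a| < 1, i.e. a < 2 / lr. At a fixed learning rate, sharp minima are unstable and the iterates escape them, while flat minima retain the iterates.

```python
# Toy illustration (assumed setup, not from the talk): linear stability of
# gradient descent at minima of different sharpness.

def gd_final_distance(curvature, lr, x0=1.0, steps=50):
    """Run gradient descent on L(x) = 0.5 * curvature * x**2.

    The update x <- x - lr * curvature * x multiplies x by (1 - lr * curvature),
    so the minimum at x = 0 is linearly stable iff |1 - lr * curvature| < 1.
    Returns the distance from the minimum after `steps` iterations.
    """
    x = x0
    for _ in range(steps):
        x = x - lr * curvature * x
    return abs(x)

lr = 0.5                                        # fixed learning rate
flat = gd_final_distance(curvature=1.0, lr=lr)  # |1 - 0.5| = 0.5 < 1: stable
sharp = gd_final_distance(curvature=5.0, lr=lr) # |1 - 2.5| = 1.5 > 1: unstable
print(f"flat minimum:  {flat:.3e}")   # shrinks toward 0
print(f"sharp minimum: {sharp:.3e}")  # blows up
```

The same threshold, tightened further by gradient noise, is what makes SGD with a given learning rate and batch size act as a filter that discards minima above a sharpness cutoff.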