Academy of Mathematics and Systems Science, CAS Colloquia & Seminars
Speaker:
许志钦, Associate Professor, Shanghai Jiao Tong University
Inviter:
张世华
Title:
Condensation in deep learning
Language:
Chinese
Time & Venue:
2022.12.08 19:30-21:00, Tencent Meeting ID: 464 1423 6743
Abstract:
Why do neural networks (NNs), which look so complex, usually generalize well? To understand this question, we identify several simple implicit regularizations that emerge during the training of NNs. The first is the frequency principle: NNs tend to learn target functions from low frequency to high frequency. The second is parameter condensation, a feature of the non-linear training process that makes the effective network size much smaller. Building on condensation, we find an intrinsic embedding principle of the NN loss landscape and develop a rank analysis framework to quantitatively understand how much data an overparameterized NN needs in order to generalize well.
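The following is a minimal illustrative sketch, not taken from the talk, of the frequency-principle behavior described above: a small fully connected network fit to a two-frequency 1D target typically captures the low-frequency component before the high-frequency one. It assumes PyTorch; the target function, network width, and learning rate are arbitrary illustrative choices rather than the speaker's experimental setup.

```python
# Hypothetical demo of the frequency principle: track how fast the low- and
# high-frequency Fourier modes of the residual decay during training.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.linspace(-3.14159, 3.14159, 256).unsqueeze(1)
y = torch.sin(x) + 0.5 * torch.sin(5 * x)   # low (k=1) + high (k=5) frequency target

model = nn.Sequential(nn.Linear(1, 200), nn.Tanh(), nn.Linear(200, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def freq_error(pred):
    """Relative error of the k=1 (low) and k=5 (high) Fourier modes of the fit."""
    err = torch.fft.rfft((pred - y).squeeze())
    tgt = torch.fft.rfft(y.squeeze())
    return (err[1].abs() / tgt[1].abs()).item(), (err[5].abs() / tgt[5].abs()).item()

for step in range(2001):
    opt.zero_grad()
    pred = model(x)
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        lo, hi = freq_error(pred.detach())
        # Typically the low-frequency error drops well before the high-frequency one.
        print(f"step {step:5d}  low-freq rel. err {lo:.3f}  high-freq rel. err {hi:.3f}")
```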