
AI Interdisciplinary Lecture Series, No. 17: On Efficient Training for Large-Scale Deep Learning Models

Posted: 2023-10-17




Speaker: Li Shen
         Research Scientist, JD Explore Academy

Host: Prof. Zhouchen Lin
      School of Intelligence Science and Technology and Institute for Artificial Intelligence, Peking University

Time: 2023/10/19, 10:00 - 11:00

Venue: Room 115, Teaching Building, Changping Campus, Peking University / Room 1801, Science Building No. 1, Yanyuan Campus, Peking University
       Tencent Meeting: 504-137-495


                


Title: On Efficient Training for Large-Scale Deep Learning Models


Abstract:

The field of deep learning has witnessed significant developments in recent years. In particular, large-scale models trained on vast amounts of data hold immense promise for practical applications and for enhancing industrial productivity. However, such models suffer from unstable training, stringent computational resource requirements, and underexplored convergence analyses. For example, Adam, one of the most influential adaptive stochastic algorithms for training deep neural networks, has been shown to diverge even in simple convex settings via a few simple counterexamples. In this talk, we systematically investigate the convergence theory and application of efficient training algorithms for pretraining large-scale deep learning models from the perspective of optimization. Specifically, (i) we derive the first easy-to-check sufficient condition, depending only on the base learning rate and the combination of historical second-order moments, that guarantees the global convergence of the Adam optimizer in the non-convex stochastic setting; this condition also yields a much deeper interpretation of Adam's divergence. (ii) We theoretically show that distributed Adam can be linearly accelerated by using a larger number of nodes. (iii) We propose a communication-efficient variant of distributed Adam, dubbed Efficient-Adam, which adopts bi-directional compression to reduce the communication cost and error-compensation techniques to reduce the compression bias. (iv) We develop FedLADA, a novel momentum-based federated optimizer that combines the global gradient with a locally adaptive amended optimizer to tackle the client drift exacerbated by local over-fitting under local adaptive optimization in federated learning.
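Since the talk's convergence results center on the behavior of the Adam iteration, a minimal NumPy sketch of the standard (single-node, uncompressed) Adam update is given below for reference. It illustrates only the textbook algorithm, not the speaker's sufficient condition, Efficient-Adam, or FedLADA; the function name adam_step, the hyperparameter defaults, and the toy objective are purely illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update."""
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad**2     # moving average of squared gradients (second moment)
    m_hat = m / (1 - beta1**t)                # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 with noisy stochastic gradients.
rng = np.random.default_rng(0)
theta = np.ones(4)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = 2 * theta + 0.01 * rng.normal(size=theta.shape)  # noisy gradient of f
    theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
print(theta)  # close to the minimizer at the origin
```

The talk's results concern when, and under what conditions on the learning rate and the second-order moment estimates, iterates produced by updates of this form converge in the non-convex stochastic setting.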


Speaker Bio:

Li Shen is currently a research scientist at JD Explore Academy, Beijing, China. Previously, he was a senior researcher at Tencent AI Lab. He received his bachelor's degree and Ph.D. from the School of Mathematics, South China University of Technology. His research interests include theory and algorithms for nonsmooth convex and nonconvex optimization, and their applications in statistical machine learning, deep learning, and reinforcement learning. He has published more than 60 papers in peer-reviewed top-tier journals (JMLR, IEEE TPAMI, IJCV, IEEE TSP, IEEE TIP, etc.) and conferences (ICML, NeurIPS, ICLR, CVPR, ICCV, etc.). He has also served as a senior program committee member for AAAI 2022 and AAAI 2024, and as an area chair for ICPR 2022, ICPR 2024, and ICLR 2024.