CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing

Maksym Byshkin, Università della Svizzera italiana 

Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. The same problem often arises in physical simulations, where it can be addressed by Langevin dynamics with Simulated Annealing, a well-established approach to minimizing many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integrating the discretized Langevin equation gives a coordinate update rule equivalent to the well-known Momentum optimization algorithm. As the main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent to applying Simulated Annealing, or slow cooling in physical terms. Building on this approach, we propose CoolMomentum, a new stochastic optimization method. Applying CoolMomentum to the optimization of ResNet-20 on the CIFAR-10 dataset and EfficientNet-B0 on ImageNet, we demonstrate that it achieves high accuracy.
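
The idea of a momentum update whose coefficient is cooled from near unity to zero can be illustrated with a minimal sketch. This is not the speaker's implementation: the linear decay schedule, the function name cooled_momentum_minimize, the parameter values, and the one-dimensional double-well test function are all assumptions chosen for illustration, and the stochastic gradient noise present in real training is omitted.

```python
import numpy as np

def cooled_momentum_minimize(grad, x0, steps=2000, lr=0.01, rho0=0.99):
    """Momentum descent whose momentum coefficient decays from rho0 to 0.

    The linear decay used here is an assumed schedule for illustration;
    the schedule proposed in the talk may differ.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for n in range(steps):
        # Momentum coefficient: rho0 at the first step, 0 at the last step.
        rho = rho0 * (1.0 - n / max(steps - 1, 1))
        # Standard momentum update: velocity first, then coordinates.
        v = rho * v - lr * grad(x)
        x = x + v
    return x

def grad_double_well(x):
    # Gradient of the hypothetical objective f(x) = x**4 - 2*x**2 + 0.3*x,
    # which has two local minima separated by a barrier.
    return 4 * x**3 - 4 * x + 0.3

x_final = cooled_momentum_minimize(grad_double_well, x0=[1.5])
print(x_final, grad_double_well(x_final))
```

With a large momentum coefficient early on, the iterate can carry enough velocity to cross the barrier between minima; as the coefficient approaches zero, the update reduces to plain gradient descent and settles into a minimum.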

Maksym Byshkin is a Swiss postdoctoral research fellow and lecturer at the Università della Svizzera italiana (USI). His research fields are statistical physics, statistical network analysis, Monte Carlo methods, stochastic optimization, computational chemistry, and interdisciplinary collaborations. His research focuses on the development of empirical models and computational methods for high performance computing.
Maksym Byshkin holds a Master's degree in applied mathematics and a doctoral degree in theoretical physics from the Kharkov Institute of Physics and Technology (Landau school). Before joining USI, he spent several years as a postdoctoral research fellow at the Free University of Brussels and the University of Salerno.

Host: Prof. Vittorio Limongelli

