Marco Mondelli (IST Austria)
Tuesday, August 4, 2020 - 11:00
MPI für Mathematik in den Naturwissenschaften Leipzig
Virtual event (video broadcast)
Training a neural network is a non-convex problem whose landscape can exhibit spurious and disconnected local minima. Yet, in practice, neural networks with millions of parameters are successfully optimized using gradient descent methods. In this talk, I will give some theoretical insight into why this is possible. First, I will show that the combination of stochastic gradient descent and over-parameterization makes the landscape of deep networks approximately connected and, therefore, more favorable to optimization. Then, I will focus on a special case (a two-layer network fitting a convex function) and provide a quantitative convergence result by exploiting the displacement convexity of a related Wasserstein gradient flow. Finally, I will return to deep networks and show that a single wide layer followed by a pyramidal topology suffices to guarantee the global convergence of gradient descent. [Based on joint work with Adel Javanmard, Andrea Montanari, Quynh Nguyen, and Alexander Shevchenko]
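As a toy illustration of the over-parameterized setting the abstract refers to (not code from the talk), the sketch below trains a wide two-layer ReLU network with plain gradient descent to fit a convex one-dimensional target, f(x) = x². The width m, learning rate, and all variable names are our own illustrative choices; the point is only that, despite non-convexity, gradient descent drives the training loss down on a sufficiently wide network.

```python
import numpy as np

# Illustrative sketch: gradient descent on a wide two-layer ReLU network
# fitting the convex target f(x) = x^2. Width m = 200 is the
# "over-parameterization"; all constants here are arbitrary choices.

rng = np.random.default_rng(0)
n, m = 50, 200                              # samples, hidden width
x = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
y = x ** 2                                  # convex target

W = rng.normal(0.0, 1.0, (1, m))            # input-to-hidden weights
b = rng.normal(0.0, 1.0, (1, m))            # hidden biases
a = rng.normal(0.0, 1.0 / np.sqrt(m), (m, 1))  # hidden-to-output weights

def forward(x):
    h = np.maximum(x @ W + b, 0.0)          # ReLU features, shape (n, m)
    return h, h @ a                         # prediction, shape (n, 1)

lr = 0.05
_, pred = forward(x)
loss0 = np.mean((pred - y) ** 2)            # loss at random initialization

for _ in range(2000):
    h, pred = forward(x)
    err = 2.0 * (pred - y) / n              # dL/dpred for mean-squared error
    grad_a = h.T @ err                      # gradient w.r.t. output weights
    dh = (err @ a.T) * (h > 0)              # back-prop through the ReLU
    grad_W = x.T @ dh                       # gradient w.r.t. input weights
    grad_b = dh.sum(axis=0, keepdims=True)  # gradient w.r.t. biases
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

_, pred = forward(x)
loss1 = np.mean((pred - y) ** 2)            # loss after training
print(f"loss: {loss0:.4f} -> {loss1:.4f}")
```

This is of course only an empirical sanity check, not the quantitative convergence guarantee of the talk, which analyzes the mean-field (Wasserstein gradient flow) limit of such two-layer networks.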
submitted by Saskia Gutzschebauch (Saskia.Gutzschebauch@mis.mpg.de, 0341 9959 50)