Marcus Hutter (DeepMind and Australian National University)
Thursday, March 4, 2021 - 17:00
Virtual event (Videobroadcast) - link for registration
Max-Planck-Institut für Mathematik in den Naturwissenschaften, 04103 Leipzig
Recently, a number of empirical "universal" scaling-law papers have been published, most notably by OpenAI. "Scaling laws" refers to power-law decreases of training or test error with more data, larger neural networks, and/or more compute. In this work we focus on scaling with respect to the data size n. Theoretical understanding of this phenomenon is largely lacking, except in finite-dimensional models, for which the error typically decreases as n^−1/2 or n^−1. We develop and theoretically analyse the simplest possible (toy) model that can exhibit n^−β learning curves for arbitrary power β>0, and determine whether such power laws are universal or depend on the data distribution.
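As a rough illustration of what an n^−β learning curve looks like, the following minimal Python sketch simulates a purely memorising learner on instances drawn from a Zipf-like distribution and fits a power law to its error versus sample size. This is a hypothetical toy example for intuition only; the exponent alpha, the memorising learner, and all other details are assumptions, not the speaker's actual model or results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (illustration only): instances are drawn from a
# Zipf-like distribution over finitely many "features"; the learner simply
# memorises the label of every feature it has seen, so its test error equals
# the probability mass of the features it has not yet seen.
alpha = 1.5                      # assumed Zipf exponent
num_features = 100_000
probs = 1.0 / np.arange(1, num_features + 1) ** alpha
probs /= probs.sum()

sample_sizes = [100, 300, 1_000, 3_000, 10_000, 30_000]
errors = []
for n in sample_sizes:
    seen = np.unique(rng.choice(num_features, size=n, p=probs))
    unseen_mass = 1.0 - probs[seen].sum()   # test error of the memoriser
    errors.append(unseen_mass)

# Fit error ≈ c * n^(−beta) by least squares in log-log space.
log_n, log_err = np.log(sample_sizes), np.log(errors)
slope, log_c = np.polyfit(log_n, log_err, 1)
print(f"fitted power-law exponent beta ≈ {-slope:.2f}")
```

Running the sketch yields an error curve that is approximately linear in log-log coordinates, i.e. a power law whose exponent depends on the assumed data distribution rather than on the model dimension.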
submitted by Valeria Hünniger (Valeria.Huenniger@mis.mpg.de, 0341 9959 50)