Stochastic gradient descent uses one data sample per iteration and is straightforward to implement. It is the workhorse for many large-scale optimization problems, especially in training deep neural networks. Numerical experiments indicate that it is also very promising for solving large-scale inverse problems. However, the theoretical understanding of the method remains very limited. In this talk, I will discuss its properties from the perspective of regularization theory and aim to explain its excellent practical performance.
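For concreteness, the iteration in question can be sketched as follows for a linear inverse problem Ax = y, where one row of A is sampled per step and early stopping plays the role of regularization. This is a minimal illustrative sketch; the step-size schedule, stopping index, and all names are assumptions, not specifics from the talk.

```python
import numpy as np

def sgd_linear_inverse(A, y, n_iter=1000, eta0=1.0, seed=0):
    """Illustrative SGD sketch for the linear inverse problem A x = y.

    Each iteration uses a single randomly chosen data sample (row of A).
    Stopping after finitely many iterations acts as the regularization;
    the decaying step size below is one common, assumed choice.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for k in range(n_iter):
        i = rng.integers(n)                # pick one sample uniformly at random
        residual = A[i] @ x - y[i]         # scalar residual for that sample
        eta = eta0 / np.sqrt(k + 1)        # decaying step size (assumed schedule)
        x -= eta * residual * A[i]         # gradient step on the sampled term
    return x
```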
This presentation is part of the minisymposium “MS12 - New directions in hybrid data tomography (2 parts)”, organized by Kim Knudsen (Technical University of Denmark) and Cong Shi (Georg-August-Universität Göttingen).