We establish a connection between nonconvex optimization of the kind used in deep learning and nonlinear partial differential equations (PDEs). We interpret empirically successful relaxation techniques, motivated by statistical physics, for training deep neural networks as solutions of a viscous Hamilton-Jacobi (HJ) PDE. The underlying stochastic control interpretation allows us to prove that these techniques perform better than stochastic gradient descent (SGD). Moreover, we derive this PDE from a stochastic homogenization problem, which establishes connections to algorithms for distributed training of deep networks such as Elastic-SGD. Our analysis provides insight into the geometry of the energy landscape and suggests new algorithms based on the non-viscous Hamilton-Jacobi PDE that can effectively tackle the high dimensionality of modern neural networks. Joint work with Pratik Chaudhari, Adam Oberman, Stanley Osher and Guillaume Carlier. Preview at: arXiv:1704.04932
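As a sketch of the central object (following the notation of the cited preprint arXiv:1704.04932; the exact form there may differ in constants), the viscous Hamilton-Jacobi PDE for the relaxed loss can be written as:

```latex
% Viscous Hamilton-Jacobi PDE for the relaxed energy u(x,t),
% initialized at the original loss f(x). Assumption: notation
% follows arXiv:1704.04932, where \beta^{-1} plays the role of
% a temperature/viscosity parameter.
u_t(x,t) = -\tfrac{1}{2}\,\lvert \nabla_x u(x,t) \rvert^{2}
           + \tfrac{\beta^{-1}}{2}\,\Delta_x u(x,t),
\qquad u(x,0) = f(x).
```

Setting \beta^{-1} = 0 recovers the non-viscous Hamilton-Jacobi PDE mentioned at the end of the abstract.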
This presentation is part of Minisymposium "MS70 - Innovative Challenging Applications in Imaging Sciences (2 parts)",
organized by: Roberto Mecca (University of Bologna and University of Cambridge), Giulia Scalet (Dept. Civil Engineering and Architecture, University of Pavia), Federica Sciacchitano (Dept. Mathematics, University of Genoa).