Stochastic gradient descent with noise of machine learning type. Part II: Continuous time analysis.

2021 
The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion. Suitable values of these parameters are found by minimizing a 'loss functional', typically by stochastic gradient descent (SGD) or an advanced SGD-based algorithm. In a continuous time model for SGD with noise that follows the 'machine learning scaling', we show that in a certain noise regime the optimization algorithm prefers 'flat' minima of the objective function, in a sense that differs from the flat minimum selection of continuous time SGD with homogeneous noise.
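For illustration, one way to write such a continuous-time model is as a stochastic differential equation in which the noise amplitude scales with the objective value, so that the noise vanishes at zero-loss minima; the specific form below is a hedged sketch and is not stated in the abstract itself:

% Hypothetical continuous-time SGD model with machine-learning-type noise:
\[
  \mathrm{d}\theta_t \;=\; -\nabla f(\theta_t)\,\mathrm{d}t
  \;+\; \sqrt{\eta\, f(\theta_t)}\;\Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t ,
\]
% where f is the loss functional, \eta > 0 a learning-rate parameter,
% \Sigma(\theta) a positive semi-definite covariance factor, and W_t a
% Brownian motion. The factor f(\theta_t) encodes the 'machine learning
% scaling': the noise decays near zero-loss minima, in contrast to the
% homogeneous-noise case, where the diffusion coefficient is constant.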