
Neural network Gaussian process


As Bayesian artificial neural networks are made wider, the distribution over functions they compute converges to a Gaussian process, with a particular compositional kernel that depends on the neural network architecture and the prior distribution over model parameters. This Neural Network Gaussian Process (NNGP) can be evaluated to generate predictions that would come from an infinitely wide Bayesian neural network, without ever instantiating a neural network. The NNGP additionally describes the distribution over functions realized by non-Bayesian neural networks at random initialization.
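As an illustration, the following sketch shows how an NNGP can be evaluated in practice. It assumes a fully connected ReLU architecture with hypothetical prior variances sigma_w2 (weights, scaled by fan-in) and sigma_b2 (biases); the closed-form Gaussian expectation for the ReLU nonlinearity gives the kernel recursion, and ordinary Gaussian-process regression with that kernel gives the posterior-mean prediction of the corresponding infinitely wide Bayesian network.

import numpy as np


def relu_nngp_kernel(X1, X2, sigma_w2=1.0, sigma_b2=0.1, depth=3):
    # Covariance of the first-layer pre-activations under i.i.d. Gaussian
    # priors: weight variance sigma_w2 / fan-in, bias variance sigma_b2.
    d = X1.shape[1]
    K12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    K11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d
    K22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d
    for _ in range(depth):
        # Closed-form Gaussian expectation E[relu(u) relu(v)]
        # (the arc-cosine kernel), applied layer by layer.
        norm = np.sqrt(np.outer(K11, K22))
        theta = np.arccos(np.clip(K12 / norm, -1.0, 1.0))
        K12 = sigma_b2 + sigma_w2 * norm * (
            np.sin(theta) + (np.pi - theta) * np.cos(theta)
        ) / (2.0 * np.pi)
        K11 = sigma_b2 + sigma_w2 * K11 / 2.0
        K22 = sigma_b2 + sigma_w2 * K22 / 2.0
    return K12


def nngp_posterior_mean(X_train, y_train, X_test, noise=1e-6, **kernel_args):
    # Standard Gaussian-process regression with the NNGP kernel: the
    # posterior mean of the corresponding infinitely wide Bayesian network.
    K = relu_nngp_kernel(X_train, X_train, **kernel_args)
    K_star = relu_nngp_kernel(X_test, X_train, **kernel_args)
    return K_star @ np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)

Here depth counts hidden layers; the same recursion with a different closed-form expectation covers other nonlinearities, and no finite-width network is ever instantiated.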

This equivalence between wide neural networks and NNGPs has been shown to hold for single-hidden-layer and deep neural networks as the number of units per layer is taken to infinity, for convolutional neural networks as the number of channels is taken to infinity, and for transformer networks as the number of attention heads is taken to infinity.

This limit is of particular practical relevance because finite-width neural networks are often found to perform strictly better as their width increases.

Proof sketch
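
A minimal sketch for a single hidden layer, assuming i.i.d. zero-mean Gaussian priors with weight variance scaled inversely with the layer width (symbols $\sigma_w^2$, $\sigma_b^2$, $\phi$ introduced here for illustration):

Consider a network with one hidden layer of width $N$,
\[
  z_i(x) \;=\; b_i + \sum_{j=1}^{N} W_{ij}\,\phi\!\big(a_j + u_j \cdot x\big),
\]
with parameters drawn i.i.d. from zero-mean priors and $\operatorname{Var}[W_{ij}] = \sigma_w^2 / N$. For any finite set of inputs, the summands are i.i.d. random variables, so by the central limit theorem the joint distribution of the outputs at those inputs converges to a multivariate Gaussian as $N \to \infty$. The limiting object is therefore a Gaussian process with mean zero and covariance
\[
  K(x, x') \;=\; \sigma_b^2 + \sigma_w^2\,\mathbb{E}_{a,\,u}\big[\phi(a + u \cdot x)\,\phi(a + u \cdot x')\big].
\]
For deep networks the same argument is applied layer by layer, conditioning on the previous layer's Gaussian limit and taking each width to infinity in turn, which yields a compositional kernel of the same form.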

References