Draft:Universal approximation theorem
In the mathematical theory of artificial neural networks, the universal approximation theorem states that feedforward neural networks constructed from artificial neurons can approximate real-valued continuous functions on compact subsets of $\mathbb{R}^n$ to arbitrary accuracy. The theorem thus implies that simple neural networks can in principle be applied to nearly any problem, as they can approximate essentially any continuous function of interest.
Early versions of the theorem considered networks of arbitrary width. In particular, such results were obtained by George Cybenko in 1989,[1] Kurt Hornik in 1991,[2] and Moshe Leshno et al. in 1993.[3] A simple general formulation was given by Allan Pinkus in 1999,[4] which is the version stated here. Later versions considered the 'dual' problem for networks of arbitrary depth. In particular, such results were obtained by Zhou Lu et al. in 2017[5] and by Boris Hanin and Mark Sellke in 2018.[6] A simple general formulation was given by Patrick Kidger and Terry Lyons in 2020,[7] which is the version stated here.
Several extensions of the theorem exist, such as to discontinuous activation functions, alternative network architectures, other topologies, and noncompact domains.[3][7][8]
Formal statements
The classical arbitrary width version of the theorem may be stated as follows.
Universal approximation theorem; arbitrary width.[4] Let $\sigma : \mathbb{R} \to \mathbb{R}$ be any continuous function (called the activation function). Let $K \subseteq \mathbb{R}^n$ be compact. The space of real-valued continuous functions on $K$ is denoted by $C(K)$. Let $\mathcal{M}$ denote the space of functions of the form

$$g(x) = \sum_{i=1}^{N} c_i \, \sigma(w_i \cdot x + b_i)$$

for all integers $N$, real constants $c_i, b_i \in \mathbb{R}$ and real vectors $w_i \in \mathbb{R}^n$ for $i = 1, \ldots, N$.

Then, if and only if $\sigma$ is nonpolynomial, the following statement is true: given any $\varepsilon > 0$ and any $f \in C(K)$, there exists $g \in \mathcal{M}$ such that

$$|g(x) - f(x)| < \varepsilon$$

for all $x \in K$.

In other words, $\mathcal{M}$ is dense in $C(K)$ with respect to the uniform norm if and only if $\sigma$ is nonpolynomial.
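As an illustration (not part of the theorem, and using arbitrary choices of activation, target function and fitting procedure), the following sketch, assuming NumPy is available, builds a member of $\mathcal{M}$ with a single hidden layer: the inner weights $w_i$ and biases $b_i$ are drawn at random and only the outer coefficients $c_i$ are fitted by least squares to $f(x) = \sin x$ on the compact set $K = [0, 2\pi]$.

```python
# Minimal sketch of the arbitrary-width form g(x) = sum_i c_i * sigma(w_i * x + b_i).
# Illustrative only: the theorem asserts that good parameters exist, not that this
# particular random-features / least-squares procedure finds them.
import numpy as np

rng = np.random.default_rng(0)

def sigma(z):
    return np.tanh(z)                       # a nonpolynomial activation function

N = 50                                      # width: number of hidden neurons
x = np.linspace(0.0, 2 * np.pi, 200)        # grid on the compact set K = [0, 2*pi]
f = np.sin(x)                               # the continuous target function

w = rng.normal(size=N)                      # hidden-layer weights w_i
b = rng.normal(size=N)                      # hidden-layer biases b_i
Phi = sigma(np.outer(x, w) + b)             # hidden activations, shape (200, N)

c, *_ = np.linalg.lstsq(Phi, f, rcond=None) # fit the outer coefficients c_i
g = Phi @ c                                 # network output g(x) on the grid

print("max |g(x) - f(x)| on the grid:", np.max(np.abs(g - f)))
```

Increasing the width $N$ typically drives this error towards zero, in line with the density statement above, although the theorem itself is non-constructive and says nothing about how suitable parameters are found.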
This theorem extends straightforwardly to networks with any fixed number of hidden layers: the theorem implies that the first layer can approximate any desired function, and that later layers can approximate the identity function. Thus any fixed-depth network may approximate any continuous function, and this version of the theorem applies to networks with bounded depth and arbitrary width.
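For instance (an illustrative remark, not part of the cited statements), if $\sigma$ happens to be differentiable at some point $b$ with $\sigma'(b) \neq 0$, then

$$\frac{\sigma(\alpha x + b) - \sigma(b)}{\alpha \, \sigma'(b)} \longrightarrow x \quad \text{as } \alpha \to 0,$$

uniformly for $x$ in a compact set, so a hidden layer followed by an affine map can be made arbitrarily close to the identity; with the ReLU activation $\sigma(x) = \max(x, 0)$ the identity is even exact on bounded sets, since $x = \sigma(x + M) - M$ whenever $x \geq -M$. For a general nonpolynomial continuous $\sigma$, the identity map is itself continuous and can therefore be approximated by a single hidden layer using the theorem above.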
The 'dual' version of the theorem considers networks of bounded width and arbitrary depth. Unlike the previous theorem, it gives sufficient but not necessary conditions.
Universal approximation theorem; arbitrary depth.[7] Let $\sigma : \mathbb{R} \to \mathbb{R}$ be any nonaffine continuous function which is continuously differentiable at at least one point, with nonzero derivative at that point. Let $K \subseteq \mathbb{R}^n$ be compact. The space of real vector-valued continuous functions on $K$ is denoted by $C(K; \mathbb{R}^m)$. Let $\mathcal{N}$ denote the space of feedforward neural networks with $n$ input neurons, $m$ output neurons, and an arbitrary number of hidden layers each with $n + m + 2$ neurons, such that every hidden neuron has activation function $\sigma$ and every output neuron has the identity as its activation function. Then given any $\varepsilon > 0$ and any $f \in C(K; \mathbb{R}^m)$, there exists $g \in \mathcal{N}$ such that

$$\| g(x) - f(x) \| < \varepsilon$$

for all $x \in K$.

In other words, $\mathcal{N}$ is dense in $C(K; \mathbb{R}^m)$ with respect to the uniform norm.
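As with the arbitrary width case, a small numerical sketch may help fix ideas. It is illustrative only, assumes PyTorch is available, and the target function, activation and training procedure are arbitrary choices rather than part of the theorem: with $n = 1$ input and $m = 1$ output, every hidden layer has width $n + m + 2 = 4$, and depth is the quantity allowed to grow.

```python
# Minimal sketch of a member of the narrow-but-deep class N described above:
# n = 1 input, m = 1 output, hidden layers of width n + m + 2 = 4, tanh activation
# (nonaffine, continuously differentiable with nonzero derivative), identity output.
import math
import torch
from torch import nn

torch.manual_seed(0)

n, m, depth = 1, 1, 8
width = n + m + 2                                # = 4 neurons per hidden layer

layers = [nn.Linear(n, width), nn.Tanh()]
for _ in range(depth - 1):
    layers += [nn.Linear(width, width), nn.Tanh()]
layers += [nn.Linear(width, m)]                  # identity activation on the output
net = nn.Sequential(*layers)

x = torch.linspace(-math.pi, math.pi, 200).unsqueeze(1)  # grid on the compact set
y = torch.sin(x)                                         # the continuous target

# The theorem only asserts that suitable parameters exist; here they are simply
# sought by gradient descent on a mean-squared error over the grid.
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()

print("max |g(x) - f(x)| on the grid:", (net(x) - y).abs().max().item())
```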
Certain necessary conditions for the bounded width, arbitrary depth case have been established, but there is still a gap between the known sufficient and necessary conditions.[5][6][9]
The arbitrary depth case includes polynomial activation functions, which are specifically excluded from the arbitrary width case. This is an example of a qualitative difference between (particular interpretations of) deep and shallow neural networks.
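To see where the difference comes from (an illustrative remark), consider the polynomial activation $\sigma(x) = x^2$. A single hidden layer only produces functions of the form $\sum_i c_i (w_i \cdot x + b_i)^2$, which are polynomials of degree at most two and so cannot approximate, say, $\sin$ on an interval to arbitrary accuracy. A bounded-width network of depth $k$, by contrast, can already reach degree $2^k$ by repeated squaring, $x \mapsto x^2 \mapsto x^4 \mapsto \cdots$, so fixing the width does not cap the achievable polynomial degree in the way that fixing the depth does.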
See also
- Kolmogorov–Arnold representation theorem
- No free lunch theorem
- Stone–Weierstrass theorem
- Fourier series
References
- ^ Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function". Mathematics of Control, Signals, and Systems. 2 (4): 303–314. doi:10.1007/BF02551274.
- ^ Hornik, Kurt (1991). "Approximation capabilities of multilayer feedforward networks". Neural Networks. 4 (2): 251–257. doi:10.1016/0893-6080(91)90009-T.
- ^ a b Leshno, Moshe; Lin, Vladimir Ya.; Pinkus, Allan; Schocken, Shimon (January 1993). "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function". Neural Networks. 6 (6): 861–867. doi:10.1016/S0893-6080(05)80131-5.
- ^ a b Pinkus, Allan (January 1999). "Approximation theory of the MLP model in neural networks". Acta Numerica. 8: 143–195. doi:10.1017/S0962492900002919.
- ^ a b Lu, Zhou; Pu, Hongming; Wang, Feicheng; Hu, Zhiqiang; Wang, Liwei (2017). "The Expressive Power of Neural Networks: A View from the Width". Advances in Neural Information Processing Systems 30. Curran Associates, Inc.: 6231–6239.
- ^ a b Hanin, Boris; Sellke, Mark (March 2018). "Approximating Continuous Functions by ReLU Nets of Minimal Width". arXiv:1710.11278.
- ^ a b c Kidger, Patrick; Lyons, Terry (July 2020). "Universal Approximation with Deep Narrow Networks". Conference on Learning Theory.
- ^ Lin, Hongzhou; Jegelka, Stefanie (2018). "ResNet with one-neuron hidden layers is a Universal Approximator". Advances in Neural Information Processing Systems 30. Curran Associates, Inc. pp. 6169–6178.
- ^ Johnson, Jesse (2019). "Deep, Skinny Neural Networks are not Universal Approximators". International Conference on Learning Representations.
First time making a Wikipedia edit; I think (hope) I'm doing the right thing here. Looking to update the "Universal approximation theorem" article.
Essential changes:
- Updated out-of-date results (by 20 years...)
- Now describe the arbitrary width and depth cases using similar language to each other.
- General tidy-up, so the article is now shorter.
- Added additional references.
COI:
- I wrote one of the papers that's now being discussed. It generalises+simplifies previous work, so I think it's fair.