The swish function is defined as swish(x) = x · sigmoid(βx) = x/(1 + e^(−βx)), where β is either a constant or a trainable parameter, depending on the model. For β = 1, the function is equivalent to the Sigmoid Linear Unit[2] or SiLU, first proposed alongside the GELU in 2016. The SiLU was rediscovered in 2017 as the Sigmoid-weighted Linear Unit (SiL) used in reinforcement learning,[3][1] and rediscovered again, over a year after its initial publication, as swish. Swish was originally proposed without the learnable parameter β, so that β implicitly equalled 1; the swish paper was later updated to include the learnable parameter β, though in practice researchers usually fix β = 1 and do not learn it. For β = 0, the function reduces to the scaled linear function f(x) = x/2.[1] As β → ∞, the sigmoid component approaches a 0-1 step function pointwise, so swish approaches the ReLU function pointwise. Swish can therefore be viewed as a smooth function that nonlinearly interpolates between a linear function and the ReLU function.[1] Swish is non-monotonic, a property that may have influenced the proposal of other activation functions sharing it, such as Mish.[4]
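The limiting behaviour described above can be checked numerically. The following is a minimal NumPy sketch (the function and variable names are illustrative, not taken from the swish paper):

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    beta = 1    -> SiLU
    beta = 0    -> x / 2 (scaled linear function)
    beta -> inf -> approaches ReLU, max(x, 0)
    """
    return x * (1.0 / (1.0 + np.exp(-beta * x)))

x = np.linspace(-4.0, 4.0, 9)
print(swish(x, beta=1.0))    # SiLU
print(swish(x, beta=0.0))    # equals x / 2
print(swish(x, beta=50.0))   # already very close to ReLU
```

With a large β such as 50, the sigmoid factor is nearly 0 for negative inputs and nearly 1 for positive inputs, which is why the output approaches ReLU.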
When restricted to positive values, swish is a particular case of the sigmoid shrinkage function defined in [5] (see the doubly parameterized sigmoid shrinkage form given by Equation (3) of that reference).
1. Ramachandran, Prajit; Zoph, Barret; Le, Quoc V. (2017-10-27). "Searching for Activation Functions". arXiv:1710.05941v2 [cs.NE].
2. Hendrycks, Dan; Gimpel, Kevin (2016). "Gaussian Error Linear Units (GELUs)". arXiv:1606.08415 [cs.LG].
3. Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji (2017-11-02). "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning". arXiv:1702.03118v3 [cs.LG].