Jump to content

Mathematics of neural networks in machine learning

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Lfstevens (talk | contribs) at 18:42, 18 August 2019 (split from Artificial neural network). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
(diff) ← Previous revision | Latest revision (diff) | Newer revision → (diff)

A neuron with label receiving an input from predecessor neurons consists of the following components:[1]

  • an activation , the neuron's state, depending on a discrete time parameter,
  • an optional threshold , which stays fixed unless changed by learning,
  • an activation function that computes the new activation at a given time from , and the net input giving rise to the relation
,
  • and an output function computing the output from the activation
.

Often the output function is simply the identity function.

An input neuron has no predecessor but serves as input interface for the whole network. Similarly an output neuron has no successor and thus serves as output interface of the whole network.

Propagation function

The propagation function computes the input to the neuron from the outputs and typically has the form[2]

.

Bias

A bias term can be added, changing the form to the following:[3]

, where is a bias.

Neural networks as functions

Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision).

or a distribution over or both and . Sometimes models are intimately associated with a particular learning rule. A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, number of layers or their connectivity).

Mathematically, a neuron's network function is defined as a composition of other functions , that can further be decomposed into other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions. A widely used type of composition is the nonlinear weighted sum, where , where (commonly referred to as the activation function[4]) is some predefined function, such as the hyperbolic tangent, sigmoid function, softmax function, or rectifier function. The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output. The following refers to a collection of functions as a vector .

ANN dependency graph

This figure depicts such a decomposition of , with dependencies between variables indicated by arrows. These can be interpreted in two ways.

The first view is the functional view: the input is transformed into a 3-dimensional vector , which is then transformed into a 2-dimensional vector , which is finally transformed into . This view is most commonly encountered in the context of optimization.

The second view is the probabilistic view: the random variable depends upon the random variable , which depends upon , which depends upon the random variable . This view is most commonly encountered in the context of graphical models.

The two views are largely equivalent. In either case, for this particular architecture, the components of individual layers are independent of each other (e.g., the components of are independent of each other given their input ). This naturally enables a degree of parallelism in the implementation.

Two separate depictions of the recurrent ANN dependency graph

Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where is shown as dependent upon itself. However, an implied temporal dependence is not shown.

Backpropagation

Backpropagation training algorithms fall into three categories:

References

  1. ^ Zell, Andreas (2003). "chapter 5.2". Simulation neuronaler Netze [Simulation of Neural Networks] (in German) (1st ed.). Addison-Wesley. ISBN 978-3-89319-554-1. OCLC 249017987.
  2. ^ Zell, Andreas (2003). "chapter 5.2". Simulation neuronaler Netze [Simulation of Neural Networks] (in German) (1st ed.). Addison-Wesley. ISBN 978-3-89319-554-1. OCLC 249017987.
  3. ^ DAWSON, CHRISTIAN W (1998). "An artificial neural network approach to rainfall-runoff modelling". Hydrological Sciences Journal. 43 (1): 47–66. doi:10.1080/02626669809492102.
  4. ^ "The Machine Learning Dictionary". www.cse.unsw.edu.au.
  5. ^ M. Forouzanfar; H. R. Dajani; V. Z. Groza; M. Bolic; S. Rajan (July 2010). Comparison of Feed-Forward Neural Network Training Algorithms for Oscillometric Blood Pressure Estimation. 4th Int. Workshop Soft Computing Applications. Arad, Romania: IEEE. {{cite conference}}: Unknown parameter |last-author-amp= ignored (|name-list-style= suggested) (help)