User:Jeblad/Standard notation (neural net)

The notation used in deep learning has changed considerably since the first published works. It is undergoing some standardization, but mostly at an informal level.

Notation

General

training: Superscript $\left(i\right)$ like $\mathbf {x} ^{\left(i\right)}$ denotes the iᵗʰ training example in a training set
layer: Superscript $\left[l\right]$ like $\mathbf {x} ^{\left[l\right]}$ denotes the lᵗʰ layer in a set of layers
sequence: Superscript $\left\langle t\right\rangle$ like $\mathbf {x} ^{\left\langle t\right\rangle }$ denotes the tᵗʰ item in a sequence of items
1D node: Subscript $i$ like $x_{i}$ denotes the iᵗʰ node in a one-dimensional layer
2D node: Subscript $ij$ or $i,j$ like $x_{ij}$ or $x_{i,j}$ denotes the node at the iᵗʰ row and jᵗʰ column in a two-dimensional layer^{[note 1]}
1D weight: Subscript $ij$ or $i,j$ like $w_{ij}$ or $w_{i,j}$ denotes the weight between the iᵗʰ node in the previous layer and the jᵗʰ node in the following layer^{[note 2]}
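
The index conventions above can be made concrete with a small sketch. It assumes plain nested Python lists and 0-based indices (the text counts from 1); the names `training_set`, `w_from_to`, `w_to_from` and the two `forward_*` helpers are hypothetical and only serve the illustration. It shows that the two weight conventions mentioned in note 2 are transposes of each other and give the same layer output.

```python
# Superscripts select a whole vector, subscripts select a node within it.
# training_set[i] plays the role of x^(i); x[i] plays the role of x_i.
training_set = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = training_set[0]

# Convention A (as in the list above): w_from_to[i][j] is the weight from
# node i in the previous layer to node j in the following layer.
w_from_to = [
    [0.1, 0.4],
    [0.2, 0.5],
    [0.3, 0.6],
]

# Convention B (see note 2): w_to_from[j][k] is the weight from node k in the
# previous layer to node j in the following layer, i.e. the transpose of A.
w_to_from = [list(column) for column in zip(*w_from_to)]


def forward_from_to(w, x):
    """Weighted input z_j = sum_i x_i * w_ij (convention A)."""
    n_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n_out)]


def forward_to_from(w, x):
    """Weighted input z_j = sum_k w_jk * x_k (convention B)."""
    return [sum(w_jk * x_k for w_jk, x_k in zip(row, x)) for row in w]


z_a = forward_from_to(w_from_to, x)
z_b = forward_to_from(w_to_from, x)
print(z_a, z_b)  # both conventions describe the same layer
```

Which convention to use is a matter of taste; what matters is stating it, since $w_{ij}$ alone does not reveal the direction.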

cross entropy: $H(p,q)=-\sum _{x\in {\mathcal {X}}}p(x)\,\log q(x)$
elementwise sequence loss: ${\mathcal {L}}^{\left\langle t\right\rangle }\left({\hat {y}}^{\left\langle t\right\rangle },{y}^{\left\langle t\right\rangle }\right)$, which with cross entropy becomes $-y^{\left\langle t\right\rangle }\,\log {\hat {y}}^{\left\langle t\right\rangle }-\left(1-y^{\left\langle t\right\rangle }\right)\,\log \left(1-{\hat {y}}^{\left\langle t\right\rangle }\right)$, that is, the sum runs over the two outcomes $x\in {\mathcal {X}}=\left\{{\text{truthy}},{\text{falsy}}\right\}$
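
As a minimal sketch of how the two formulas relate, the following computes the general cross entropy over a finite outcome set and the two-outcome form at each sequence step, and checks that they agree. The function names, the dictionary representation of $p$ and $q$, and the sample values of $y^{\left\langle t\right\rangle }$ and ${\hat {y}}^{\left\langle t\right\rangle }$ are assumptions made for illustration; predictions are assumed to lie strictly between 0 and 1 so that the logarithms are defined.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum over x of p(x) * log q(x).

    p and q are dicts mapping each outcome to its probability.
    Terms with p(x) == 0 are skipped (convention 0 * log 0 = 0).
    """
    return -sum(p_x * math.log(q[x]) for x, p_x in p.items() if p_x > 0.0)


def binary_step_loss(y_hat_t, y_t):
    """Loss at one step t: -y log(y_hat) - (1 - y) log(1 - y_hat)."""
    return -y_t * math.log(y_hat_t) - (1.0 - y_t) * math.log(1.0 - y_hat_t)


# Hypothetical targets y<t> and predictions y_hat<t> for a 3-step sequence.
y = [1.0, 0.0, 1.0]
y_hat = [0.9, 0.2, 0.6]

# Two-outcome form, one loss value per step t.
step_losses = [binary_step_loss(yh, yt) for yh, yt in zip(y_hat, y)]

# The same losses via the general formula with outcomes {truthy, falsy}.
step_losses_general = [
    cross_entropy(
        {"truthy": yt, "falsy": 1.0 - yt},
        {"truthy": yh, "falsy": 1.0 - yh},
    )
    for yh, yt in zip(y_hat, y)
]

print(step_losses)
print(step_losses_general)
```

Here $p$ is the target distribution $\left(y^{\left\langle t\right\rangle },1-y^{\left\langle t\right\rangle }\right)$ and $q$ is the predicted distribution $\left({\hat {y}}^{\left\langle t\right\rangle },1-{\hat {y}}^{\left\langle t\right\rangle }\right)$.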

References

Notes

^ This can easily be confused with a weight index.
^ Michael Nielsen defines $w_{jk}$ as the weight from the kᵗʰ neuron to the jᵗʰ, while Andrew Ng defines it in the opposite direction.
