Standard notation, as used within deep learning, has changed considerably since the first published works. It is undergoing some standardization, though mostly at an informal level.
Notation
Indexes
training: Superscript $(i)$, as in $\mathbf{x}^{(i)}$, denotes the iᵗʰ training example in a training set.
layer: Superscript $[l]$, as in $\mathbf{x}^{[l]}$, denotes the lᵗʰ layer in a set of layers.
sequence: Superscript $\langle t\rangle$, as in $\mathbf{x}^{\langle t\rangle}$, denotes the tᵗʰ item in a sequence of items.
1D node: Subscript $i$, as in $x_{i}$, denotes the iᵗʰ node in a one-dimensional layer.
2D node: Subscript $ij$ or $i,j$, as in $x_{ij}$ or $x_{i,j}$, denotes the node at the iᵗʰ row and jᵗʰ column in a two-dimensional layer.[note 1]
1D weight: Subscript $ij$ or $i,j$, as in $w_{ij}$ or $w_{i,j}$, denotes the weight between the iᵗʰ node in the previous layer and the jᵗʰ node in the following layer.[note 2]
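To make these index conventions concrete, the following is a minimal NumPy sketch; the array names (X, W), the shapes, and the column-per-example layout are illustrative assumptions, not part of the notation itself (note also that Python indexes from 0 while the notation above counts from 1):

    import numpy as np

    m, n_x = 100, 4             # m training examples with n_x features each
    X = np.random.rand(n_x, m)  # assumed layout: column i holds one example

    x_i = X[:, 2]    # x^(i): superscript (i) selects a training example
    x_node = x_i[1]  # x_i: subscript selects a node in a 1D layer

    # w_{ij}: weight between node i in the previous layer (4 nodes)
    # and node j in the following layer (8 nodes); note 2 below describes
    # the opposite convention
    W = np.random.rand(4, 8)
    w_12 = W[1, 2]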
Sizes
number of samples: $m$ is the number of samples in the dataset.
input size: $n_{x}$ is the (possibly multidimensional) size of the input $x$ (or the number of features).
output size: $n_{y}$ is the (possibly multidimensional) size of the output $y$ (or the number of classes).
hidden units: $n_{h}^{[l]}$ is the number of units in hidden layer $[l]$.
number of layers: $L$ is the number of layers in the network.
input sequence size: $T_{x}$ is the size of the input sequence.
output sequence size: $T_{y}$ is the size of the output sequence.
input training sequence size: $T_{x}^{(i)}$ is the size of the input sequence of training example $(i)$ (i.e., per sample).
output training sequence size: $T_{y}^{(i)}$ is the size of the output sequence of training example $(i)$ (i.e., per sample).
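As a rough sketch of how these sizes map onto array shapes (the samples-as-columns layout and every concrete number below are assumptions for illustration):

    import numpy as np

    m = 32            # number of samples
    n_x, n_y = 10, 3  # input size (features) and output size (classes)
    L = 2             # number of layers in the network
    T_x = T_y = 5     # input and output sequence sizes

    X = np.zeros((n_x, m))         # input set; X[:, i] is x^(i)
    Y = np.zeros((n_y, m))         # output set; Y[:, i] is y^(i)
    n_h = [n_x, 16, n_y]           # n_h[l]: number of units in layer [l]
    seq = np.zeros((n_x, m, T_x))  # seq[:, i, t] holds x^(i)<t>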
Other
cross entropy: $H(p,q)=-\sum _{x\in {\mathcal {X}}}p(x)\,\log q(x)$
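As a worked example, a minimal sketch of this formula for discrete distributions over a finite set (the distributions p and q below are made up for illustration):

    import numpy as np

    def cross_entropy(p, q):
        # H(p, q) = -sum over x in X of p(x) * log q(x)
        p, q = np.asarray(p), np.asarray(q)
        return -np.sum(p * np.log(q))

    p = [0.5, 0.25, 0.25]       # true distribution over X
    q = [0.4, 0.4, 0.2]         # predicted distribution over X
    print(cross_entropy(p, q))  # ~1.09 (using the natural logarithm)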
elementwise sequence loss: ${\mathcal {L}}^{\langle t\rangle }\left({\hat {y}}^{\langle t\rangle },y^{\langle t\rangle }\right)$, which by using cross entropy becomes $-y^{\langle t\rangle }\,\log {\hat {y}}^{\langle t\rangle }-\left(1-y^{\langle t\rangle }\right)\,\log \left(1-{\hat {y}}^{\langle t\rangle }\right)$; that is, the sum is taken over $\left\{x\in {\mathcal {X}}:{\text{similar}},{\text{dissimilar}}\right\}$ for classification in and out of a single class.
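A minimal sketch of this per-timestep binary cross entropy, summed over a short sequence (the function name and all values are illustrative):

    import numpy as np

    def elementwise_loss(y_hat_t, y_t):
        # L<t>(y_hat<t>, y<t>) as binary cross entropy at timestep t
        return -y_t * np.log(y_hat_t) - (1 - y_t) * np.log(1 - y_hat_t)

    y_hat = np.array([0.9, 0.2, 0.7])  # predictions y_hat<t> for t = 1..T_y
    y = np.array([1.0, 0.0, 1.0])      # labels y<t>: in (1) or out (0) of the class
    total = np.sum(elementwise_loss(y_hat, y))  # loss summed over the sequence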
References
Notes
^ This can easily be confused with a weight index.
^ Michael Nielsen defines $w_{jk}$ as the weight from the kᵗʰ neuron to the jᵗʰ, while Andrew Ng defines it in the opposite direction.