Standard notation, as used within deep learning, has changed considerably since the first published works. It is undergoing some standardization, but mostly at an informal level.
Notation
General
Training, validation, and test sets: a superscript $(i)$, as in $\mathbf{x}^{(i)}$, denotes the iᵗʰ training example in a training set.
Layer: a superscript $[l]$, as in $\mathbf{x}^{[l]}$, denotes the lᵗʰ layer in a set of layers.
Sequence: a superscript $\langle t\rangle$, as in $\mathbf{x}^{\langle t\rangle}$, denotes the tᵗʰ item in a sequence of items.
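The three superscript conventions map directly onto array indexing. A minimal sketch in NumPy, assuming hypothetical arrays X_train, activations, and x_seq (these names and sizes are illustrative, not part of the notation); note that the mathematical indices are 1-based while Python indices are 0-based:

```python
import numpy as np

# Illustrative, hypothetical arrays only.
X_train = np.random.rand(100, 8)      # 100 training examples, 8 features each
x_5 = X_train[4]                      # x^(5): the 5th training example (0-based index 4)

activations = [np.random.rand(8),     # x^[1]: activations of the 1st layer
               np.random.rand(16),    # x^[2]: activations of the 2nd layer
               np.random.rand(4)]     # x^[3]: activations of the 3rd layer
x_l2 = activations[1]                 # x^[2]: the 2nd layer's activations

x_seq = np.random.rand(20, 8)         # a sequence of 20 items, 8 features each
x_t10 = x_seq[9]                      # x^<10>: the 10th item in the sequence
```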
1D node: a subscript $i$, as in $x_{i}$, denotes the iᵗʰ node in a one-dimensional layer.
2D node: a subscript $ij$ or $i,j$, as in $x_{ij}$ or $x_{i,j}$, denotes the node at the iᵗʰ row and jᵗʰ column of a two-dimensional layer.[note 1]
1D weight: a subscript $ij$ or $i,j$, as in $w_{ij}$ or $w_{i,j}$, denotes the weight between the iᵗʰ node of the previous layer and the jᵗʰ node of the following layer.[note 2]
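Because of the two index orderings described in note 2 below, the same set of weights appears as transposed matrices depending on the author. A minimal sketch in NumPy, assuming a fully connected layer with 3 input nodes and 2 output nodes; the names and sizes are illustrative:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # activations of the previous layer (3 nodes)

# Convention of the table above: w_ij runs from node i of the previous layer
# to node j of the following layer, so W has shape (3, 2).
W = np.random.rand(3, 2)
z = x @ W                       # pre-activations of the following layer, shape (2,)

# Nielsen's convention (note 2): w_jk runs from node k of the previous layer
# to node j of the following layer, i.e. the transposed matrix, shape (2, 3).
W_nielsen = W.T
z_nielsen = W_nielsen @ x       # identical result

assert np.allclose(z, z_nielsen)
```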
Cross entropy: $H(p,q) = -\sum_{x\in\mathcal{X}} p(x)\,\log q(x)$.
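A minimal numerical sketch of the cross-entropy formula above, assuming two small example distributions p and q over the same three outcomes (the values are illustrative only):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # true distribution over 3 outcomes
q = np.array([0.6, 0.3, 0.1])   # predicted distribution over the same outcomes

# H(p, q) = -sum_x p(x) log q(x)
H = -np.sum(p * np.log(q))
print(H)                        # roughly 0.83 (using the natural logarithm)
```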
Elementwise sequence loss: $\mathcal{L}^{\langle t\rangle}\left(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}\right)$, which with cross entropy becomes $-y^{\langle t\rangle}\,\log \hat{y}^{\langle t\rangle} - \left(1 - y^{\langle t\rangle}\right)\log\left(1 - \hat{y}^{\langle t\rangle}\right)$; that is, the sum is taken over $\left\{x\in\mathcal{X} : \text{similar}, \text{dissimilar}\right\}$ for classification in and out of a single class.
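A minimal sketch of the elementwise sequence loss above, assuming per-time-step binary labels y and predicted probabilities y_hat; the array names and values are illustrative only:

```python
import numpy as np

y     = np.array([1.0, 0.0, 1.0, 1.0])   # true labels, one per time step t
y_hat = np.array([0.9, 0.2, 0.7, 0.6])   # predicted probabilities, one per time step t

# L^<t>(y_hat^<t>, y^<t>) = -y^<t> log y_hat^<t> - (1 - y^<t>) log(1 - y_hat^<t>)
loss_t = -y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat)

# A total sequence loss is commonly the sum (or mean) of the per-step losses.
total_loss = loss_t.sum()
```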
References
Notes
^ This can easily be confused with a weight index.
^ Michael Nielsen defines $w_{jk}$ as the weight from the kᵗʰ neuron to the jᵗʰ, while Andrew Ng defines it in the opposite direction.