Standard notation, as used within deep learning, has changed considerably since the first published works. It is undergoing some standardization, though mostly at an informal level.
Notation
Indexes
training: Superscript $(i)$, as in $\mathbf{x}^{(i)}$, denotes the iᵗʰ training example in a training set.
layer: Superscript $[l]$, as in $\mathbf{x}^{[l]}$, denotes the lᵗʰ layer in a set of layers.
sequence: Superscript $\langle t\rangle$, as in $\mathbf{x}^{\langle t\rangle}$, denotes the tᵗʰ item in a sequence of items.
1D node: Subscript $i$, as in $x_{i}$, denotes the iᵗʰ node in a one-dimensional layer.
2D node: Subscript $ij$ or $i,j$, as in $x_{ij}$ or $x_{i,j}$, denotes the node at the iᵗʰ row and jᵗʰ column in a two-dimensional layer.[note 1]
1D weight: Subscript $ij$ or $i,j$, as in $w_{ij}$ or $w_{i,j}$, denotes the weight between the iᵗʰ node in the previous layer and the jᵗʰ node in the following layer.[note 2]
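To make these index conventions concrete, the following is a minimal NumPy sketch; the array names (X, W), the shapes, and the column-per-example layout are illustrative assumptions, not part of the notation itself (note also that Python indexes from 0 while the notation above counts from 1):

    import numpy as np

    m, n_x = 100, 4             # m training examples with n_x features each
    X = np.random.rand(n_x, m)  # assumed layout: column i holds one example

    x_i = X[:, 2]    # x^(i): superscript (i) selects a training example
    x_node = x_i[1]  # x_i: subscript selects a node in a 1D layer

    # w_{ij}: weight between node i in the previous layer (4 nodes)
    # and node j in the following layer (8 nodes); note 2 below describes
    # the opposite convention
    W = np.random.rand(4, 8)
    w_12 = W[1, 2]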
Sizes
number of samples: $m$ is the number of samples in the dataset.
input size: $n_{x}$ is the (possibly multidimensional) size of the input $x$ (or the number of features).
output size: $n_{y}$ is the (possibly multidimensional) size of the output $y$ (or the number of classes).
hidden units: $n_{h}^{[l]}$ is the number of units in hidden layer $[l]$.
number of layers: $L$ is the number of layers in the network.
input sequence size: $T_{x}$ is the size of the input sequence.
output sequence size: $T_{y}$ is the size of the output sequence.
input training sequence size: $T_{x}^{(i)}$ is the size of the input sequence of training example $(i)$ (i.e., per sample).
output training sequence size: $T_{y}^{(i)}$ is the size of the output sequence of training example $(i)$ (i.e., per sample).
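As a rough sketch of how these sizes map onto array shapes (the samples-as-columns layout and every concrete number below are assumptions for illustration):

    import numpy as np

    m = 32            # number of samples
    n_x, n_y = 10, 3  # input size (features) and output size (classes)
    L = 2             # number of layers in the network
    T_x = T_y = 5     # input and output sequence sizes

    X = np.zeros((n_x, m))         # input set; X[:, i] is x^(i)
    Y = np.zeros((n_y, m))         # output set; Y[:, i] is y^(i)
    n_h = [n_x, 16, n_y]           # n_h[l]: number of units in layer [l]
    seq = np.zeros((n_x, m, T_x))  # seq[:, i, t] holds x^(i)<t>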
Other
cross entropy: $H(p,q)=-\sum _{x\in {\mathcal {X}}}p(x)\,\log q(x)$
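As a worked example, a minimal sketch of this formula for discrete distributions over a finite set (the distributions p and q below are made up for illustration):

    import numpy as np

    def cross_entropy(p, q):
        # H(p, q) = -sum over x in X of p(x) * log q(x)
        p, q = np.asarray(p), np.asarray(q)
        return -np.sum(p * np.log(q))

    p = [0.5, 0.25, 0.25]       # true distribution over X
    q = [0.4, 0.4, 0.2]         # predicted distribution over X
    print(cross_entropy(p, q))  # ~1.09 (using the natural logarithm)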
elementwise sequence loss: ${\mathcal {L}}^{\langle t\rangle }\left({\hat {y}}^{\langle t\rangle },y^{\langle t\rangle }\right)$, which by using cross entropy becomes $-y^{\langle t\rangle }\,\log {\hat {y}}^{\langle t\rangle }-\left(1-y^{\langle t\rangle }\right)\,\log \left(1-{\hat {y}}^{\langle t\rangle }\right)$; that is, the sum is taken over $\left\{x\in {\mathcal {X}}:{\text{similar}},{\text{dissimilar}}\right\}$ for classification in and out of a single class.
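A minimal sketch of this per-timestep binary cross entropy, summed over a short sequence (the function name and all values are illustrative):

    import numpy as np

    def elementwise_loss(y_hat_t, y_t):
        # L<t>(y_hat<t>, y<t>) as binary cross entropy at timestep t
        return -y_t * np.log(y_hat_t) - (1 - y_t) * np.log(1 - y_hat_t)

    y_hat = np.array([0.9, 0.2, 0.7])  # predictions y_hat<t> for t = 1..T_y
    y = np.array([1.0, 0.0, 1.0])      # labels y<t>: in (1) or out (0) of the class
    total = np.sum(elementwise_loss(y_hat, y))  # loss summed over the sequence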
References
Notes
^ This can easily be confused with a weight index.
^ Michael Nielsen defines $w_{jk}$ as the weight from the kᵗʰ neuron to the jᵗʰ, while Andrew Ng defines it in the opposite direction.