In statistics, Fisher's scoring algorithm is a form of Newton's method used to solve maximum likelihood equations numerically.
Sketch of derivation
Let $Y_1,\ldots,Y_n$ be random variables, independent and identically distributed with twice differentiable p.d.f. $f(y;\theta)$, and we wish to calculate the maximum likelihood estimator (M.L.E.) $\theta^*$ of $\theta$. First, suppose we have a starting point for our algorithm $\theta_0$, and consider a Taylor expansion of the score function, $V(\theta)$, about $\theta_0$:
$$V(\theta) \approx V(\theta_0) - \mathcal{J}(\theta_0)(\theta - \theta_0),$$
where
$$\mathcal{J}(\theta_0) = -\sum_{i=1}^{n} \left. \nabla\nabla^{\top} \right|_{\theta=\theta_0} \log f(Y_i;\theta)$$
is the observed information matrix at $\theta_0$. Now, setting $\theta = \theta^*$, using that $V(\theta^*) = 0$ and rearranging gives us:
$$\theta^* \approx \theta_0 + \mathcal{J}^{-1}(\theta_0)\,V(\theta_0).$$
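As a quick worked instance (a standard textbook example, not part of the original derivation), suppose the $Y_i$ are exponential with density $f(y;\theta) = \theta e^{-\theta y}$. Then

$$V(\theta) = \sum_{i=1}^{n} \frac{\partial}{\partial\theta} \log f(Y_i;\theta) = \frac{n}{\theta} - \sum_{i=1}^{n} Y_i, \qquad \mathcal{J}(\theta) = \frac{n}{\theta^2},$$

so a single update from $\theta_0$ gives $\theta_0 + \frac{\theta_0^2}{n}\bigl(\frac{n}{\theta_0} - \sum_i Y_i\bigr) = 2\theta_0 - \theta_0^2\,\bar{Y}$, whose fixed point is the closed-form M.L.E. $\theta^* = 1/\bar{Y}$. Here $\mathcal{J}(\theta)$ happens not to depend on the data, so $\mathcal{J}(\theta) = \mathcal{I}(\theta)$ and Newton's method coincides with the Fisher scoring variant described below.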
We therefore use the algorithm
$$\theta_{m+1} = \theta_m + \mathcal{J}^{-1}(\theta_m)\,V(\theta_m),$$
and under certain regularity conditions, it can be shown that $\theta_m \rightarrow \theta^*$.
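In code, the iteration is immediate once the score and observed information are available. The following is a minimal sketch (the function names, the exponential example, and the starting value are illustrative assumptions, not from the article), written for a scalar parameter so that $\mathcal{J}^{-1}V$ reduces to a division:

import numpy as np

def newton_raphson_mle(theta0, score, obs_info, n_iter=50, tol=1e-12):
    # Iterate theta_{m+1} = theta_m + J(theta_m)^{-1} V(theta_m).
    theta = theta0
    for _ in range(n_iter):
        step = score(theta) / obs_info(theta)  # scalar J^{-1} V
        theta = theta + step
        if abs(step) < tol:  # stop once the update is negligible
            break
    return theta

# Exponential example from above: V(t) = n/t - sum(y), J(t) = n/t**2.
y = np.array([0.8, 1.3, 0.4, 2.1, 0.9])
n = len(y)
theta_hat = newton_raphson_mle(
    theta0=1.0,
    score=lambda t: n / t - y.sum(),
    obs_info=lambda t: n / t**2,
)
# theta_hat converges to the closed-form M.L.E. 1/mean(y) ≈ 0.9091.

For a vector parameter, the division is replaced by solving the linear system $\mathcal{J}(\theta_m)\,\delta = V(\theta_m)$ rather than forming the inverse explicitly.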
Fisher scoring
In practice, $\mathcal{J}(\theta)$ is usually replaced by $\mathcal{I}(\theta) = \mathrm{E}[\mathcal{J}(\theta)]$, the Fisher information, thus giving us the Fisher scoring algorithm:
$$\theta_{m+1} = \theta_m + \mathcal{I}^{-1}(\theta_m)\,V(\theta_m).$$
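As an illustration of where the expected information is convenient, consider logistic regression (again an illustrative example rather than part of the article; the variable names below are assumptions). For this model $\mathcal{I}(\beta) = X^{\top} W X$ with $W = \mathrm{diag}\{p_i(1-p_i)\}$, and, because the Hessian of the log-likelihood does not involve the responses $y$, Fisher scoring coincides with Newton's method and with iteratively reweighted least squares:

import numpy as np

def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-10):
    # Fisher scoring for logistic regression:
    # beta_{m+1} = beta_m + I(beta_m)^{-1} V(beta_m).
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # fitted probabilities
        score = X.T @ (y - p)                # V(beta)
        w = p * (1.0 - p)                    # model variances of Y_i
        info = X.T @ (X * w[:, None])        # I(beta) = X^T W X
        step = np.linalg.solve(info, score)  # solve rather than invert
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

When $\mathcal{J}$ and $\mathcal{I}$ do differ, Fisher scoring trades some of Newton's local convergence speed for an update matrix that is always positive semi-definite, which tends to make the iteration more stable far from the optimum.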