Normal-Wishart

Notation: $(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu)$

Parameters:
$\boldsymbol{\mu}_0 \in \mathbb{R}^{D}$, location (real vector)
$\lambda > 0$ (real)
$\mathbf{W} \in \mathbb{R}^{D \times D}$, scale matrix (positive definite)
$\nu > D - 1$ (real)

Support: $\boldsymbol{\mu} \in \mathbb{R}^{D}$; $\boldsymbol{\Lambda} \in \mathbb{R}^{D \times D}$, precision matrix (positive definite)

PDF: $f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\, \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$
In probability theory and statistics, the normal-Wishart distribution (or Gaussian-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and precision matrix (the inverse of the covariance matrix).[1]
Definition
Suppose
$\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, \lambda, \boldsymbol{\Lambda} \sim \mathcal{N}(\boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})$
has a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance matrix $(\lambda \boldsymbol{\Lambda})^{-1}$, where
$\boldsymbol{\Lambda} \mid \mathbf{W}, \nu \sim \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$
has a Wishart distribution. Then $(\boldsymbol{\mu}, \boldsymbol{\Lambda})$ has a normal-Wishart distribution, denoted as
$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu).$
Characterization
Probability density function
$f(\boldsymbol{\mu}, \boldsymbol{\Lambda} \mid \boldsymbol{\mu}_0, \lambda, \mathbf{W}, \nu) = \mathcal{N}(\boldsymbol{\mu} \mid \boldsymbol{\mu}_0, (\lambda \boldsymbol{\Lambda})^{-1})\ \mathcal{W}(\boldsymbol{\Lambda} \mid \mathbf{W}, \nu)$
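This factorization can be evaluated numerically by summing the two component log-densities. The sketch below is a minimal illustration, assuming SciPy's multivariate_normal and wishart (whose df/scale parameterization matches $\nu$ and $\mathbf{W}$ here); the function name and example values are ours, not from the article.

```python
import numpy as np
from scipy.stats import multivariate_normal, wishart

def normal_wishart_logpdf(mu, Lam, mu0, lam, W, nu):
    """log f(mu, Lam | mu0, lam, W, nu) = log N(mu | mu0, (lam Lam)^-1) + log W(Lam | W, nu)."""
    cov = np.linalg.inv(lam * Lam)                       # covariance of the conditional normal
    log_n = multivariate_normal.logpdf(mu, mean=mu0, cov=cov)
    log_w = wishart.logpdf(Lam, df=nu, scale=W)
    return log_n + log_w

# Example in D = 2 dimensions (illustrative values only)
mu0, lam, W, nu = np.zeros(2), 2.0, np.eye(2), 5.0
print(normal_wishart_logpdf(np.zeros(2), np.eye(2), mu0, lam, W, nu))
```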
Properties
Scaling
Marginal distributions
By construction, the marginal distribution over $\boldsymbol{\Lambda}$ is a Wishart distribution, and the conditional distribution over $\boldsymbol{\mu}$ given $\boldsymbol{\Lambda}$ is a multivariate normal distribution. The marginal distribution over $\boldsymbol{\mu}$ is a multivariate t-distribution.
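A hedged sketch of the explicit form, not stated in this article: under the Wishart parameterization used above, and writing the multivariate t in location/scale-matrix form (a convention assumed here), the marginal is commonly given as

$\boldsymbol{\mu} \sim t_{\nu - D + 1}\!\left(\boldsymbol{\mu}_0,\ \bigl(\lambda(\nu - D + 1)\bigr)^{-1} \mathbf{W}^{-1}\right),$

i.e. a multivariate t with $\nu - D + 1$ degrees of freedom, location $\boldsymbol{\mu}_0$, and scale matrix $\bigl(\lambda(\nu - D + 1)\bigr)^{-1} \mathbf{W}^{-1}$.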
Posterior distribution of the parameters
After making $n$ observations $\boldsymbol{x}_1, \dots, \boldsymbol{x}_n$, the posterior distribution of the parameters is

$(\boldsymbol{\mu}, \boldsymbol{\Lambda}) \sim \mathrm{NW}(\boldsymbol{\mu}_n, \lambda_n, \mathbf{W}_n, \nu_n),$

where

$\lambda_n = \lambda + n,$

$\boldsymbol{\mu}_n = \dfrac{\lambda \boldsymbol{\mu}_0 + n \bar{\boldsymbol{x}}}{\lambda + n},$

$\nu_n = \nu + n,$

$\mathbf{W}_n^{-1} = \mathbf{W}^{-1} + \sum_{i=1}^{n} (\boldsymbol{x}_i - \bar{\boldsymbol{x}})(\boldsymbol{x}_i - \bar{\boldsymbol{x}})^{T} + \frac{n\lambda}{n + \lambda} (\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)(\bar{\boldsymbol{x}} - \boldsymbol{\mu}_0)^{T}.$[2]
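A minimal sketch of this conjugate update as a function (the function and variable names are ours, not from the article), taking data X of shape (n, D) and returning the posterior hyperparameters:

```python
import numpy as np

def normal_wishart_posterior(X, mu0, lam, W, nu):
    """Return (mu_n, lambda_n, W_n, nu_n) for the update formulas above."""
    n, _ = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)                 # scatter matrix about the sample mean
    d = (xbar - mu0).reshape(-1, 1)
    lam_n = lam + n
    nu_n = nu + n
    mu_n = (lam * mu0 + n * xbar) / lam_n
    W_n = np.linalg.inv(np.linalg.inv(W) + S + (n * lam / (n + lam)) * (d @ d.T))
    return mu_n, lam_n, W_n, nu_n

# Example: 10 two-dimensional observations and a weak prior (illustrative values only)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
print(normal_wishart_posterior(X, mu0=np.zeros(2), lam=1.0, W=np.eye(2), nu=2.0))
```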
Generating normal-Wishart random variates
Generation of random variates is straightforward:
1. Sample $\boldsymbol{\Lambda}$ from a Wishart distribution with parameters $\mathbf{W}$ and $\nu$.
2. Sample $\boldsymbol{\mu}$ from a multivariate normal distribution with mean $\boldsymbol{\mu}_0$ and covariance $(\lambda \boldsymbol{\Lambda})^{-1}$.
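A minimal sketch of this two-step sampler, assuming SciPy's wishart and multivariate_normal (the function name is ours, not from the article):

```python
import numpy as np
from scipy.stats import multivariate_normal, wishart

def sample_normal_wishart(mu0, lam, W, nu):
    # Step 1: draw the precision matrix Lambda ~ W(W, nu)
    Lam = wishart.rvs(df=nu, scale=W)
    # Step 2: draw the mean mu ~ N(mu0, (lam * Lambda)^-1)
    mu = multivariate_normal.rvs(mean=mu0, cov=np.linalg.inv(lam * Lam))
    return mu, Lam

# Example draw in D = 2 dimensions (illustrative values only)
mu, Lam = sample_normal_wishart(mu0=np.zeros(2), lam=1.0, W=np.eye(2), nu=3.0)
print(mu, Lam)
```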
Notes
References
Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media.