Laplace's approximation
In mathematics, Laplace's approximation fits an un-normalised Gaussian to a (twice differentiable) un-normalised target density. In Bayesian statistical inference it is useful for simultaneously approximating the posterior and the marginal likelihood. The method works by matching the log density and curvature of the target density at a mode.
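Concretely, the idea can be sketched as follows (a brief motivation, using notation that anticipates the definitions below): if $\tilde p(\theta)$ denotes the un-normalised target density and $\hat\theta$ a mode, the gradient of $\log \tilde p$ vanishes at $\hat\theta$, so a second-order Taylor expansion gives

$$\log \tilde p(\theta) \;\approx\; \log \tilde p(\hat\theta) \,-\, \tfrac{1}{2} (\theta - \hat\theta)^\top S^{-1} (\theta - \hat\theta), \qquad S^{-1} \;=\; -\nabla\nabla \log \tilde p(\theta)\big|_{\theta=\hat\theta}.$$

Exponentiating the right-hand side yields an un-normalised Gaussian centred at $\hat\theta$, which is the form fitted below.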
For example, a (possibly non-linear) regression or classification model with data set $\{x_n, y_n\}_{n=1,\ldots,N}$ comprising inputs $x$ and outputs $y$ has an (unknown) parameter vector $\theta$ of length $D$. The likelihood is denoted $p(y \mid x, \theta)$ and the parameter prior $p(\theta)$. The joint density of outputs and parameters $p(y, \theta \mid x)$ is the object of inferential desire

$$p(y, \theta \mid x) \;=\; p(y \mid x, \theta)\, p(\theta) \;=\; p(y \mid x)\, p(\theta \mid y, x).$$
The joint is equal to the product of the likelihood and the prior and, by Bayes' rule, equal to the product of the marginal likelihood $p(y \mid x)$ and posterior $p(\theta \mid y, x)$. Seen as a function of $\theta$ the joint is an un-normalised density. In Laplace's approximation we approximate the joint by an un-normalised Gaussian $\tilde q(\theta) = Z q(\theta)$, where we use $q$ to denote approximate density and $\tilde q$ for un-normalised density. Since the marginal likelihood $p(y \mid x)$ doesn't depend on the parameter $\theta$ and the posterior $p(\theta \mid y, x)$ normalises over $\theta$, we can immediately identify them with $Z$ and $q(\theta)$ of our approximation, respectively. Laplace's approximation is

$$p(y, \theta \mid x) \;\simeq\; \tilde q(\theta) \;=\; p(y, \hat\theta \mid x) \exp\!\Big( -\tfrac{1}{2} (\theta - \hat\theta)^\top S^{-1} (\theta - \hat\theta) \Big),$$
where we have defined

$$\hat\theta \;=\; \operatorname{argmax}_{\theta} \log p(y, \theta \mid x), \qquad S^{-1} \;=\; -\nabla\nabla \log p(y, \theta \mid x)\big|_{\theta = \hat\theta},$$
where $\hat\theta$ is the location of a mode of the joint target density and $S^{-1}$ is the matrix of second derivatives of the negative log joint target density at the mode $\theta = \hat\theta$. Thus, the Gaussian approximation matches the value and the curvature of the un-normalised target density at the mode. The value of $\hat\theta$ is usually found using a gradient-based method, e.g. Newton's method. In summary, we have

$$q(\theta) \;=\; \mathcal{N}\big(\theta \mid \mu = \hat\theta,\ \Sigma = S\big), \qquad \log Z \;=\; \log p(y, \hat\theta \mid x) + \tfrac{1}{2} \log |S| + \tfrac{D}{2} \log(2\pi),$$
for the approximate posterior over $\theta$ and the approximate log marginal likelihood, respectively. The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay, and for Gaussian processes by Williams and Barber; see references.
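As a concrete illustration, here is a minimal sketch of the procedure in Python (not taken from the referenced works; the function name and the Gamma test case are hypothetical). A generic gradient-based optimiser from SciPy stands in for Newton's method, the precision matrix $S^{-1}$ is obtained by central finite differences, and $\log Z$ is assembled from the formula above.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_approximation(log_joint, theta0, eps=1e-4):
    """Fit an un-normalised Gaussian to exp(log_joint) at a mode.

    log_joint -- log of the un-normalised target density, log p(y, theta | x)
    theta0    -- starting point for the mode search
    Returns the mode theta_hat, the covariance S and the approximate
    log marginal likelihood log Z.
    """
    # Locate a mode theta_hat with a gradient-based optimiser (BFGS here,
    # standing in for the Newton-type methods mentioned in the text).
    res = minimize(lambda th: -log_joint(th), theta0, method="BFGS")
    theta_hat = res.x
    D = theta_hat.size

    # Precision S^{-1}: central-difference Hessian of the negative
    # log joint density, evaluated at the mode.
    precision = np.empty((D, D))
    I = np.eye(D)
    for i in range(D):
        for j in range(D):
            hi, hj = eps * I[i], eps * I[j]
            precision[i, j] = -(log_joint(theta_hat + hi + hj)
                                - log_joint(theta_hat + hi - hj)
                                - log_joint(theta_hat - hi + hj)
                                + log_joint(theta_hat - hi - hj)) / (4 * eps**2)
    S = np.linalg.inv(precision)

    # log Z = log p(y, theta_hat | x) + 1/2 log|S| + D/2 log(2 pi)
    _, logdet_S = np.linalg.slogdet(S)
    log_Z = log_joint(theta_hat) + 0.5 * logdet_S + 0.5 * D * np.log(2 * np.pi)
    return theta_hat, S, log_Z

# Sanity check on the un-normalised Gamma density theta**(a-1) * exp(-b*theta),
# chosen because its exact normaliser Gamma(a) / b**a is known:
a, b = 5.0, 2.0
mode, S, log_Z = laplace_approximation(
    lambda th: (a - 1) * np.log(th[0]) - b * th[0], np.array([1.0]))
# mode[0] ~ (a-1)/b = 2.0; log_Z ~ -0.31 versus the exact log(24/32) ~ -0.29
```

Because the approximation only uses the value and curvature at $\hat\theta$, the quality of the resulting $\log Z$ depends on how close to Gaussian the target is around its mode, as the skewed Gamma test case illustrates.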
References
- MacKay, David J. C. (1992). "Bayesian Interpolation" (PDF). Neural Computation. 4 (3). MIT Press: 415–447. doi:10.1162/neco.1992.4.3.415. S2CID 1762283.
- Williams, Christopher K. I.; Barber, David (1998). "Bayesian classification with Gaussian Processes" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 20 (12). IEEE: 1342–1351. doi:10.1109/34.735807.