
In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.

Markov chain Monte Carlo methods are used to study probability distributions that are too complex or too high-dimensional to study with analytic techniques alone. Various algorithms exist for constructing such Markov chains, including the Metropolis–Hastings algorithm.

Mathematical setting

Suppose $(X_n)$ is a Markov chain on a continuous state space satisfying suitable regularity conditions. We are interested in the limiting behavior of the partial sums

$$S_n(h) = \sum_{i=1}^{n} h(X_i)$$

as $n$ goes to infinity. In particular, we hope to establish the Law of Large Numbers and the Central Limit Theorem for MCMC. In the following, we state some important definitions and theorems.[1]

Irreducibility and aperiodicity

Recall that in the discrete setting, a Markov chain is said to be irreducible if it is possible to reach any state from any other state in a finite number of steps with positive probability. However, in the continuous setting, point-to-point transitions have zero probability. In this case, φ-irreducibility generalizes irreducibility by using a reference measure φ on the measurable space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$.

Definition (φ-irreducibility)

Given a measure $\varphi$ defined on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, the Markov chain $(X_n)$ with transition kernel $K(x, \cdot)$ is φ-irreducible if, for every $A \in \mathcal{B}(\mathcal{X})$ with $\varphi(A) > 0$, there exists $n$ such that $K^n(x, A) > 0$ for all $x \in \mathcal{X}$ (equivalently, $P_x(\tau_A < \infty) > 0$). This is a more general definition for irreducibility of a Markov chain on a non-discrete state space.

In the discrete case, an irreducible Markov chain is said to be aperiodic if it has period 1. Formally, the period of a state $i$ is defined as

$$d(i) = \gcd\{\, n \ge 1 : P^n(i, i) > 0 \,\}.$$

For the general (non-discrete) case, we define aperiodicity in terms of small sets:

Definition (Cycle length and small sets)

A ψ-irreducible Markov chain has a cycle of length $d$ if there exists a small set $C$, an associated integer $M$, and a probability distribution $\nu_M$ such that $d$ is the greatest common divisor of

$$\{\, m \ge 1 : \exists\, \delta_m > 0 \text{ such that } C \text{ is small for } \nu_m \ge \delta_m \nu_M \,\}.$$

A set $C$ is called small if there exist $m > 0$ and a nonzero measure $\nu_m$ such that

$$K^m(x, A) \ge \nu_m(A) \quad \text{for all } x \in C,\ A \in \mathcal{B}(\mathcal{X}).$$

Harris recurrence

Definition (Harris recurrence)

A set $A$ is Harris recurrent if $P_x(\eta_A = \infty) = 1$ for all $x \in A$, where $\eta_A = \sum_{n=1}^{\infty} \mathbf{1}_A(X_n)$ is the number of visits of the chain to the set $A$.

The chain $(X_n)$ is said to be Harris recurrent if there exists a measure $\psi$ such that the chain is ψ-irreducible and every measurable set $A$ with $\psi(A) > 0$ is Harris recurrent.

A useful criterion for verifying Harris recurrence is the following:

Proposition

If, for every $A$ with $\psi(A) > 0$, we have $P_x(\tau_A < \infty) = 1$ for every $x \in \mathcal{X}$, then $P_x(\eta_A = \infty) = 1$ for all $x \in \mathcal{X}$, and the chain is Harris recurrent.

This definition is only needed when the state space is uncountable. In the countable case, recurrence corresponds to $E_x[\eta_x] = \infty$, which is equivalent to $P_x(\tau_x < \infty) = 1$ for all $x$.

Definition (Invariant measure)

A σ-finite measure $\pi$ is said to be invariant for the transition kernel $K(\cdot, \cdot)$ (and the associated chain) if

$$\pi(B) = \int_{\mathcal{X}} K(x, B)\, \pi(dx) \quad \text{for all } B \in \mathcal{B}(\mathcal{X}).$$

When there exists an invariant probability measure for a ψ-irreducible (hence recurrent) chain, the chain is said to be positive recurrent. Recurrent chains that do not allow for a finite invariant measure are called null recurrent.

In applications of Markov Chain Monte Carlo (MCMC), a very useful criterion for Harris recurrence involves the use of bounded harmonic functions.

Definition (Harmonic function)

A measurable function $h$ is said to be harmonic for the chain $(X_n)$ if

$$E[\,h(X_{n+1}) \mid X_n = x\,] = h(x) \quad \text{for all } x.$$

These functions are invariant under the transition kernel in the functional sense, and they help characterize Harris recurrence.

Proposition

For a positive Markov chain, if the only bounded harmonic functions are the constant functions, then the chain is Harris recurrent.

Law of Large Numbers for MCMC

Theorem (Ergodic Theorem for MCMC)

If $(X_n)$ has a σ-finite invariant measure $\pi$, then the following two statements are equivalent:

  1. The Markov chain $(X_n)$ is Harris recurrent.
  2. If $f, g \in L^1(\pi)$ with $\int g \, d\pi \neq 0$, then

$$\lim_{n \to \infty} \frac{S_n(f)}{S_n(g)} = \frac{\int f \, d\pi}{\int g \, d\pi}.$$

This theorem provides a fundamental justification for the use of Markov Chain Monte Carlo (MCMC) methods, and it serves as the counterpart of the Law of Large Numbers (LLN) in classical Monte Carlo.

An important aspect of this result is that $\pi$ does not need to be a probability measure. Moreover, the Markov chain can be started from an arbitrary initial state, since Harris recurrence guarantees that the limit does not depend on the starting value.

Central Limit Theorem for MCMC


There are several conditions under which the Central Limit Theorem (CLT) holds for Markov chain Monte Carlo (MCMC) methods. One of the most commonly used is the condition of reversibility.

Definition (Reversibility)

A stationary Markov chain $(X_n)$ is said to be reversible if the distribution of $X_{n+1}$ given $X_{n+2} = x$ is the same as the distribution of $X_{n+1}$ given $X_n = x$.

This is equivalent to the detailed balance condition, which is defined as follows:

Definition (Detailed balance)

A Markov chain with transition kernel $K$ satisfies the detailed balance condition if there exists a function $f$ such that

$$K(y, x)\, f(y) = K(x, y)\, f(x)$$

for every pair $(x, y)$ in the state space.

Theorem (CLT under reversibility)

If $(X_n)$ is aperiodic, irreducible, and reversible with invariant distribution $\pi$, then

$$\sqrt{n}\left( \frac{1}{n}\sum_{i=1}^{n} h(X_i) - E_\pi[h] \right) \;\xrightarrow{d}\; N\!\left(0, \sigma_h^2\right),$$

where

$$\sigma_h^2 = \operatorname{Var}_\pi\big(h(X_0)\big) + 2\sum_{k=1}^{\infty} \operatorname{Cov}_\pi\big(h(X_0), h(X_k)\big),$$

provided $0 < \sigma_h^2 < \infty$.

Even though reversibility is a restrictive assumption in theory, it is often easily satisfied in practical MCMC algorithms by introducing auxiliary variables or using symmetric proposal mechanisms.
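
The asymptotic variance $\sigma_h^2$ in the theorem above is rarely available in closed form, so in practice it is usually estimated from the chain itself. The following sketch (Python with NumPy; the helper name batch_means_variance is made up for the example) illustrates the common batch-means estimator under the assumption that the CLT applies; it is an illustration, not a prescribed implementation.

```python
import numpy as np

def batch_means_variance(h_values, n_batches=30):
    """Estimate the asymptotic (long-run) variance of the MCMC average of h
    by the batch-means method: split the chain into contiguous batches,
    average within each batch, and scale the variance of the batch means."""
    h_values = np.asarray(h_values, dtype=float)
    n = len(h_values) // n_batches * n_batches      # drop the remainder
    batches = h_values[:n].reshape(n_batches, -1)
    batch_means = batches.mean(axis=1)
    batch_size = n // n_batches
    # Var(sample mean) ~ sigma^2 / n, so sigma^2 ~ batch_size * Var(batch means)
    return batch_size * batch_means.var(ddof=1)

# Example: a 95% confidence interval for E_pi[h] from one chain.
# chain = ...              # samples X_1, ..., X_n from an MCMC run
# h = chain**2             # h(X_i)
# sigma2 = batch_means_variance(h)
# half_width = 1.96 * np.sqrt(sigma2 / len(h))
```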


General explanation

Convergence of the Metropolis–Hastings algorithm. Markov chain Monte Carlo attempts to approximate the blue distribution with the orange distribution.

Markov chain Monte Carlo methods create samples from a continuous random variable with probability density proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance.

Practically, an ensemble of chains is generally developed, starting from a set of points arbitrarily chosen and sufficiently distant from each other. These chains are stochastic processes of "walkers" which move around randomly according to an algorithm that looks for places with a reasonably high contribution to the integral to move into next, assigning them higher probabilities.

Random walk Monte Carlo methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are autocorrelated. Correlation among samples introduces the need to use the Markov chain central limit theorem when estimating the error of mean values.

These algorithms create Markov chains such that they have an equilibrium distribution which is proportional to the function given.
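
As a concrete illustration of this description, the following minimal sketch (Python with NumPy; the function name and the example target are made up for illustration) implements a random-walk Metropolis sampler whose equilibrium distribution is proportional to the supplied unnormalized density.

```python
import numpy as np

def random_walk_metropolis(log_target, x0, n_samples, step_size=1.0, rng=None):
    """Random-walk Metropolis sampler.

    log_target: log of the (unnormalized) target density.
    x0: starting point of the chain.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = float(x0)
    log_p = log_target(x)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = x + step_size * rng.standard_normal()   # symmetric proposal
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(x)).
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = proposal, log_p_prop
        samples[i] = x
    return samples

# Example: sample from a standard normal known only up to a constant.
samples = random_walk_metropolis(lambda x: -0.5 * x**2, x0=0.0, n_samples=10_000)
print(samples.mean(), samples.var())   # approximately 0 and 1
```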

History


The development of MCMC methods is deeply rooted in the early exploration of Monte Carlo (MC) techniques in the mid-20th century, particularly in physics, marked by the Metropolis algorithm proposed by Nicholas Metropolis, Arianna W. Rosenbluth, Marshall Rosenbluth, Augusta H. Teller, and Edward Teller in 1953, designed to tackle high-dimensional integration problems using early computers. W. K. Hastings generalized this algorithm in 1970 and inadvertently introduced the component-wise updating idea later known as Gibbs sampling, while theoretical foundations for Gibbs sampling, such as the Hammersley–Clifford theorem (published via Julian Besag's 1974 paper), were also developing.

Although the seeds of MCMC were sown earlier, including the formal naming of Gibbs sampling in image processing by Stuart Geman and Donald Geman (1984) and the data augmentation method by Martin A. Tanner and Wing Hung Wong (1987), its "revolution" in mainstream statistics largely followed demonstrations of the universality and ease of implementation of sampling methods (especially Gibbs sampling) for complex statistical (particularly Bayesian) problems, spurred by increasing computational power and software like BUGS. This transformation was accompanied by significant theoretical advancements, such as Luke Tierney's (1994) rigorous treatment of MCMC convergence, and Jun S. Liu, Wong, and Augustine Kong's (1994, 1995) analysis of Gibbs sampler structure.

Subsequent developments further expanded the MCMC toolkit, including particle filters (sequential Monte Carlo) for sequential problems, perfect sampling aiming for exact simulation (Jim Propp and David B. Wilson, 1996), RJMCMC (Peter J. Green, 1995) for handling variable-dimension models, and deeper investigations into convergence diagnostics and the central limit theorem. Overall, the evolution of MCMC represents a paradigm shift in statistical computation, enabling the analysis of numerous previously intractable complex models and continually expanding the scope and impact of statistics.[2]

Autocorrelation


MCMC methods produce autocorrelated samples, in contrast to standard Monte Carlo techniques that draw independent samples. Autocorrelation means successive draws from the Markov chain are statistically dependent, so each new sample adds less fresh information than an independent draw would. As a result, one must account for this correlation when assessing the accuracy of estimates from the chain. In particular, positive autocorrelation in the chain increases the variance of estimators and slows the convergence of sample averages toward the true expectation.

Autocorrelation and efficiency: The effect of correlation on estimation can be quantified through the Markov chain central limit theorem. For a chain targeting a distribution with variance $\sigma^2$, the variance of the sample mean after $N$ steps is approximately $\sigma^2 / N_{\text{eff}}$, where $N_{\text{eff}}$ is an effective sample size smaller than $N$. Equivalently, one can express this as

$$\operatorname{Var}(\bar{f}) \;\approx\; \frac{\sigma^2}{N}\left(1 + 2\sum_{k=1}^{\infty} \rho_k\right),$$

where $\bar{f}$ is the sample mean and $\rho_k$ is the autocorrelation of the chain at lag $k$. The term in parentheses, $1 + 2\sum_{k \ge 1} \rho_k$, is often called the integrated autocorrelation. When the chain has no autocorrelation ($\rho_k = 0$ for all $k \ge 1$), this factor equals 1, and one recovers the usual $\sigma^2/N$ variance for independent samples. If the chain's samples are highly correlated, the sum of autocorrelations is large, leading to a much bigger variance for $\bar{f}$ than in the independent case.

Effective sample size (ESS): The effective sample size $N_{\text{eff}}$ is a useful diagnostic that translates the autocorrelation in a chain into an equivalent number of independent samples. It is defined by the formula

$$N_{\text{eff}} = \frac{N}{1 + 2\sum_{k=1}^{\infty} \rho_k},$$

so that $N_{\text{eff}}$ is the number of independent draws that would yield the same estimation precision as the $N$ dependent draws from the Markov chain. For example, if $1 + 2\sum_{k} \rho_k = 5$, then $N_{\text{eff}} = N/5$, meaning the chain of length $N$ carries information equivalent to $N/5$ independent samples. In an ideal scenario with no correlation, $\rho_k = 0$ for all $k$ and thus $N_{\text{eff}} = N$. But in a poorly mixing chain with strong autocorrelation, $N_{\text{eff}}$ can be much smaller than $N$ (if the chain gets "stuck", $\rho_k$ stays close to 1 for many lags and $N_{\text{eff}}/N$ approaches 0 as the sum of autocorrelations grows). In practice, monitoring the ESS for each parameter is a way to gauge how much correlation is present: a low ESS indicates that many more iterations may be needed to achieve a desired effective sample of independent draws.
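
A minimal sketch of how $N_{\text{eff}}$ can be estimated from a single chain, assuming NumPy and truncating the autocorrelation sum at the first non-positive value (one common heuristic; the function name is made up for the example):

```python
import numpy as np

def effective_sample_size(chain):
    """Estimate N_eff = N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocovariances for lags 0..n-1 from the full autocorrelation.
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, n):
        if rho[k] <= 0:          # simple truncation heuristic
            break
        tau += 2.0 * rho[k]
    return n / tau

# chain = ...  # output of an MCMC run
# print(effective_sample_size(chain))
```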

Gelman–Rubin convergence diagnostic ($\hat{R}$): Another important diagnostic for MCMC is the Gelman–Rubin statistic $\hat{R}$, which assesses convergence and mixing by comparing multiple chains. The idea is to run several independent Markov chains (with different starting points) and check if they all produce similar results. If the chains have converged to the target distribution, the samples from each chain should be statistically indistinguishable aside from random noise. Gelman and Rubin's method measures this by decomposing the variance of the samples into within-chain and between-chain components. Suppose we run $m$ parallel chains, each of length $n$ after burn-in. Let $W$ be the average of the variances within each chain, and let $B$ be the variance of the chain means (multiplied by $n$, so that $B$ is the between-chain variance of the individual samples). One can show that an unbiased estimate of the target variance $\sigma^2$ is given by a weighted sum of these components:

$$\hat{\sigma}^2 = \frac{n-1}{n}\, W + \frac{1}{n}\, B,$$

which tends to $\sigma^2$ after convergence. The Gelman–Rubin statistic is then defined as

$$\hat{R} = \sqrt{\frac{\hat{\sigma}^2}{W}}.$$

By construction, $\hat{R} \ge 1$, and values close to 1 indicate that the between-chain and within-chain variances are nearly equal, signaling that all chains are sampling the same distribution. If $\hat{R}$ is substantially greater than 1, it means that the between-chain variance is higher than what one would expect if all chains were mixing well. In other words, the chains have not yet converged to the same equilibrium and may be stuck in different regions of the space. This diagnostic is especially useful for detecting situations where a single chain might appear stationary but actually has not fully explored the distribution. In practice, one runs multiple chains and computes $\hat{R}$ for each parameter (or summary statistic of interest) to check convergence.

Interpreting diagnostics: To ensure reliable results, practitioners look for low autocorrelation (high ESS) within each chain and consistent behavior across chains. A common guideline is that $\hat{R}$ should be close to 1 for all parameters; values below about 1.1 are typically considered acceptable convergence in many applications. If $\hat{R}$ remains above this threshold, it suggests that longer runs (or improved sampling algorithms) are needed, as the chains have not yet equilibrated. Similarly, an effective sample size that is only a small fraction of the total number of draws indicates high correlation in the chain; this can signal the need for more iterations or methods to reduce autocorrelation. By monitoring ESS and $\hat{R}$ (along with trace plots and autocorrelation functions informally), one can diagnose the presence of strong correlation in the MCMC output and take steps to obtain more independent samples. Reducing autocorrelation, through better proposal distributions, reparameterization, or advanced sampling algorithms, will increase $N_{\text{eff}}$ and hasten convergence, making the MCMC estimation more efficient.

Reducing correlation


To address high autocorrelation, several general strategies can be applied to reduce correlation and improve sampling efficiency. These include reparameterizing the model, tuning or adapting the proposal distribution, and sampling blocks of parameters.

Reparameterization


One way to reduce autocorrelation is to reformulate or reparameterize the statistical model so that the posterior geometry is more amenable to efficient sampling. If model parameters are strongly correlated a priori or a posteriori, a naive sampler may have difficulty moving through the joint space, as it must make coordinated changes to several variables at once. By changing the coordinate system or using alternative variable definitions, one can often lessen correlations. For example, in hierarchical Bayesian models, a non-centered parameterization can be used in place of the standard (centered) formulation to avoid extreme posterior correlations between latent and higher-level parameters. This involves expressing latent variables in terms of independent auxiliary variables, which can dramatically improve mixing.
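
A minimal sketch of the idea on a hierarchical normal model, assuming NumPy only and flat priors on the hyperparameters; it shows the change of variables itself rather than a full sampler, and all names are illustrative:

```python
import numpy as np

# Centered parameterization: sample theta_j directly.
#   theta_j ~ Normal(mu, tau),  y_j ~ Normal(theta_j, sigma_j)
def log_post_centered(mu, log_tau, theta, y, sigma):
    tau = np.exp(log_tau)
    lp = -0.5 * np.sum(((theta - mu) / tau) ** 2) - len(theta) * log_tau
    lp += -0.5 * np.sum(((y - theta) / sigma) ** 2)
    return lp

# Non-centered parameterization: sample auxiliary eta_j ~ Normal(0, 1)
# and set theta_j = mu + tau * eta_j, which removes the strong posterior
# coupling between theta and tau when the data are only weakly informative.
def log_post_noncentered(mu, log_tau, eta, y, sigma):
    tau = np.exp(log_tau)
    theta = mu + tau * eta                  # deterministic transform
    lp = -0.5 * np.sum(eta ** 2)            # standard normal prior on eta
    lp += -0.5 * np.sum(((y - theta) / sigma) ** 2)
    return lp
```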

Proposal tuning and adaptation


Another approach to reducing correlation is to improve the MCMC proposal mechanism. In Metropolis–Hastings algorithms, step size tuning is critical: if the proposed steps are too small, the sampler moves slowly and produces highly correlated samples; if the steps are too large, many proposals are rejected, resulting in repeated values. Adjusting the proposal step size during an initial testing phase helps find a balance where the sampler explores the space efficiently without too many rejections. More advanced adaptive MCMC methods modify proposal distributions based on the chain's past samples. For instance, adaptive Metropolis algorithms periodically update proposal covariances to approximate posterior covariance structures, greatly reducing autocorrelation.
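
A minimal sketch of the tuning idea in one dimension, assuming NumPy: during an initial adaptation phase the proposal standard deviation is scaled up or down to steer the empirical acceptance rate toward a target value (0.44 is a common rule of thumb for one-dimensional updates). The function name and constants are illustrative, not a specific published algorithm.

```python
import numpy as np

def adaptive_rw_metropolis(log_target, x0, n_samples, n_adapt=2_000,
                           target_accept=0.44, rng=None):
    """Random-walk Metropolis with step-size adaptation during burn-in."""
    rng = np.random.default_rng() if rng is None else rng
    x, log_p = float(x0), log_target(float(x0))
    step = 1.0
    samples = np.empty(n_samples)
    accepted = 0
    for i in range(n_samples):
        prop = x + step * rng.standard_normal()
        log_p_prop = log_target(prop)
        if np.log(rng.uniform()) < log_p_prop - log_p:
            x, log_p = prop, log_p_prop
            accepted += 1
        samples[i] = x
        # Adapt only during the initial phase, every 100 iterations.
        if i < n_adapt and (i + 1) % 100 == 0:
            rate = accepted / (i + 1)
            step *= np.exp(rate - target_accept)   # grow if accepting too often
    return samples
```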

Examples


Random walk

  • Metropolis–Hastings algorithm: This method generates a Markov chain using a proposal density for new steps and a method for rejecting some of the proposed moves. It is actually a general framework which includes as special cases the very first and simpler MCMC (Metropolis algorithm) and many more recent alternatives listed below.
    • Gibbs sampling: When the target distribution is multi-dimensional, the Gibbs sampling algorithm[3] updates each coordinate from its full conditional distribution given the other coordinates. Gibbs sampling can be viewed as a special case of the Metropolis–Hastings algorithm with acceptance rate uniformly equal to 1. When drawing from the full conditional distributions is not straightforward, other samplers-within-Gibbs are used (e.g., see [4][5]). Gibbs sampling is popular partly because it does not require any 'tuning'. The algorithmic structure of Gibbs sampling closely resembles that of coordinate ascent variational inference in that both algorithms use the full conditional distributions in their updating procedures[6] (a minimal sketch appears after this list).
    • Metropolis-adjusted Langevin algorithm and other methods that rely on the gradient (and possibly second derivative) of the log target density to propose steps that are more likely to be in the direction of higher probability density.[7]
    • Hamiltonian (or hybrid) Monte Carlo (HMC): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics, so the potential energy function is the target density. The momentum samples are discarded after sampling. The result of hybrid Monte Carlo is that proposals move across the sample space in larger steps; they are therefore less correlated and converge to the target distribution more rapidly.
    • Pseudo-marginal Metropolis–Hastings: This method replaces the evaluation of the density of the target distribution with an unbiased estimate and is useful when the target density is not available analytically, e.g. latent variable models.
  • Slice sampling: This method depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. It alternates uniform sampling in the vertical direction with uniform sampling from the horizontal 'slice' defined by the current vertical position.
  • Multiple-try Metropolis: This method is a variation of the Metropolis–Hastings algorithm that allows multiple trials at each point. By making it possible to take larger steps at each iteration, it helps address the curse of dimensionality.
  • Reversible-jump: This method is a variant of the Metropolis–Hastings algorithm that allows proposals that change the dimensionality of the space.[8] Markov chain Monte Carlo methods that change dimensionality have long been used in statistical physics applications, where for some problems a distribution that is a grand canonical ensemble is used (e.g., when the number of molecules in a box is variable). But the reversible-jump variant is useful when doing Markov chain Monte Carlo or Gibbs sampling over nonparametric Bayesian models such as those involving the Dirichlet process or Chinese restaurant process, where the number of mixing components/clusters/etc. is automatically inferred from the data.
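
As referenced in the Gibbs sampling item above, here is a minimal sketch of a Gibbs sampler for a bivariate normal target with correlation ρ, where both full conditionals are available in closed form (Python with NumPy; names are illustrative):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, rng=None):
    """Gibbs sampler for (X, Y) ~ N(0, [[1, rho], [rho, 1]]).

    Each full conditional is univariate normal:
        X | Y = y ~ N(rho * y, 1 - rho^2), and symmetrically for Y | X.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = 0.0, 0.0
    cond_sd = np.sqrt(1.0 - rho ** 2)
    out = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_sd)   # draw X from its full conditional
        y = rng.normal(rho * x, cond_sd)   # draw Y from its full conditional
        out[i] = x, y
    return out

draws = gibbs_bivariate_normal(rho=0.8, n_samples=5_000)
print(np.corrcoef(draws.T)[0, 1])   # close to 0.8
```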

Interacting particle methods


Interacting MCMC methodologies are a class of mean-field particle methods for obtaining random samples from a sequence of probability distributions with an increasing level of sampling complexity.[9] These probabilistic models include path space state models with increasing time horizon, posterior distributions w.r.t. sequence of partial observations, increasing constraint level sets for conditional distributions, decreasing temperature schedules associated with some Boltzmann–Gibbs distributions, and many others. In principle, any Markov chain Monte Carlo sampler can be turned into an interacting Markov chain Monte Carlo sampler. These interacting Markov chain Monte Carlo samplers can be interpreted as a way to run in parallel a sequence of Markov chain Monte Carlo samplers. For instance, interacting simulated annealing algorithms are based on independent Metropolis–Hastings moves interacting sequentially with a selection-resampling type mechanism. In contrast to traditional Markov chain Monte Carlo methods, the precision parameter of this class of interacting Markov chain Monte Carlo samplers is only related to the number of interacting Markov chain Monte Carlo samplers. These advanced particle methodologies belong to the class of Feynman–Kac particle models,[10][11] also called Sequential Monte Carlo or particle filter methods in Bayesian inference and signal processing communities.[12] Interacting Markov chain Monte Carlo methods can also be interpreted as a mutation-selection genetic particle algorithm with Markov chain Monte Carlo mutations.

Quasi-Monte Carlo


The quasi-Monte Carlo method is an analog to the normal Monte Carlo method that uses low-discrepancy sequences instead of random numbers.[13][14] It yields an integration error that decays faster than that of true random sampling, as quantified by the Koksma–Hlawka inequality. Empirically it allows the reduction of both estimation error and convergence time by an order of magnitude.[13] Markov chain quasi-Monte Carlo methods[15][16] such as the Array–RQMC method combine randomized quasi-Monte Carlo and Markov chain simulation by simulating $n$ chains simultaneously in a way that better approximates the true distribution of the chain than with ordinary MCMC.[17] In empirical experiments, the variance of the average of a function of the state sometimes converges at rate $O(n^{-2})$ or even faster, instead of the Monte Carlo rate $O(n^{-1})$.[18]

Applications


MCMC methods are primarily used for calculating numerical approximations of multi-dimensional integrals, for example in Bayesian statistics, computational physics,[19] computational biology[20] and computational linguistics.[21][22]

Bayesian statistics

In Bayesian statistics, Markov chain Monte Carlo methods are typically used to calculate moments and credible intervals of posterior probability distributions. The use of MCMC methods makes it possible to compute large hierarchical models that require integrations over hundreds to thousands of unknown parameters.[23]

In rare event sampling, they are also used for generating samples that gradually populate the rare failure region.[citation needed]

Langevin dynamics

Langevin dynamics are typically used in complex distribution sampling and generative models,[24] via an MCMC procedure. Specifically, given the probability density function $p(x)$, we use its gradient $\nabla_x \log p(x)$ as the score function and start from a prior distribution $x_0 \sim \pi_0(x)$. Then, a chain is built by

$$x_{t+1} = x_t + \frac{\varepsilon}{2}\, \nabla_x \log p(x_t) + \sqrt{\varepsilon}\, z_t, \qquad z_t \sim \mathcal{N}(0, I),$$

for $t = 0, 1, \ldots, K$. When $\varepsilon \to 0$ and $K \to \infty$, $x_K$ converges to a sample from the target distribution $p(x)$.

A simulation of sampling from a Wikipedia-logo-like distribution via Langevin Dynamics.
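
A minimal sketch of the update rule above for a target known up to a constant, assuming NumPy and a hand-coded gradient of the log density (all names are illustrative):

```python
import numpy as np

def langevin_sampler(grad_log_p, x0, step=1e-2, n_steps=10_000, rng=None):
    """Unadjusted Langevin algorithm:
        x_{t+1} = x_t + (step/2) * grad_log_p(x_t) + sqrt(step) * z_t."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * z
    return x

# Example: target is a standard 2-D normal, so grad log p(x) = -x.
sample = langevin_sampler(lambda x: -x, x0=np.zeros(2))
```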

Convergence


Usually it is not hard to construct a Markov chain with the desired properties. The more difficult problem is to determine (1) when to start collecting statistics and (2) how many steps are needed to converge to the stationary distribution within an acceptable error.[25][26] Fortunately, there are a variety of practical diagnostics to empirically assess convergence.

Total variation distance

Formally, let $\pi$ denote the stationary distribution and $P^t(x, \cdot)$ the distribution of the Markov chain after $t$ steps starting from state $x$. Theoretically, convergence can be quantified by measuring the total variation distance:

$$d_{TV}\big(P^t(x, \cdot), \pi\big) = \sup_{A} \big| P^t(x, A) - \pi(A) \big|.$$

A chain is said to mix rapidly if $d_{TV}(P^t(x, \cdot), \pi) \le \epsilon$ for all $x$ within a small number of steps $t$ under a pre-defined tolerance $\epsilon$. In other words, the stationary distribution is reached quickly starting from an arbitrary position, and the minimum such $t$ is known as the mixing time. In practice, however, the total variation distance is generally intractable to compute, especially in high-dimensional problems or when the stationary distribution is only known up to a normalizing constant (as in most Bayesian applications).
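
For a small discrete chain the total variation distance can be computed exactly, which makes the definition concrete even though it is intractable in realistic MCMC settings. A sketch assuming NumPy; the two-state kernel is made up for illustration:

```python
import numpy as np

# Toy two-state transition matrix and its stationary distribution.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()                       # stationary distribution (2/3, 1/3)

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(p - q).sum()

# Distribution after t steps starting from state 0, and its distance to pi.
dist = np.array([1.0, 0.0])
for t in range(1, 21):
    dist = dist @ P
    print(t, tv_distance(dist, pi))
```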

Gelman–Rubin diagnostics

The Gelman–Rubin statistic, also known as the potential scale reduction factor (PSRF), evaluates MCMC convergence by sampling multiple independent Markov chains and comparing within-chain and between-chain variances.[25] Suppose $m$ parallel MCMC chains, each of length $n$, have been run targeting the same distribution. Let $x_{ij}$ denote the $j$-th sample from the $i$-th chain. The Gelman–Rubin statistic is based on the decomposition of the total variance into within-chain and between-chain components.

  • Within-chain means: $\bar{x}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} x_{ij}$
  • Overall mean: $\bar{x}_{\cdot\cdot} = \frac{1}{m} \sum_{i=1}^{m} \bar{x}_{i\cdot}$
  • Between-chain variance: $B = \frac{n}{m-1} \sum_{i=1}^{m} \left( \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot} \right)^2$
  • Within-chain variance: $W = \frac{1}{m} \sum_{i=1}^{m} s_i^2$, where $s_i^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( x_{ij} - \bar{x}_{i\cdot} \right)^2$

Using these quantities, the pooled (overdispersed) estimate of the target variance is computed as

$$\hat{V} = \frac{n-1}{n}\, W + \frac{1}{n}\, B.$$

The potential scale reduction factor (PSRF) is then defined as

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}.$$

If all chains have converged to the same stationary distribution, the between-chain and within-chain variances should be similar, and thus $\hat{R} \approx 1$. In practice, a value of $\hat{R} < 1.1$ is often taken as evidence of convergence. Higher values suggest that the chains are still exploring different parts of the target distribution. An extension of the PSRF to multivariate targets can be found in the literature.[citation needed]
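
A minimal sketch of this computation for scalar chains, assuming NumPy and an input array of shape (m, n) holding m chains of length n (function and variable names are illustrative):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m, n) array of scalar draws."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    grand_mean = chain_means.mean()
    B = n / (m - 1) * np.sum((chain_means - grand_mean) ** 2)   # between-chain
    W = chains.var(axis=1, ddof=1).mean()                       # within-chain
    V_hat = (n - 1) / n * W + B / n                             # pooled estimate
    return np.sqrt(V_hat / W)

# chains = np.vstack([chain_1, chain_2, chain_3, chain_4])  # four parallel runs
# print(gelman_rubin(chains))   # values near 1 suggest convergence
```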

Geweke diagnostics

The Geweke diagnostic examines whether the distribution of samples in the early part of the Markov chain is statistically indistinguishable from the distribution in a later part. Given a sequence of correlated MCMC samples $x_1, \ldots, x_N$, the diagnostic splits the chain into an early segment consisting of the first $n_A$ samples, typically chosen as $n_A = 0.1N$ (i.e., the first 10% of the chain), and a late segment consisting of the last $n_B$ samples, typically chosen as $n_B = 0.5N$ (i.e., the last 50% of the chain).

Denote the sample means of these segments as

$$\bar{x}_A = \frac{1}{n_A} \sum_{i=1}^{n_A} x_i, \qquad \bar{x}_B = \frac{1}{n_B} \sum_{i=N-n_B+1}^{N} x_i.$$

Since MCMC samples are autocorrelated, a simple comparison of sample means is insufficient. Instead, the difference in means is standardized using an estimator of the spectral density at zero frequency, which accounts for the long-range dependencies in the chain. The test statistic is computed as

$$Z = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\hat{S}_A(0)/n_A + \hat{S}_B(0)/n_B}},$$

where $\hat{S}(0)$ is an estimate of the long-run variance (i.e., the spectral density at frequency zero), commonly estimated using Newey–West estimators or batch means. Under the null hypothesis of convergence, the statistic $Z$ follows an approximately standard normal distribution $N(0, 1)$.

If $|Z| > 1.96$, the null hypothesis is rejected at the 5% significance level, suggesting that the chain has not yet reached stationarity.
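
A minimal sketch of the test, assuming NumPy and, for simplicity, a batch-means estimate of the spectral density at frequency zero rather than a Newey–West estimator (all names are illustrative):

```python
import numpy as np

def spectral_var_batch_means(x, n_batches=20):
    """Crude estimate of the spectral density at frequency zero via batch means."""
    x = np.asarray(x, dtype=float)
    n = len(x) // n_batches * n_batches
    batch_means = x[:n].reshape(n_batches, -1).mean(axis=1)
    return (n // n_batches) * batch_means.var(ddof=1)

def geweke_z(chain, first=0.1, last=0.5):
    """Geweke convergence z-score comparing early and late segments."""
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    a = chain[: int(first * n)]       # first 10% of the chain
    b = chain[-int(last * n):]        # last 50% of the chain
    var_a = spectral_var_batch_means(a)
    var_b = spectral_var_batch_means(b)
    return (a.mean() - b.mean()) / np.sqrt(var_a / len(a) + var_b / len(b))

# |geweke_z(chain)| > 1.96 suggests the chain has not reached stationarity.
```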

Heidelberger–Welch diagnostics

The Heidelberger–Welch diagnostic is grounded in spectral analysis and Brownian motion theory, and is particularly useful in the early stages of simulation to determine an appropriate burn-in and stopping time. The diagnostic comprises two components: a stationarity test that assesses whether the Markov chain has reached a steady state, and a half-width test that determines whether the estimated expectation is within a user-specified precision.

Stationarity test

Let $X_1, \ldots, X_n$ be the output of an MCMC simulation, $g$ a scalar function of interest, and $Y_i = g(X_i)$ the evaluations of the function over the chain. Define the standardized cumulative sum process

$$B_n(t) = \frac{\sum_{i=1}^{\lfloor nt \rfloor} \left( Y_i - \bar{Y} \right)}{\sqrt{n\, \hat{S}(0)}}, \qquad t \in [0, 1],$$

where $\bar{Y}$ is the sample mean and $\hat{S}(0)$ is an estimate of the spectral density at frequency zero.

Under the null hypothesis of convergence, the process $B_n$ converges in distribution to a Brownian bridge. The following Cramér–von Mises statistic is used to test for stationarity:

$$C_n = \int_0^1 B_n(t)^2 \, dt.$$

This statistic is compared against known critical values from the Brownian bridge distribution. If the null hypothesis is rejected, the first 10% of the samples are discarded and the test can be repeated on the remaining chain until either stationarity is accepted or 50% of the chain is discarded.

Half-width test (precision check)

Once stationarity is accepted, the second part of the diagnostic checks whether the Monte Carlo estimator is accurate enough for practical use. Assuming the central limit theorem holds, the $(1-\alpha)$ confidence interval for the mean is given by

$$\bar{Y} \pm t_{1-\alpha/2,\,\nu}\, \sqrt{\frac{\hat{\sigma}^2}{n}},$$

where $\hat{\sigma}^2$ is an estimate of the asymptotic variance of $Y$, $t_{1-\alpha/2,\,\nu}$ is the Student's $t$ critical value at confidence level $1-\alpha$ and degrees of freedom $\nu$, and $n$ is the number of samples used.

The half-width of this interval is defined as

$$\text{half-width} = t_{1-\alpha/2,\,\nu}\, \sqrt{\frac{\hat{\sigma}^2}{n}}.$$

If the half-width (relative to the magnitude of the estimated mean) is smaller than a user-defined tolerance (e.g., 0.05), the chain is considered long enough to estimate the expectation reliably. Otherwise, the simulation should be extended.
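
A minimal sketch of the half-width check, assuming NumPy and SciPy for the Student's t quantile, with the asymptotic variance again estimated by batch means; names and the relative tolerance are illustrative:

```python
import numpy as np
from scipy import stats

def half_width_test(y, alpha=0.05, tol=0.05, n_batches=20):
    """Return (passed, half_width) for a Heidelberger-Welch style precision check."""
    y = np.asarray(y, dtype=float)
    n = len(y) // n_batches * n_batches
    batch_means = y[:n].reshape(n_batches, -1).mean(axis=1)
    sigma2 = (n // n_batches) * batch_means.var(ddof=1)   # long-run variance
    t_crit = stats.t.ppf(1 - alpha / 2, df=n_batches - 1)
    half_width = t_crit * np.sqrt(sigma2 / n)
    passed = half_width < tol * abs(y[:n].mean())         # relative precision
    return passed, half_width

# passed, hw = half_width_test(g_of_chain)
# If not passed, extend the simulation and test again.
```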

Raftery–Lewis diagnostics

The Raftery-Lewis diagnostic is specifically designed to assess how many iterations are needed to estimate quantiles or tail probabilities of the target distribution with a desired accuracy and confidence. Unlike Gelman-Rubin or Geweke diagnostics, which are based on assessing convergence to the entire distribution, the Raftery-Lewis diagnostic is goal-oriented as it provides estimates for the number of samples required to estimate a specific quantile of interest within a desired margin of error.

Let $q$ denote the desired quantile (e.g., 0.025) of a real-valued function $g(\theta)$: in other words, the goal is to find $u$ such that $P\big(g(\theta) \le u\big) = q$. Suppose we wish to estimate this quantile such that the estimate falls within a margin $\pm r$ of the true value with probability $s$. That is, we want

$$P\big(\,|\hat{q} - q| \le r\,\big) = s.$$

The diagnostic proceeds by converting the output of the MCMC chain into a binary sequence

$$Z_t = \mathbf{1}\{\, g(\theta_t) \le u \,\},$$

where $\mathbf{1}\{\cdot\}$ is the indicator function. The sequence $(Z_t)$ is treated as a realization from a two-state Markov chain. While this may not be strictly true, it is often a good approximation in practice.

Let the transition matrix of the Markov chain approximation be

$$P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}.$$

From the empirical transitions in the binary sequence, the Raftery-Lewis method estimates:

  • The minimum number of iterations $N_{\min}$ required to achieve the desired precision and confidence for estimating the quantile is obtained based on asymptotic theory for Bernoulli processes:

$$N_{\min} = \left\lceil \frac{q(1-q)}{r^2} \left[ \Phi^{-1}\!\left( \frac{s+1}{2} \right) \right]^2 \right\rceil,$$

where $\Phi^{-1}$ is the standard normal quantile function.

  • The burn-in period is calculated using eigenvalue analysis of the transition matrix to estimate the number of initial iterations needed for the Markov chain to forget its initial state.
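
A minimal sketch of the first of these two quantities, the minimum sample size $N_{\min}$ from the formula above, assuming NumPy and SciPy (names are illustrative); the dependence factor and burn-in require the estimated two-state transition matrix and are omitted here:

```python
import numpy as np
from scipy import stats

def raftery_lewis_nmin(q=0.025, r=0.005, s=0.95):
    """Minimum number of independent draws to estimate the q-quantile
    to within +/- r with probability s (Bernoulli-process approximation)."""
    z = stats.norm.ppf((s + 1) / 2)          # standard normal quantile
    return int(np.ceil(q * (1 - q) * z**2 / r**2))

print(raftery_lewis_nmin())   # about 3,746 draws for these default settings
```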

Further discussion


Many traditional MCMC methods, such as the random walk Metropolis algorithm, explore the state space by making local proposals that are accepted or rejected according to the Metropolis–Hastings criterion. These proposals are typically symmetric and directionless, resulting in diffusive behavior that slows exploration. As a consequence, chains may require a large number of iterations to adequately explore the space, especially in high dimensions, and the resulting samples often exhibit high autocorrelation.

Software


Several software programs provide MCMC sampling capabilities, for example:

  • emcee[27]
  • NumPyro[28]

References


Citations

  1. ^ Robert and Casella (2004), pp. 205–246
  2. ^ Robert, Christian; Casella, George (2011). "A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data". Statistical Science. 26 (1): 102–115. arXiv:0808.2902. doi:10.1214/10-STS351.
  3. ^ Geman, Stuart; Geman, Donald (November 1984). "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-6 (6): 721–741. doi:10.1109/TPAMI.1984.4767596. ISSN 0162-8828. PMID 22499653. S2CID 5837272.
  4. ^ Gilks, W. R.; Wild, P. (1992-01-01). "Adaptive Rejection Sampling for Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 41 (2): 337–348. doi:10.2307/2347565. JSTOR 2347565.
  5. ^ Gilks, W. R.; Best, N. G.; Tan, K. K. C. (1995-01-01). "Adaptive Rejection Metropolis Sampling within Gibbs Sampling". Journal of the Royal Statistical Society. Series C (Applied Statistics). 44 (4): 455–472. doi:10.2307/2986138. JSTOR 2986138.
  6. ^ Lee, Se Yoon (2021). "Gibbs sampler and coordinate ascent variational inference: A set-theoretical review". Communications in Statistics - Theory and Methods. 51 (6): 1–21. arXiv:2008.01006. doi:10.1080/03610926.2021.1921214. S2CID 220935477.
  7. ^ See Stramer 1999.
  8. ^ See Green 1995.
  9. ^ Del Moral, Pierre (2013). Mean field simulation for Monte Carlo integration. Chapman & Hall/CRC Press. p. 626.
  10. ^ Del Moral, Pierre (2004). Feynman–Kac formulae. Genealogical and interacting particle approximations. Springer. p. 575.
  11. ^ Del Moral, Pierre; Miclo, Laurent (2000). "Branching and Interacting Particle Systems Approximations of Feynman-Kac Formulae with Applications to Non-Linear Filtering". In Jacques Azéma; Michel Ledoux; Michel Émery; Marc Yor (eds.). Séminaire de Probabilités XXXIV (PDF). Lecture Notes in Mathematics. Vol. 1729. pp. 1–145. doi:10.1007/bfb0103798. ISBN 978-3-540-67314-9.
  12. ^ Del Moral, Pierre (2006). "Sequential Monte Carlo samplers". Journal of the Royal Statistical Society. Series B (Statistical Methodology). 68 (3): 411–436. arXiv:cond-mat/0212648. doi:10.1111/j.1467-9868.2006.00553.x. S2CID 12074789.
  13. ^ a b Papageorgiou, Anargyros; Traub, Joseph (1996). "Beating Monte Carlo" (PDF). Risk. 9 (6): 63–65.
  14. ^ Sobol, Ilya M (1998). "On quasi-monte carlo integrations". Mathematics and Computers in Simulation. 47 (2): 103–112. doi:10.1016/s0378-4754(98)00096-2.
  15. ^ Chen, S.; Dick, Josef; Owen, Art B. (2011). "Consistency of Markov chain quasi-Monte Carlo on continuous state spaces". Annals of Statistics. 39 (2): 673–701. arXiv:1105.1896. doi:10.1214/10-AOS831.
  16. ^ Tribble, Seth D. (2007). Markov chain Monte Carlo algorithms using completely uniformly distributed driving sequences (Diss.). Stanford University. ProQuest 304808879.
  17. ^ L'Ecuyer, P.; Lécot, C.; Tuffin, B. (2008). "A Randomized Quasi-Monte Carlo Simulation Method for Markov Chains" (PDF). Operations Research. 56 (4): 958–975. doi:10.1287/opre.1080.0556.
  18. ^ L'Ecuyer, P.; Munger, D.; Lécot, C.; Tuffin, B. (2018). "Sorting Methods and Convergence Rates for Array-RQMC: Some Empirical Comparisons". Mathematics and Computers in Simulation. 143: 191–201. doi:10.1016/j.matcom.2016.07.010.
  19. ^ Kasim, M.F.; Bott, A.F.A.; Tzeferacos, P.; Lamb, D.Q.; Gregori, G.; Vinko, S.M. (September 2019). "Retrieving fields from proton radiography without source profiles". Physical Review E. 100 (3): 033208. arXiv:1905.12934. Bibcode:2019PhRvE.100c3208K. doi:10.1103/PhysRevE.100.033208. PMID 31639953. S2CID 170078861.
  20. ^ Gupta, Ankur; Rawlings, James B. (April 2014). "Comparison of Parameter Estimation Methods in Stochastic Chemical Kinetic Models: Examples in Systems Biology". AIChE Journal. 60 (4): 1253–1268. Bibcode:2014AIChE..60.1253G. doi:10.1002/aic.14409. PMC 4946376. PMID 27429455.
  21. ^ See Gill 2008.
  22. ^ See Robert & Casella 2004.
  23. ^ Banerjee, Sudipto; Carlin, Bradley P.; Gelfand, Alan P. (2014-09-12). Hierarchical Modeling and Analysis for Spatial Data (Second ed.). CRC Press. p. xix. ISBN 978-1-4398-1917-3.
  24. ^ "Generative Modeling by Estimating Gradients of the Data Distribution | Yang Song". yang-song.net. Retrieved 2025-04-24.
  25. ^ a b Gelman, A.; Rubin, D.B. (1992). "Inference from iterative simulation using multiple sequences (with discussion)" (PDF). Statistical Science. 7 (4): 457–511. Bibcode:1992StaSc...7..457G. doi:10.1214/ss/1177011136.
  26. ^ Cowles, M.K.; Carlin, B.P. (1996). "Markov chain Monte Carlo convergence diagnostics: a comparative review". Journal of the American Statistical Association. 91 (434): 883–904. CiteSeerX 10.1.1.53.3445. doi:10.1080/01621459.1996.10476956.
  27. ^ Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan (2013-11-25). "emcee: The MCMC Hammer". Publications of the Astronomical Society of the Pacific. 125 (925): 306–312. arXiv:1202.3665. Bibcode:2013PASP..125..306F. doi:10.1086/670067. S2CID 88518555.
  28. ^ Phan, Du; Pradhan, Neeraj; Jankowiak, Martin (2019-12-24). "Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro". arXiv:1912.11554 [stat.ML].
