Talk:Metropolis–Hastings algorithm

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Jorgecarleitao (talk | contribs) at 05:44, 23 July 2015 (Subscripts instead of superscripts?: Issue resolved.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as C-class on Wikipedia's content assessment scale.
This article has been rated as Low-importance on the importance scale.

x prime or x_t?

Here are two conflicting comments on the case a>1, which I have cut and pasted for readability. --Rinconsoleao 15:12, 16 July 2007 (UTC)[reply]

There seems to be a typo in the last display. If a>1, we would accept the proposal, x', not x_t. --Youyifong 06:38, 28 July 2006 (UTC)[reply]
Hi, is there a mistake here? In the step-by-step instructions,
(Postdoc 02:30, 16 July 2007 (UTC))[reply]

The algorithm always accepts when a>1; that is, the proposal x' is accepted whenever a>1. Notice that this is consistent with the statements about the uniform random variable in the previous paragraph. I have rewritten the "step-by-step instructions" to make this clearer. --Rinconsoleao 15:12, 16 July 2007 (UTC)[reply]

Please check the page again. The page mis-typed x' as x_t, as Youyifong mentioned again. Also, should it be "a >= 1"? (Postdoc 05:38, 30 July 2007 (UTC))[reply]
Thanks! --Rinconsoleao 15:44, 31 July 2007 (UTC)[reply]
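The accept/reject rule debated above can be sketched in code. Since u is drawn from U(0,1), testing u < a accepts with probability min(1, a), so "a > 1" and "a >= 1" describe the same behaviour: any a at or above 1 is always accepted. A minimal sketch with a symmetric Gaussian proposal (the function and parameter names are illustrative, not from the article):

```python
import math
import random

def metropolis_step(x, log_p, width=1.0):
    """One Metropolis step with a symmetric Gaussian proposal.

    `log_p` must be the log of a function proportional to the target
    density; working in log space avoids overflow in the ratio.
    """
    x_new = random.gauss(x, width)
    delta = log_p(x_new) - log_p(x)
    if delta >= 0:                          # a >= 1: always accept the proposal x'
        return x_new
    if random.random() < math.exp(delta):   # a < 1: accept with probability a
        return x_new
    return x                                # reject: the chain stays at x_t
```

Whether the boundary case a = 1 goes through the first or the second branch makes no difference to the sampled distribution, since both accept with probability 1 there.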

The article says that the Gibbs sampler is a special case of the MH algorithm. As far as I know, it is a complementary MCMC method which works quite differently. --Smeyen 01:07, 24 February 2007 (UTC)[reply]


In the Gibbs sampler, the proposal distribution is the full conditional distribution of some part of the parameter space given the rest, which always results in an acceptance probability of 1. See [http://www.stat.purdue.edu/~lebanon/notes/metropolis.pdf] (Postdoc 05:46, 30 July 2007 (UTC))[reply]
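The claim above can be made concrete. If component i is resampled from its full conditional while the other components x_{-i} are held fixed, the Metropolis–Hastings ratio is identically 1, so every proposal is accepted. A sketch of the cancellation:

```latex
% Proposal: resample component i from its full conditional,
% Q(x' \mid x) = P(x'_i \mid x_{-i}), with x'_{-i} = x_{-i}.
a \;=\; \frac{P(x')\,Q(x \mid x')}{P(x)\,Q(x' \mid x)}
  \;=\; \frac{P(x'_i \mid x_{-i})\,P(x_{-i})\;P(x_i \mid x_{-i})}
             {P(x_i \mid x_{-i})\,P(x_{-i})\;P(x'_i \mid x_{-i})}
  \;=\; 1 .
```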

Definition and notations

The symbol U(0;1) is not defined. I guess it is the uniform distribution on the interval [0;1]. Maybe it is obvious for people familiar with the notation of the statistics portal on Wikipedia, but it is not for others.

The symbol is defined by not this one .

Vincent - 12/04/09

Complexity

The Markov chain is started from a random initial value x^0 and the algorithm is run for many iterations until this initial state is "forgotten"

No bounds are known on the number of iterations needed, e.g. the mixing time of the Markov chain. Can we really claim in the introduction that this algorithm allows one to compute something if we don't know how long it has to be iterated? Levochik (talk) 07:43, 26 August 2009 (UTC)[reply]

Huh

This statement doesn't make any sense: "requiring only that a function dominating (being greater than or equal to) the density can be calculated at X". How about the function s(x) = P(x) + 1? That dominates P at x. Surely this isn't what you meant. —Preceding unsigned comment added by 64.134.176.249 (talk) 11:03, 22 December 2009 (UTC)[reply]

It is, but if you can compute P(x)+1 then surely P(x) should pose no challenge. 83.101.44.62 (talk) 18:22, 28 February 2010 (UTC)[reply]
The statement you refer to doesn't seem to actually apply to Metropolis-Hastings (it does to other samplers), and the article doesn't make use of this hypothetical dominating function. Someone anonymously changed 'proportional to' to 'dominating'. I will change it back Rjljr2 (talk) 15:30, 11 April 2010 (UTC)[reply]

Example

First of all, thank you for this article. It's the best I could find on the web about Monte Carlo simulation. But unfortunately, I still don't get it. How can I compute a=a_1a_2 if I don't know P(x)? Perhaps you could additionally include an example, something where one can see the step-by-step instructions applied to a simple case, just like a record of a few steps from a real simulation. —Preceding unsigned comment added by Askamaster (talkcontribs) 10:30, 25 March 2010 (UTC)[reply]

You don't need P(x), you only need a function proportional to P(x), as the normalizations cancel in the ratio. It is quite often the case that you can easily determine a function proportional to P(x), but the integral to normalize it is very difficult to perform. This comes up quite often in Bayesian computations of the posterior distribution. Rjljr2 (talk) 15:49, 11 April 2010 (UTC)[reply]
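The cancellation described in the reply above can be shown directly: the unknown normalising constant Z divides out of the acceptance ratio, so only a function proportional to P(x) is ever evaluated. A small sketch (the function names are illustrative):

```python
import math

def unnormalised_p(x):
    # Proportional to an N(0,1) density; the 1/sqrt(2*pi) normalising
    # factor is deliberately omitted, standing in for an intractable Z.
    return math.exp(-0.5 * x * x)

def acceptance(x, x_new):
    # The unknown constant Z cancels in the ratio:
    # (p(x')/Z) / (p(x)/Z) = p(x') / p(x)
    return min(1.0, unnormalised_p(x_new) / unnormalised_p(x))
```

This is exactly the situation in Bayesian posterior computation: the product of likelihood and prior is easy to evaluate pointwise, while the normalising integral is not.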

Adapting the proposal density after the measurements have started?

Is it allowed to change the proposal density after the measurements have started if, for example, some mechanism in the implementation discovers that the acceptance rate is much too low, say only about 5% or less, and decides that it has to do something about it? —Kri (talk) 00:58, 25 July 2011 (UTC)[reply]
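One caution on the question above: adapting the proposal based on the chain's own history generally breaks the Markov property, so naive adaptation can bias the stationary distribution (adaptive MCMC methods impose extra conditions, such as diminishing adaptation, to remain valid). A common safe practice is to tune only during burn-in and then freeze the proposal. A sketch of such a tuning rule (the function name, target rate, and factors are illustrative assumptions):

```python
def tune_width(width, acc_rate, target=0.3):
    """Crude proposal-width adjustment, intended for burn-in only.

    Freezing the width after burn-in keeps the subsequent chain a
    genuine time-homogeneous Markov chain.
    """
    if acc_rate < target:
        return width * 0.8   # too many rejections: propose smaller moves
    return width * 1.2       # accepting too often: propose bolder moves
```

Samples drawn while the width is still changing would simply be discarded with the rest of the burn-in.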

Proposal density

Clearly the Markov chain doesn't converge for all proposal densities Q. Certainly Q > 0 is sufficient, but I don't believe it is necessary. What condition is required for the proposal density? — Preceding unsigned comment added by WuTheFWasThat (talkcontribs) 19:24, 16 May 2012 (UTC)[reply]
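For what it's worth, the textbook sufficient condition (stronger than necessary, as the question suggests) is that the proposal can reach the whole support of the target, plus aperiodicity; the weaker standard requirement from general state-space Markov chain theory is irreducibility with respect to the target distribution together with aperiodicity:

```latex
% Sufficient (not necessary):
Q(x' \mid x) > 0 \quad \text{for all } x, x' \text{ with } P(x') > 0,
% together with aperiodicity of the resulting chain.
```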

Overview

The section Overview tries to give a formal derivation of the algorithm. The foregoing sections are so well written that they were very easy to understand for me, even though I am not an expert in the field. Bringing the section Overview to the same didactic level would be an asset. The questions I would like to have answered are: How can I see that the equation is still redundant, and what does the term redundant mean in this context? How does one come to the idea of defining A as the specified minimum function?

Some other proposals for improvements: The meaning of the variable seems to be different from the one in the section Intuition. Would it be an option to use another notation? Furthermore, the section Overview repeats the definition of the algorithm given in the section Intuition. However, at the same time it lists the steps of skipping samples to mitigate correlations. I suggest putting the complete list of steps into the previous section. Then a proper name for the section Overview would be, e.g., Formal Derivation.

Sorry, I am new to wikipedia and do not want to just poke your text. Please let me know if you wish me to do some of the mentioned improvements. —Contribo (talk) 20:20, 3 December 2012 (UTC)[reply]

I agree with you, "Overview" is not the proper name; maybe "Formal Derivation", as you suggested, sounds better. Nevertheless, the Intuition section does not present the modern view of the Metropolis–Hastings algorithm, in the sense that it does not use a general proposal (the proposal is always assumed symmetric, which is not the general case). However, I think the intuition part is meant to convey the idea of the method; that is why it skips the fact that the proposal also has to be considered in the acceptance in the general case.
Redundant in the sense that there can be other choices for A which also fulfill the equation you mention (maybe it is under-determined). I don't recall my statistical physics lectures, but I know there is at least one other choice of A which also fulfills it and yet does not use the "min".Jorgecarleitao (talk) 20:53, 3 December 2012 (UTC)[reply]
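One such alternative, if I recall correctly, is Barker's (1965) acceptance rule, which satisfies the same detailed-balance condition without taking a minimum:

```latex
A(x' \mid x) \;=\;
  \frac{P(x')\,Q(x \mid x')}{P(x')\,Q(x \mid x') + P(x)\,Q(x' \mid x)} .
```

Detailed balance holds because the product P(x)\,Q(x' \mid x)\,A(x' \mid x) is symmetric under swapping x and x'; the Metropolis–Hastings "min" choice is preferred in practice because it accepts at least as often.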
Thank you for your explanations, I did not expect to get an answer so quickly. I bring these issues into the discussion not just because I would like to understand the matter but also to motivate you to work towards improving this article together with me. In my role as the learner I offer to ask you questions, and you as the author could modify the article so as to answer these questions. Of course I offer to do modifications as well, as far as I am qualified. Does this sound attractive?
In this sense let me point out some further aspects. You write that the proposal also has to be considered in the acceptance in the general case. I am not sure if I understand you correctly. Isn't this covered in the section Step-by-step Instructions? This brings me to another point: the Step-by-step Instructions should contain the skipping of T states to make the resulting sequence uncorrelated.
I noticed some more points regarding the section Overview: What is the role of ergodicity in this explanation? The text says that it can be shown that ergodicity is a requirement for the process to converge asymptotically. A reference to the proof would be great. How is it ensured that the process is ergodic? Another point: the Markov matrices in the wiki reference have finite dimensions. However, the Markov matrix for the process under discussion is infinite-dimensional; this difference should at least be mentioned. --Contribo (talk) 21:24, 3 December 2012 (UTC)[reply]

The link which is labeled "in practice" and leads to Bayesian statistics should be rephrased so as to not surprise the reader, as per WP:EASTEREGG. There should probably be some brief remark explaining what Bayesian statistics are and how they relate, but I don't know how to write such a remark. — Preceding unsigned comment added by 132.65.153.93 (talk) 18:03, 12 November 2013 (UTC)[reply]