Statistical hypothesis test

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Tedunning (talk | contribs) at 08:01, 20 May 2001. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

back to Statistical Theory -- Applied Statistics


Many researchers wish to test a statistical hypothesis with their data. There are several preparations we make before we observe the data.

  1. The hypothesis must be stated in mathematical/statistical terms that make it possible to calculate the probability of possible samples assuming the hypothesis is correct. For example: the mean response to the treatment being tested is equal to the mean response to the placebo in the control group, and both responses have normal distributions with unknown means and the same known standard deviation.
  2. A test statistic must be chosen that summarizes the information in the sample that is relevant to the hypothesis. In the example given above, it might be the numerical difference between the two sample means, m1 - m2.
  3. The distribution of the test statistic is used to calculate the probability of sets of possible values (usually an interval or union of intervals). In this example, the difference between sample means would have a normal distribution with mean zero and standard deviation equal to the common standard deviation times the factor sqrt(1/n1 + 1/n2), where n1 and n2 are the sample sizes.
  4. Among all the sets of possible values, we must choose one that we think represents the most extreme evidence against the hypothesis. That is called the critical region of the test statistic. The probability of the test statistic falling in the critical region when the hypothesis is correct is called the alpha value (or size) of the test.
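The four steps above can be sketched in code. A minimal illustration for the two-sample example, assuming normal responses with a known common standard deviation; the function name and the symmetric two-sided critical region at alpha = 0.05 are choices made here, not part of the text:

```python
import math

def two_sample_z(sample1, sample2, sigma):
    """Steps 2-4 for the example: the statistic m1 - m2, its null
    distribution, and a symmetric critical region of size 0.05."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    # Under the hypothesis, m1 - m2 is normal with mean zero and
    # standard deviation sigma * sqrt(1/n1 + 1/n2).
    se = sigma * math.sqrt(1 / n1 + 1 / n2)
    z = (m1 - m2) / se
    z_crit = 1.96  # approximate two-sided critical value for alpha = 0.05
    return z, abs(z) > z_crit  # (standardized statistic, in critical region?)
```

The critical value 1.96 is hard-coded to keep the sketch dependency-free; in practice it would come from the normal quantile function for the chosen alpha.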


After the data is available, the test statistic is calculated and we determine whether it is inside the critical region.


If the test statistic is inside the critical region, then our conclusion is either

  1. The hypothesis is incorrect, or
  2. An event of probability less than or equal to alpha has occurred.

The researcher has to choose between these logical alternatives.


If the test statistic is outside the critical region, the only conclusion is that

  • There is not enough evidence to reject the hypothesis.

This is not the same as evidence for the hypothesis; that we cannot obtain. Statistical research progresses by eliminating error, not by finding the truth.


Note: Statistics cannot "find the truth", but it can approximate it. The following argument for the maximum likelihood principle illustrates this -- TedDunning


If p(x) is the distribution of a random variable X that takes on values x from a discrete set and q(x) is some other distribution on the same set, then by the definition of a distribution we know that

q(x) ≥ 0

and

Σ_x q(x) = 1

and likewise for p.


If we take N repeated independent samples xi of X, then the expected value of the mean of log q(xi) is given by


E( Σ_i log q(xi) / N ) = Σ_x p(x) log q(x)


But since log y ≤ y - 1 for positive y, we have


Σ_x p(x) log q(x) - Σ_x p(x) log p(x) = Σ_x p(x) log( q(x)/p(x) )
... ≤ Σ_x p(x) [ q(x)/p(x) - 1 ]
... = Σ_x q(x) - Σ_x p(x)
... = 1 - 1 = 0


Thus,


Σ_x p(x) log q(x) ≤ Σ_x p(x) log p(x)


and equality can only be achieved when q(x) = p(x) for all x.
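The inequality above (Gibbs' inequality) can be checked numerically. A small sketch; the two example distributions are arbitrary choices:

```python
import math

def sum_p_log_q(p, q):
    """Sum over x of p(x) * log q(x); p and q are lists of probabilities."""
    return sum(pi * math.log(qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]   # the "true" distribution (arbitrary example)
q = [0.2, 0.5, 0.3]   # some other distribution

# The cross term never exceeds the self term; the gap is zero only
# when q equals p.
gap = sum_p_log_q(p, p) - sum_p_log_q(p, q)
```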


This means that the q maximizing the expected value of the mean of log q is p itself. To the extent that the law of large numbers lets us approximate this expected value by the observed mean, maximizing the observed mean lets us approximate p.
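For a discrete distribution, maximizing the observed mean of log q over all q yields exactly the empirical frequencies, i.e. the maximum likelihood estimate. A minimal sketch; the sample and the alternative distribution are invented for illustration:

```python
import math
from collections import Counter

def mean_log_q(samples, q):
    """Observed mean of log q(x) over the samples."""
    return sum(math.log(q[x]) for x in samples) / len(samples)

# Pretend these were drawn from some unknown distribution p.
samples = ['a'] * 6 + ['b'] * 3 + ['c'] * 1

# The maximizer of the observed mean of log q is the empirical
# frequency distribution -- the maximum likelihood estimate.
n = len(samples)
mle = {x: c / n for x, c in Counter(samples).items()}
```

Any other distribution, such as a uniform one over the same symbols, scores a strictly lower observed mean of log q on this sample.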


Thus it can be said that statistical inference can let us approximate the truth.


Interestingly, the negative of this mean value of log q is the expected length (in bits, if logarithms are taken base 2) of a compressed representation of the xi, where q is the model used to do the compression. Thus we can also claim that ultimate compression = truth. This leads us off to Occam's Razor.
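The compression reading can be made concrete: an ideal code built for a model q assigns each symbol x a length of -log2 q(x) bits, so the expected length under the true distribution p is -Σ_x p(x) log2 q(x), which is minimized when q = p. A small sketch with invented example distributions:

```python
import math

def expected_code_length(p, q):
    """Expected bits per symbol when symbols drawn from p are coded
    with an ideal code built for the model q (length -log2 q(x))."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

p = [0.5, 0.25, 0.25]     # true distribution (arbitrary example)
q = [1 / 3, 1 / 3, 1 / 3]  # a mismatched model

# Coding with the true distribution is never worse than coding with
# any other model; the minimum is the entropy of p.
```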