Multinomial test

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Ted7815 (talk | contribs) at 08:21, 25 April 2008.

In statistics, the multinomial test is the likelihood-ratio test of the null hypothesis that the parameters of a multinomial distribution equal specified values. It is used for categorical data; see Read and Cressie[1].

We begin with a sample of $N$ items, each of which has been observed to fall into one of $k$ categories. We can define $x_i$ as the observed number of items in category $i$ (for $i = 1, \dots, k$). Hence $\sum_{i=1}^{k} x_i = N$.

Next, we define a vector of parameters $\pi = (\pi_1, \pi_2, \dots, \pi_k)$, where $\sum_{i=1}^{k} \pi_i = 1$. These are the parameter values under the null hypothesis.

The exact probability of the observed configuration under the null hypothesis is given by

$$\Pr_0 = N! \prod_{i=1}^{k} \frac{\pi_i^{x_i}}{x_i!}.$$

Under the alternative hypothesis, each value $\pi_i$ is replaced by its maximum likelihood estimate $p_i = x_i / N$, and the exact probability of the observed configuration under the alternative hypothesis is given by

$$\Pr_1 = N! \prod_{i=1}^{k} \frac{p_i^{x_i}}{x_i!}.$$

The natural logarithm of the ratio between these two probabilities, multiplied by $-2$, is then the likelihood ratio test statistic

$$-2\ln LR = -2\sum_{i=1}^{k} x_i \ln\left(\frac{\pi_i}{p_i}\right).$$
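
The two exact probabilities and the resulting statistic can be sketched in Python; the observed counts and null probabilities below are made-up illustrative values, not from any real data set:

```python
import math

# Hypothetical observed counts in k = 3 categories, with a uniform null hypothesis
x = [10, 12, 8]
pi = [1 / 3, 1 / 3, 1 / 3]
N = sum(x)

# Exact probability of the observed configuration under the null: N! * prod(pi_i^x_i / x_i!)
p_null = math.factorial(N) * math.prod(p**n / math.factorial(n) for p, n in zip(pi, x))

# Maximum likelihood estimates p_i = x_i / N give the probability under the alternative
p_hat = [n / N for n in x]
p_alt = math.factorial(N) * math.prod(p**n / math.factorial(n) for p, n in zip(p_hat, x))

# Likelihood ratio statistic: -2 ln(p_null / p_alt), which reduces to
# -2 * sum x_i * ln(pi_i / p_i) because the factorial terms cancel
lr = -2 * sum(n * math.log(p0 / p1) for n, p0, p1 in zip(x, pi, p_hat))
print(lr)
```

Note that the factorials cancel in the ratio, so the statistic never requires computing the (potentially enormous) exact probabilities themselves.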

If the null hypothesis is true, then as $N$ increases, the distribution of $-2\ln LR$ converges to that of chi-square with $k-1$ degrees of freedom. However, it has long been known (e.g. Lawley 1956) that for finite sample sizes, the moments of $-2\ln LR$ are greater than those of chi-square, thus inflating the probability of type I errors (false positives). The difference between the moments of chi-square and those of the test statistic is a function of $N^{-1}$. Williams (1976) showed that the first moment can be matched as far as $N^{-1}$ if the test statistic is divided by a factor given by

$$q_1 = 1 + \frac{\sum_{i=1}^{k} \pi_i^{-1} - 1}{6N(k-1)}.$$

In the special case where the null hypothesis is that all $k$ values $\pi_i$ are equal to $1/k$ (i.e. it stipulates a uniform distribution), this simplifies to

$$q_1 = 1 + \frac{k+1}{6N}.$$
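
As a small sketch (with illustrative values of $N$ and $k$), the Williams correction factor can be computed directly from the null probabilities:

```python
def williams_q1(pi, N):
    """Williams (1976) factor: q1 = 1 + (sum(1/pi_i) - 1) / (6*N*(k-1))."""
    k = len(pi)
    return 1 + (sum(1 / p for p in pi) - 1) / (6 * N * (k - 1))

# Under a uniform null (all pi_i = 1/k) this reduces to 1 + (k+1)/(6N)
N, k = 30, 3
q1_uniform = williams_q1([1 / k] * k, N)
print(q1_uniform)  # same value as 1 + (k + 1) / (6 * N)

# The corrected statistic is the likelihood ratio statistic divided by q1
```

Since $q_1 > 1$, dividing by it shrinks the statistic slightly, offsetting the inflated type I error rate at small $N$.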

Subsequently, Smith et al. (1981) derived a dividing factor which matches the first moment as far as $N^{-2}$. For the case of equal values of $\pi_i$, this factor is

$$q_2 = 1 + \frac{k+1}{6N} + \frac{k^2}{6N^2}.$$
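
A corresponding sketch for the equal-probability Smith et al. factor, again with illustrative values of $N$ and $k$:

```python
def smith_q2(k, N):
    """Smith et al. (1981) factor for equal pi_i: q2 = 1 + (k+1)/(6N) + k^2/(6N^2)."""
    return 1 + (k + 1) / (6 * N) + k**2 / (6 * N**2)

N, k = 30, 3
q2 = smith_q2(k, N)
print(q2)  # slightly larger than the first-order factor 1 + (k + 1) / (6 * N)
```

The extra $k^2/(6N^2)$ term vanishes as $N$ grows, so both factors approach 1 for large samples.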


The null hypothesis can also be tested by using the chi-square approximation statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - E_i)^2}{E_i},$$

where $E_i = N\pi_i$ is the expected number of cases in category $i$ under the null hypothesis. This statistic also converges to a chi-square distribution with $k-1$ degrees of freedom when the null hypothesis is true, but does so from below, as it were, rather than from above as $-2\ln LR$ does, and so may be preferable to the uncorrected version of $-2\ln LR$ for small samples.
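
The chi-square approximation statistic is straightforward to compute; the counts and null probabilities below are the same kind of made-up illustrative values used above:

```python
# Hypothetical observed counts in k = 3 categories, with a uniform null hypothesis
x = [10, 12, 8]
pi = [1 / 3, 1 / 3, 1 / 3]
N = sum(x)

# Expected counts under the null: E_i = N * pi_i
expected = [N * p for p in pi]

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(x, expected))
print(chi2)
```

The value would then be compared against the chi-square distribution with $k-1 = 2$ degrees of freedom.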


References

  1. ^ Read, T. R. C. and Cressie, N. A. C. (1988). Goodness-of-fit statistics for discrete multivariate data. New York: Springer-Verlag. ISBN 0-387-96682-X.
  • Lawley, D. N. (1956). "A General Method of Approximating to the Distribution of Likelihood Ratio Criteria". Biometrika. 43: 295–303.
  • Smith, P. J., Rae, D. S., Manderscheid, R. W. and Silbergeld, S. (1981). "Approximating the Moments and Distribution of the Likelihood Ratio Statistic for Multinomial Goodness of Fit". Journal of the American Statistical Association. 76: 737–740.
  • Williams, D. A. (1976). "Improved Likelihood Ratio Tests for Complete Contingency Tables". Biometrika. 63: 33–37.