Studentized range distribution

(Figures: probability density function; cumulative distribution function)
Parameters	k > 1, the number of groups; ν > 0, the degrees of freedom
Support	q ∈ (0, +∞)

In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population.

Suppose that we take a sample of size n from each of k populations with the same normal distribution N(μ, σ²). Let ${\bar {y}}_{\min}$ be the smallest of these sample means and ${\bar {y}}_{\max}$ the largest, and let S² be the pooled sample variance from these samples, which has ν = k(n − 1) degrees of freedom. Then the following random variable has a studentized range distribution:

q={\frac {{\overline {y}}_{\max }-{\overline {y}}_{\min }}{\left(S/{\sqrt {n\,}}\right)}}
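The definition can be checked by simulation. The Python sketch below is an illustration under stated assumptions, not a reference implementation: it draws k groups of n values from N(0, 1) (q is unchanged by shifting or rescaling all observations, so μ = 0 and σ = 1 are assumed without loss of generality), pools the within-group variance over ν = k(n − 1) degrees of freedom, and returns q. The function name `simulate_q` and the seed are hypothetical.

```python
import math
import random


def simulate_q(k, n, rng):
    """Draw one studentized range value from k groups of n N(0, 1) observations.

    mu = 0 and sigma = 1 are assumed without loss of generality: q is unchanged
    by shifting or rescaling all observations.
    """
    groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    means = [sum(g) / n for g in groups]
    # Pooled variance: within-group sum of squares over nu = k(n - 1) degrees of freedom.
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    s = math.sqrt(ss_within / (k * (n - 1)))
    # Range of the group means in units of the estimated standard error S / sqrt(n).
    return (max(means) - min(means)) / (s / math.sqrt(n))


rng = random.Random(12345)  # fixed seed, chosen arbitrarily
samples = [simulate_q(k=3, n=5, rng=rng) for _ in range(2000)]
print(f"mean of simulated q (k=3, n=5): {sum(samples) / len(samples):.3f}")
```

For k = 2 the statistic reduces to √2·|t|, where t follows Student's t distribution with ν = 2(n − 1) degrees of freedom, which gives a convenient sanity check on simulated values.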

Definition

Probability density function

Differentiating the cumulative distribution function with respect to q gives the probability density function.

f_{\text{R}}(q;k,\nu )={\frac {{\sqrt {2\pi \,}}\,k\,(k-1)\,\nu ^{\nu /2}}{\Gamma (\nu /2)\,2^{\left(\nu /2-1\right)}}}\int _{0}^{\infty }x^{\nu }\,\varphi ({\sqrt {\nu \,}}\,x)\,\left[\int _{-\infty }^{\infty }\varphi (u+q\,x)\,\varphi (u)\,\left[\Phi (u+q\,x)-\Phi (u)\right]^{k-2}\,\mathrm {d} u\right]\,\mathrm {d} x
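The double integral has no elementary closed form in general, but it can be evaluated numerically. The following is a minimal sketch, assuming ν ≥ 1 and using nested composite Simpson quadrature (a generic technique chosen for self-containment, not the algorithm of the cited references). The helper names, the truncation of u to [−8, 8], and the integration window for x around 1 are illustrative assumptions. The factor √(2π)·φ(√ν x) is written directly as e^(−νx²/2), and the normalising constant is kept in log space to avoid overflow for large ν.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def studentized_range_pdf(q, k, nu, m_x=400, m_u=200):
    """Approximate f_R(q; k, nu) by nested Simpson quadrature (assumes nu >= 1)."""
    # log of k (k - 1) nu^(nu/2) / (Gamma(nu/2) 2^(nu/2 - 1)); logs avoid overflow for large nu
    log_c = (math.log(k * (k - 1)) + 0.5 * nu * math.log(nu)
             - math.lgamma(0.5 * nu) - (0.5 * nu - 1.0) * math.log(2.0))

    def inner(x):
        # Integral over u, truncated to [-8, 8] where phi(u) is non-negligible.
        return simpson(lambda u: phi(u + q * x) * phi(u)
                       * (Phi(u + q * x) - Phi(u)) ** (k - 2), -8.0, 8.0, m_u)

    def outer(x):
        if x <= 0.0:
            return 0.0  # x^nu vanishes at x = 0 for nu >= 1
        # sqrt(2 pi) * phi(sqrt(nu) x) is written directly as exp(-nu x^2 / 2).
        return math.exp(log_c + nu * math.log(x) - 0.5 * nu * x * x) * inner(x)

    # The weight x^nu exp(-nu x^2 / 2) is concentrated near x = 1 (assumed window).
    lo = max(0.0, 1.0 - 10.0 / math.sqrt(nu))
    hi = 1.0 + 10.0 / math.sqrt(nu)
    return simpson(outer, lo, hi, m_x)


print(studentized_range_pdf(math.sqrt(2.0), 2, 1))
```

For k = 2 and ν = 1, q/√2 is the absolute value of a Student t variable with one degree of freedom, so f_R(q; 2, 1) = 2√2 / (π(2 + q²)) gives an exact check. The quadrature sizes `m_x` and `m_u` trade speed for accuracy.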

Cumulative distribution function

The cumulative distribution function is given by ^[1]

F_{\text{R}}(q;k,\nu )={\frac {k\,\nu ^{\nu /2}}{\,\Gamma (\nu /2)\,2^{(\nu /2-1)}\,}}\int _{0}^{\infty }x^{\nu -1}e^{-\left(\nu \,x^{2}/2\right)}\left[\int _{-\infty }^{\infty }\varphi (u)\left[\Phi (u+q\,x)-\Phi (u)\right]^{k-1}\,\mathrm {d} u\right]\,\mathrm {d} x
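The cdf can be approximated the same way. This sketch again assumes ν ≥ 1, nested Simpson quadrature, and the same illustrative truncation choices; it is not the AS 190 algorithm of the cited reference. For k = 2 the cdf reduces to P(|T| ≤ q/√2) with T a Student t variable on ν degrees of freedom, which supplies closed-form checks such as (2/π)·arctan(q/√2) for ν = 1.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def studentized_range_cdf(q, k, nu, m_x=400, m_u=200):
    """Approximate F_R(q; k, nu) by nested Simpson quadrature (assumes nu >= 1)."""
    # log of k nu^(nu/2) / (Gamma(nu/2) 2^(nu/2 - 1)), kept in logs to avoid overflow
    log_c = (math.log(k) + 0.5 * nu * math.log(nu)
             - math.lgamma(0.5 * nu) - (0.5 * nu - 1.0) * math.log(2.0))

    def inner(x):
        # Integral over u, truncated to [-8, 8] where phi(u) is non-negligible.
        return simpson(lambda u: phi(u) * (Phi(u + q * x) - Phi(u)) ** (k - 1),
                       -8.0, 8.0, m_u)

    def outer(x):
        if x <= 0.0:
            return 0.0  # the bracket [Phi(u) - Phi(u)]^(k-1) is zero at x = 0
        return math.exp(log_c + (nu - 1.0) * math.log(x) - 0.5 * nu * x * x) * inner(x)

    # The weight x^(nu-1) exp(-nu x^2 / 2) is concentrated near x = 1 (assumed window).
    lo = max(0.0, 1.0 - 10.0 / math.sqrt(nu))
    hi = 1.0 + 10.0 / math.sqrt(nu)
    return simpson(outer, lo, hi, m_x)


# For k = 2 and nu = 1, F_R(q) = (2 / pi) * arctan(q / sqrt(2)).
print(studentized_range_cdf(math.sqrt(2.0), 2, 1))
```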

Note that the pdf formula in the prior section uses the identity

e^{-\left(\nu \,x^{2}/2\right)}={\sqrt {2\pi \,}}\,\varphi ({\sqrt {\nu \,}}\,x)

to replace the exponential term in front of the first square bracket.
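Substituting z = √ν x into the standard normal density φ(z) makes the identity explicit; it is also why the pdf carries a factor √(2π) that the cdf does not:

```latex
\varphi(z)=\frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}
\quad\Longrightarrow\quad
\sqrt{2\pi}\,\varphi\left(\sqrt{\nu}\,x\right)=e^{-\left(\sqrt{\nu}\,x\right)^{2}/2}=e^{-\nu\,x^{2}/2}
```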

Special cases

As the number of degrees of freedom ν approaches infinity, the sample standard deviation converges to the population value, so the standard normal distribution can be used in the general equation above. In that limit, if k is 2 or 3,^[2] the studentized range probability density function can be evaluated directly, where $\varphi (z)$ and $\Phi (z)$ are the standard normal probability density and cumulative distribution functions.

f_{\text{R}}(q;k=2)={\sqrt {2\,}}\,\varphi \left(\,q/{\sqrt {2\,}}\right)

f_{\text{R}}(q;k=3)=6{\sqrt {2\,}}\,\varphi \left(\,q/{\sqrt {2\,}}\right)\left[\Phi \left(q/{\sqrt {6\,}}\right)-{\tfrac {1}{2}}\right]
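The two closed forms can be compared against direct numerical integration of the general range density with f_X = φ, that is, k(k − 1)∫φ(u + q)φ(u)[Φ(u + q) − Φ(u)]^(k−2) du, which is the ν → ∞ case. A sketch, assuming Simpson's rule on a window of ±10 around the integrand's centre u = −q/2; the helper names are illustrative.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def pdf_k2(q):
    """Closed-form large-nu studentized range pdf for k = 2."""
    return math.sqrt(2.0) * phi(q / math.sqrt(2.0))


def pdf_k3(q):
    """Closed-form large-nu studentized range pdf for k = 3."""
    return 6.0 * math.sqrt(2.0) * phi(q / math.sqrt(2.0)) * (Phi(q / math.sqrt(6.0)) - 0.5)


def pdf_numeric(q, k, m=400):
    """k(k-1) * integral of phi(u+q) phi(u) [Phi(u+q) - Phi(u)]^(k-2) du.

    The integrand is centred at u = -q/2, so a window of +/-10 around it is used.
    """
    a, b = -0.5 * q - 10.0, -0.5 * q + 10.0
    return k * (k - 1) * simpson(
        lambda u: phi(u + q) * phi(u) * (Phi(u + q) - Phi(u)) ** (k - 2), a, b, m)


for q in (0.5, 1.5, 3.0):
    print(q, pdf_k2(q), pdf_numeric(q, 2), pdf_k3(q), pdf_numeric(q, 3))
```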

As ν approaches infinity, the studentized range cumulative distribution function can be calculated for any k using the standard normal distribution:

F_{\text{R}}(q;k)=k\,\int _{-\infty }^{\infty }\varphi (z)\,\left[\Phi (z+q)-\Phi (z)\right]^{k-1}\,\mathrm {d} z
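In this limit the cdf is a single integral, so plain quadrature is enough. A sketch, assuming Simpson's rule on [−10, 10]; the function name is hypothetical. For k = 2 the integral reduces to 2Φ(q/√2) − 1, which serves as a check, and for k = 3 it gives about 0.95 at q ≈ 3.314, the commonly tabulated upper 5% point for ν = ∞.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def cdf_large_nu(q, k, m=400):
    """F_R(q; k) as nu -> infinity: k * integral of phi(z) [Phi(z+q) - Phi(z)]^(k-1) dz.

    The infinite range is truncated to [-10, 10], where phi is negligible outside.
    """
    return k * simpson(lambda z: phi(z) * (Phi(z + q) - Phi(z)) ** (k - 1), -10.0, 10.0, m)


print(cdf_large_nu(2.772, 2), cdf_large_nu(3.314, 3))
```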

Uses

Critical values of the studentized range distribution are used in Tukey's range test.
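As an illustration of how such a critical value is used, the sketch below flags pairs of group means whose studentized difference |ȳ_i − ȳ_j| / √(MSE/n) exceeds q_crit, matching the definition of q above with S² = MSE. It assumes equal group sizes n; the function name `tukey_pairs`, the data, and the value 3.5 are hypothetical placeholders, not taken from any table. In practice q_crit would be the upper quantile of the studentized range for (k, ν), read from a table or found by inverting the cdf.

```python
import itertools
import math


def tukey_pairs(means, mse, n, q_crit):
    """Return (i, j, q) for each pair of groups whose studentized difference exceeds q_crit.

    Assumes equal group sizes n and a pooled error mean square mse; q_crit must
    be the studentized range critical value for k = len(means) and the error df.
    """
    se = math.sqrt(mse / n)  # estimated standard error of a group mean, S / sqrt(n)
    flagged = []
    for i, j in itertools.combinations(range(len(means)), 2):
        q = abs(means[i] - means[j]) / se
        if q > q_crit:
            flagged.append((i, j, q))
    return flagged


# Hypothetical example: three group means, pooled MSE = 4.0, n = 5 per group,
# and a placeholder critical value of 3.5.
for i, j, q in tukey_pairs([10.0, 12.0, 15.0], mse=4.0, n=5, q_crit=3.5):
    print(f"groups {i} and {j} differ: q = {q:.3f}")
```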

Derivation of the studentized range distribution function

The studentized range distribution function arises from re-scaling the sample range R by the sample standard deviation S, since tables of the studentized range are customarily given in units of standard deviations, with the variable q = R⁄S. The derivation begins with a perfectly general form of the range distribution function, which does not depend on the form of the distribution of the sample data. However, in order to obtain the distribution in terms of q, one must introduce S, assume normality, and then integrate over S in order to remove it.

General form

For any probability density function f_X, the range probability density f_R is:^[2]

f_{\text{R}}(r;k)=k\,(k-1)\int _{-\infty }^{\infty }f_{X}\left(t+{\tfrac {1}{2}}r\right)f_{X}\left(t-{\tfrac {1}{2}}r\right)\left(\int _{t-{\tfrac {1}{2}}r}^{t+{\tfrac {1}{2}}r}f_{X}(x)\,\mathrm {d} x\right)^{k-2}\,\mathrm {d} t

This expression adds up the probabilities that, among k draws from the distribution, two of them differ by r and the remaining k − 2 draws all fall between these two extreme values; the factor k(k − 1) counts the ordered choices of which draws are the minimum and the maximum. Substituting $u=t-{\tfrac {1}{2}}r$, the low end of the range, and writing F_X for the cumulative distribution function of f_X, the equation simplifies to:

f_{\text{R}}(r;k)=k\,(k-1)\int _{-\infty }^{\infty }f_{X}(u+r)\,f_{X}(u)\,\left[\,F_{X}(u+r)-F_{X}(u)\,\right]^{k-2}\,\mathrm {d} u
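Because this form holds for any density, it can be checked on a distribution whose range density is known exactly. The sketch below assumes an Exponential(1) parent purely as a test case: evaluating the integral by hand gives f_R(r; k) = (k − 1)e^(−r)(1 − e^(−r))^(k−2). The function `range_pdf`, the truncation of u to [0, 40], and the Simpson step count are illustrative assumptions.

```python
import math


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def range_pdf(r, k, f, F, lo, hi, m=2000):
    """k(k-1) * integral over [lo, hi] of f(u+r) f(u) [F(u+r) - F(u)]^(k-2) du.

    [lo, hi] must cover the support of f (truncated if the support is unbounded).
    """
    return k * (k - 1) * simpson(
        lambda u: f(u + r) * f(u) * (F(u + r) - F(u)) ** (k - 2), lo, hi, m)


# Assumed test distribution: Exponential(1), whose range density is known in closed form.
def exp_pdf(x):
    return math.exp(-x) if x >= 0.0 else 0.0


def exp_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


k, r = 3, 1.0
approx = range_pdf(r, k, exp_pdf, exp_cdf, 0.0, 40.0)
exact = (k - 1) * math.exp(-r) * (1.0 - math.exp(-r)) ** (k - 2)
print(approx, exact)
```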

Note that differentiating the following integral with respect to r (under the integral sign) gives

{\begin{aligned}&{\frac {\partial }{\partial r}}\left[k\,\int _{-\infty }^{\infty }f_{X}(u)\,\left[\,F_{X}(u+r)-F_{X}(u)\,\right]^{k-1}\,\mathrm {d} u\right]\\[5pt]={}&k(k-1)\int _{-\infty }^{\infty }f_{X}(u+r)\,f_{X}(u)\,\left[\,F_{X}(u+r)-F_{X}(u)\,\right]^{k-2}\,\mathrm {d} u\end{aligned}}

which is the integral above. Since the expression being differentiated is also zero at r = 0, as the cdf of a range must be, this confirms that

F_{\text{R}}(r)=k\int _{-\infty }^{\infty }f_{X}(u)\left[\,F_{X}(u+r)-F_{X}(u)\,\right]^{k-1}\,\mathrm {d} u

because for any continuous cdf

{\frac {\partial F_{\text{R}}(r)}{\partial r}}=f_{\text{R}}(r)
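The same relationship can be verified numerically. This sketch reuses an Exponential(1) parent as an assumed test case, for which the formula above gives F_R(r) = (1 − e^(−r))^(k−1); a central difference of the computed cdf should then reproduce the range density (k − 1)e^(−r)(1 − e^(−r))^(k−2), and the cdf is exactly zero at r = 0. The names `range_cdf`, `exp_pdf`, `exp_cdf` and the step h are hypothetical.

```python
import math


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def range_cdf(r, k, f, F, lo, hi, m=2000):
    """k * integral over [lo, hi] of f(u) [F(u+r) - F(u)]^(k-1) du."""
    return k * simpson(lambda u: f(u) * (F(u + r) - F(u)) ** (k - 1), lo, hi, m)


# Assumed test distribution: Exponential(1).
def exp_pdf(x):
    return math.exp(-x) if x >= 0.0 else 0.0


def exp_cdf(x):
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0


k, r, h = 3, 1.0, 1e-4
cdf_value = range_cdf(r, k, exp_pdf, exp_cdf, 0.0, 40.0)
# Central difference of the cdf should reproduce the range density at r.
slope = (range_cdf(r + h, k, exp_pdf, exp_cdf, 0.0, 40.0)
         - range_cdf(r - h, k, exp_pdf, exp_cdf, 0.0, 40.0)) / (2.0 * h)
print(cdf_value, slope)
```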

Special form for normal data

The range distribution is most often used for confidence intervals around sample averages, which are asymptotically normally distributed by the central limit theorem.

In order to create the studentized range distribution for normal data, we first switch from f and F to φ and Φ, the standard normal density and cumulative distribution function, and change the variable r to s q, where s is a scale factor and q is the range measured in units of s. Since dr = s dq, the density in q acquires a leading factor s:

f_{\text{R}}(q;k)=sk(k-1)\int _{-\infty }^{\infty }\varphi (u+sq)\varphi (u)\,\left[\,\Phi (u+sq)-\Phi (u)\right]^{k-2}\,\mathrm {d} u

Choose the scale-setting factor s to be the sample standard deviation S, so that q becomes the width of the range measured in standard deviations. For normal data with unit variance, √ν S follows a chi distribution^[a] with ν degrees of freedom, so the probability density f_S of S is given by:

f_{S}(s;\nu )\,\mathrm {d} s={\begin{cases}{\dfrac {\nu ^{\nu /2}s^{\nu -1}e^{-\nu s^{2}/2}\,}{2^{\left(\nu /2-1\right)}\Gamma (\nu /2)}}\,\mathrm {d} s&{\text{for }}\,0<s<\infty ,\\[4pt]0&{\text{otherwise}}.\end{cases}}
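A quick numerical check of this density: it should integrate to 1, and since νS² follows a χ² distribution with mean ν, the second moment E[S²] should also equal 1. The sketch assumes ν ≥ 1, Simpson's rule, and an illustrative upper cut-off of 1 + 12/√ν; the function name `f_S` mirrors the notation above but is otherwise hypothetical.

```python
import math


def simpson(g, a, b, m):
    """Composite Simpson's rule for g on [a, b] with an even number m of subintervals."""
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


def f_S(s, nu):
    """Density of the sample standard deviation S for unit-variance normal data (nu >= 1 assumed)."""
    log_c = 0.5 * nu * math.log(nu) - (0.5 * nu - 1.0) * math.log(2.0) - math.lgamma(0.5 * nu)
    if s <= 0.0:
        # At s = 0 the density is finite and nonzero only when nu == 1 (half-normal case).
        return math.exp(log_c) if (s == 0.0 and nu == 1) else 0.0
    return math.exp(log_c + (nu - 1.0) * math.log(s) - 0.5 * nu * s * s)


for nu in (1, 4, 30):
    hi = 1.0 + 12.0 / math.sqrt(nu)  # assumed cut-off; the density is negligible beyond it
    total = simpson(lambda s: f_S(s, nu), 0.0, hi, 2000)
    second_moment = simpson(lambda s: s * s * f_S(s, nu), 0.0, hi, 2000)
    print(nu, total, second_moment)
```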

Multiplying the densities f_R and f_S and integrating over s to remove the dependence on the sample standard deviation gives the studentized range probability density function for normal data:

f_{\text{R}}(q;k,\nu )={\frac {\nu ^{\nu /2}k(k-1)}{2^{\left(\nu /2-1\right)}\Gamma (\nu /2)}}\int _{0}^{\infty }s^{\nu }e^{-\nu s^{2}/2}\int _{-\infty }^{\infty }\varphi (u+sq)\,\varphi (u)\,\left[\,\Phi (u+sq)-\Phi (u)\right]^{k-2}\,\mathrm {d} u\,\mathrm {d} s

where

q is the width of the data range measured in standard deviations,

ν is the number of degrees of freedom for determining the sample standard deviation,^[b] and

k is the number of data points in the range.

The form of the pdf given in the Definition section, where the integration variable s is written as x, follows from using the identity

e^{-\nu \,s^{2}/2}={\sqrt {2\pi \,}}\,\varphi ({\sqrt {\nu \,}}\,s)

to replace the exponential expression in the outer integral.

Notes

^ Note well the absence of "squared": The text refers to the χ distribution, not the χ² distribution.
^ Generally, ν = n − 1.

References

^ Lund, R.E.; Lund, J.R. (1983). "Algorithm AS 190: Probabilities and upper quantiles for the studentized range". Journal of the Royal Statistical Society. 32 (2): 204–210. JSTOR 2347300.
^ McKay, A.T. (1933). "A note on the distribution of range in samples of n". Biometrika. 25 (3): 415–420. doi:10.2307/2332292. JSTOR 2332292.

Pearson, E.S.; Hartley, H.O. (1942). "The probability integral of the range in samples of N observations from a normal population". Biometrika. 32 (3): 301–310. doi:10.1093/biomet/32.3-4.309. JSTOR 2332134.

Hartley, H.O. (1942). "The range in random samples". Biometrika. 32 (3): 334–348. doi:10.2307/2332137. JSTOR 2332137.

Dunlap, W.P.; Powell, R.S.; Konnerth, T.K. (1977). "A FORTRAN IV function for calculating probabilities associated with the studentized range statistic". Behavior Research Methods & Instrumentation. 9 (4): 373–375. doi:10.3758/BF03202264.
