User:Talgalili/sandbox/Duncan's new multiple range test
In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic $q_r$ to compare sets of means.
David B. Duncan developed this test as a modification of the Student–Newman–Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) errors, at the expense of a greater risk of making false positive (Type I) errors. Duncan's test is commonly used in agronomy and other agricultural research.
The result of the test is a set of subsets of means, where in each subset means have been found not to be significantly different from one another.
Definition
Assumptions:
1. A sample of observed means $m_1, m_2, \ldots, m_n$, which have been drawn independently from n normal populations with "true" means $\mu_1, \mu_2, \ldots, \mu_n$, respectively.
2. A common standard error $\sigma_m$. This standard error is unknown, but there is available the usual estimate $s_m$, which is independent of the observed means and is based on a number of degrees of freedom, denoted by $n_2$. (More precisely, $s_m$ has the property that $n_2 s_m^2 / \sigma_m^2$ is distributed as $\chi^2$ with $n_2$ degrees of freedom, independently of the sample means.)
The exact definition of the test is: The difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an $\alpha_p$-level range test, where $\alpha_p = 1 - \gamma_p$, $\gamma_p = (1-\alpha)^{p-1}$, and $p$ is the number of means in the subset concerned.
The Procedure
The procedure consists of a series of pairwise comparisons between means. Each comparison is performed at a significance level $\alpha_p$, defined by the number of means separating the two means compared ($\alpha_p$ for $p-2$ separating means). The tests are performed sequentially, where the result of a test determines which test is performed next.
The tests are performed in the following order: the largest minus the smallest, the largest minus the second smallest, up to the largest minus the second largest; then the second largest minus the smallest, the second largest minus the second smallest, and so on, finishing with the second smallest minus the smallest.
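The ordering described above can be sketched in a few lines of Python. This is only an illustration: the function name `comparison_order` and the convention that rank 1 is the smallest mean and rank n the largest are assumptions made for the example.

```python
def comparison_order(n):
    """Yield (upper_rank, lower_rank) pairs in the order the tests are run.

    Ranks run from 1 (smallest mean) to n (largest mean); this convention
    is an assumption of the sketch.
    """
    # Outer loop: the upper mean, from the largest down to the second smallest.
    for upper in range(n, 1, -1):
        # Inner loop: the lower mean, from the smallest up to just below `upper`.
        for lower in range(1, upper):
            yield (upper, lower)


pairs = list(comparison_order(5))
print(pairs[0])    # largest minus smallest
print(pairs[-1])   # second smallest minus smallest
```

For five means the sketch yields ten pairs, starting with (5, 1) and ending with (2, 1).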
With only one exception, given below, each difference is significant if it exceeds the corresponding shortest significant range; otherwise it is not significant. The shortest significant range is the significant studentized range multiplied by the standard error. It will be designated as $R_p$, where $p$ is the number of means in the subset. The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.
An algorithm for performing the test is as follows:
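1. Rank the sample means, largest to smallest.
2. For each sample mean $m_i$, from largest to smallest, do the following:
  2.1 For each sample mean $m_j$, from the smallest up to the mean ranked just below $m_i$:
    2.1.1 Compare $m_i - m_j$ to the critical value $R_p$ at level $\alpha_p$, where $p$ is the number of means in the subset running from $m_j$ to $m_i$.
    2.1.2 If $m_i - m_j$ does not exceed the critical value, the subset from $m_j$ to $m_i$ is declared not significantly different:
      2.1.2.1 Go to the next iteration of loop 2.
    2.1.3 Otherwise, keep going with loop 2.1.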
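A minimal Python sketch of the algorithm above follows. The function name `duncan_multiple_range` and its interface are hypothetical. The means are the five treatment means from the numeric example later in this article, but the shortest significant ranges in `placeholder_ranges` are illustrative placeholders chosen for the sketch, not values taken from this article.

```python
def duncan_multiple_range(means, shortest_ranges):
    """Sequential range tests on ranked means (illustrative sketch).

    means: dict mapping a label to its sample mean.
    shortest_ranges: dict mapping subset size p to the shortest
        significant range for p means (supplied by the caller).
    Returns (significant_pairs, homogeneous_subsets).
    """
    ranked = sorted(means.items(), key=lambda item: item[1])  # ascending
    significant = []
    spans = []  # (low_index, high_index) of subsets with non-significant range
    for i in range(len(ranked) - 1, 0, -1):      # upper mean, largest first
        for j in range(i):                        # lower mean, smallest first
            # Exception rule: both means already lie inside a subset whose
            # range was not significant, so nothing further can be declared.
            if any(lo <= j and i <= hi for lo, hi in spans):
                break
            p = i - j + 1                         # number of means spanned
            diff = ranked[i][1] - ranked[j][1]
            if diff > shortest_ranges[p]:
                significant.append((ranked[i][0], ranked[j][0]))
            else:
                spans.append((j, i))
                break                             # next upper mean
    subsets = [[ranked[k][0] for k in range(lo, hi + 1)] for lo, hi in spans]
    return significant, subsets


example_means = {"T1": 9.8, "T5": 10.8, "T2": 15.4, "T3": 17.8, "T4": 21.6}
# Placeholder critical ranges (assumed for illustration only).
placeholder_ranges = {2: 3.75, 3: 3.94, 4: 4.04, 5: 4.13}
pairs, groups = duncan_multiple_range(example_means, placeholder_ranges)
print(groups)
```

With these placeholder ranges the sketch reports (T2, T3) and (T1, T5) as the only subsets that are not significantly different. Other critical ranges may of course give other groupings.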
Critical Values
Duncan's multiple range test makes use of the studentized range distribution in order to determine critical values for comparisons between means. Note that different comparisons between means may differ in their significance levels, since the significance level depends on the size of the subset of means in question.
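Let us denote $Q_{(p,\nu,\gamma_p)}$ as the $\gamma_p$ quantile of the studentized range distribution, with $p$ observations and $\nu$ degrees of freedom for the second sample (see studentized range for more information), where $\gamma_p = (1-\alpha)^{p-1}$ and $\nu$ is the number of degrees of freedom written n2 in the assumptions.
Let us denote $r_{(p,\nu,\alpha)}$ as the standardized critical value, given by the rule:
If p=2: $r_{(p,\nu,\alpha)} = Q_{(p,\nu,\gamma_p)}$
Else: $r_{(p,\nu,\alpha)} = \max\left(Q_{(p,\nu,\gamma_p)},\ r_{(p-1,\nu,\alpha)}\right)$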
The shortest critical range (the actual critical value of the test) is computed as $R_{(p,\nu,\alpha)} = s_m \cdot r_{(p,\nu,\alpha)}$, abbreviated $R_p$ when $\nu$ and $\alpha$ are fixed. For $\nu \to \infty$, a tabulation exists for an exact value of Q (see link). A word of caution is needed here: the notations for Q and R are not the same throughout the literature, where Q is sometimes denoted as the shortest significant interval, and R as the significant quantile of the studentized range distribution (the 1955 article uses both notations in different parts).
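The step from tabulated quantiles to shortest critical ranges can be sketched as below. The sketch assumes that the standardized critical value for p means is the larger of the tabulated quantile for p and the critical value for p-1, so that the values never decrease with p, and that the shortest critical range is this value multiplied by the estimated standard error. The function names and the input numbers are hypothetical and chosen only to show the effect of the rule.

```python
def standardized_critical_values(quantiles):
    """Map subset size p (starting at 2) to a non-decreasing critical value.

    quantiles: studentized range quantiles for p = 2, 3, ..., in that order,
        as read from a table (assumed input of this sketch).
    """
    values = {}
    running_max = float("-inf")
    for p, q in enumerate(quantiles, start=2):
        running_max = max(running_max, q)  # never drop below the value for p-1
        values[p] = running_max
    return values


def shortest_critical_ranges(quantiles, standard_error):
    """Multiply each standardized critical value by the standard error."""
    r = standardized_critical_values(quantiles)
    return {p: standard_error * r_p for p, r_p in r.items()}


# Made-up quantiles in which the value for p=3 dips below the one for p=2.
demo_r = standardized_critical_values([3.0, 2.9, 3.2])
demo_R = shortest_critical_ranges([3.0, 2.9, 3.2], standard_error=2.0)
print(demo_r)
print(demo_R)
```

In the made-up input the quantile for p=3 is smaller than the one for p=2, so the running maximum carries the p=2 value forward.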
Numeric Example
Let us look at the example of 5 treatment means:
Treatments | T1 | T5 | T2 | T3 | T4 |
---|---|---|---|---|---|
Treatment Means | 9.8 | 10.8 | 15.4 | 17.8 | 21.6 |
Rank | 1 | 2 | 3 | 4 | 5 |
With a standard error of , and (degrees of freedom for estimating the standard error).
Using a known tabulation for Q, one reaches the values of $r_{(p,\nu,\alpha)}$:
Now we may obtain the values of the shortest significant range $R_{(p,\nu,\alpha)}$ by the formula: $R_{(p,\nu,\alpha)} = s_m \cdot r_{(p,\nu,\alpha)}$
Reaching:
Then, the observed differences between means are tested, beginning with the largest versus the smallest, which is compared with the shortest significant range $R_5$. Next, the difference between the largest and the second smallest is computed and compared with the shortest significant range $R_4$.
If an observed difference is greater than the corresponding shortest significant range, then we conclude that the pair of means in question is significantly different.
If an observed difference is smaller than the corresponding shortest significant range, all remaining differences sharing the same upper mean are considered insignificant, in order to prevent contradictions (differences sharing the same upper mean are shorter by construction).
For our case, the comparison will yield:
We see that there are significant differences between all pairs of treatments except (T3,T2) and (T5,T1). A graph underlining those means that are not significantly different is shown below:
T1 T5 T2 T3 T4
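The underlined display can be reproduced in plain text with a small helper. This is a sketch: the function name `underline_diagram` is hypothetical, and the two groups passed in are the non-significant pairs (T1, T5) and (T3, T2) stated above, given as index spans over the ranked labels.

```python
def underline_diagram(labels, groups, sep="  "):
    """Return the ranked labels with one row of dashes per homogeneous group.

    labels: treatment labels in rank order.
    groups: list of (first_index, last_index) spans, inclusive.
    """
    # Column at which each label starts in the top row.
    starts, col = [], 0
    for label in labels:
        starts.append(col)
        col += len(label) + len(sep)
    rows = [sep.join(labels)]
    for first, last in groups:
        begin = starts[first]
        end = starts[last] + len(labels[last])
        rows.append(" " * begin + "-" * (end - begin))
    return "\n".join(rows)


diagram = underline_diagram(["T1", "T5", "T2", "T3", "T4"], [(0, 1), (2, 3)])
print(diagram)
```

Each group gets its own row so that overlapping groups, which can occur in other data sets, remain distinguishable.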
Protection and Significance Levels based on Degrees of Freedom
The new multiple range test proposed by Duncan makes use of special protection levels based upon degrees of freedom. Let $\gamma_2 = 1 - \alpha$ be the protection level for testing the significance of a difference between two means; that is, the probability that a significant difference between two means will not be found if the population means are equal. Duncan reasons that one has p-1 degrees of freedom for testing p ranked means, and hence one may conduct p-1 independent tests, each with protection level $\gamma_2$. Hence, the joint protection level is:
$\gamma_p = \gamma_2^{\,p-1} = (1-\alpha)^{p-1}$
where
$\alpha_p = 1 - \gamma_p$
that is, the probability that one finds no significant differences in making p-1 independent tests, each at protection level $\gamma_2$, is $\gamma_2^{\,p-1}$, under the hypothesis that all p population means are equal. In general: the difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an $\alpha_p$-level range test, where p is the number of means in the subset concerned.
For $\alpha = 0.05$, the protection level can be tabulated for various values of p as follows:
Subset size | Protection level $\gamma_p$ | Probability of falsely rejecting $H_0$ |
---|---|---|
p=2 | 0.95 | 0.05 |
p=3 | 0.903 | 0.097 |
p=4 | 0.857 | 0.143 |
p=5 | 0.815 | 0.185 |
p=6 | 0.774 | 0.226 |
p=7 | 0.735 | 0.265 |
Note that although this procedure makes use of the studentized range, its error rate is neither on an experiment-wise basis (as with Tukey's) nor on a per-comparison basis. Duncan's multiple range test does not control the familywise error rate (FWER). See Criticism Section for further details.
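The tabulated protection levels can be checked with a short computation. The sketch below assumes a base level of 0.05, which is the level implied by the p=2 row of the table; the function name `protection_level` is hypothetical.

```python
def protection_level(p, alpha=0.05):
    """Joint protection level for a subset of p means: (1 - alpha)^(p - 1)."""
    return (1.0 - alpha) ** (p - 1)


levels = {p: protection_level(p) for p in range(2, 8)}
for p, gamma in levels.items():
    # 1 - gamma is the significance level used for a subset of p means.
    print(f"p={p}  protection={gamma:.4f}  significance={1.0 - gamma:.4f}")
```

The exact values for p=3 and p=5 fall almost halfway between two three-decimal numbers, so a printout rounded to three decimals may differ from the table in the last digit.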