Rank Sum Test
The two-sample t test of the previous section was based on several conditions: independent samples, normality, and equal variances. When the conditions of normality and equal variances are not valid but the sample sizes are large, the results using a t (or t') test are approximately correct. There is, however, an alternative test procedure that requires less stringent conditions. This procedure, called the Wilcoxon rank sum test, is discussed here.

The assumptions for this test are that we have independent random samples taken from two populations whose distributions are identical except that one distribution may be shifted to the right of the other distribution, as shown in Figure 6.7. The Wilcoxon rank sum test does not require that populations have normal distributions. Thus, we have removed one of the three conditions that were required of the t-based procedures. The other conditions, equal variances and independence of the random samples, are still required for the Wilcoxon rank sum test. Because the two population distributions are assumed to be identical under the null hypothesis, independent random samples from the two populations should be similar if the null hypothesis is true.

FIGURE 6.7 Skewed population distributions identical in shape but shifted

Because we are now allowing the population distributions to be nonnormal, the rank sum procedure must deal with the possibility of extreme observations in the data. One way to handle samples containing extreme values is to replace each data value with its rank (from lowest to highest) in the combined sample, that is, the sample consisting of the data from both populations. The smallest value in the combined sample is assigned the rank of 1 and the largest value is assigned the rank of $N = n_1 + n_2$. The ranks are not affected by how far the smallest (largest) data value is from the next smallest (largest) data value. Thus, extreme values in data sets do not have as strong an effect on the rank sum statistic as they did in the t-based procedures.
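The insensitivity of ranks to extreme values can be illustrated with a small sketch. The data values below are made up for illustration, and the helper name `ranks` is hypothetical; it assumes there are no tied values.

```python
def ranks(values):
    """Rank values from 1 (smallest) to len(values) (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

# Illustrative (made-up) data: the second list differs only in that
# its largest value has been made extreme.
sample = [3.1, 4.7, 5.2, 6.0, 7.4]
with_outlier = [3.1, 4.7, 5.2, 6.0, 740.0]

ranks_plain = ranks(sample)
ranks_outlier = ranks(with_outlier)

# The sample mean, which t-based procedures rely on, shifts substantially.
mean_plain = sum(sample) / len(sample)
mean_outlier = sum(with_outlier) / len(with_outlier)
```

Both lists receive the same ranks, while the two sample means differ greatly, which is the sense in which a rank-based statistic is less affected by an extreme observation.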
The calculation of the rank sum statistic consists of the following steps:
1. List the data values for both samples from smallest to largest.
2. In the next column, assign the numbers 1 to N to the data values, with 1 to the smallest value and N to the largest value. These are the ranks of the observations.
3. If there are ties, that is, duplicated values, in the combined data set, the ranks for the observations in a tie are taken to be the average of the ranks for those observations.
4. Let T denote the sum of the ranks for the observations from population 1.
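The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `rank_sum` and the example data are assumptions introduced here.

```python
def rank_sum(sample1, sample2):
    """Return T, the sum of the ranks of sample1 within the combined sample.

    Tied values receive the average of the ranks they occupy.
    """
    # Step 1: list all values from smallest to largest, remembering the source.
    combined = [(v, 1) for v in sample1] + [(v, 2) for v in sample2]
    combined.sort(key=lambda pair: pair[0])

    # Steps 2 and 3: assign ranks 1..N, averaging the ranks within each tie.
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # Step 4: sum the ranks belonging to population 1.
    return sum(r for r, (_, label) in zip(ranks, combined) if label == 1)


# Made-up data with a three-way tie at 3.4 (occupying ranks 3, 4, and 5).
t = rank_sum([1.2, 3.4, 3.4], [2.0, 3.4, 5.1, 6.3])
```

In this example the tied observations each receive rank 4, so the ranks for the first sample are 1, 4, and 4.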
If the null hypothesis of identical population distributions is true, the $n_1$ ranks from population 1 are just a random sample from the $N$ integers $1, \ldots, N$. Thus, under the null hypothesis, the distribution of the sum of the ranks $T$ depends only on the sample sizes, $n_1$ and $n_2$, and does not depend on the shape of the population distributions. Under the null hypothesis, the sampling distribution of $T$ has mean and variance given by

$$\mu_T = \frac{n_1(n_1 + n_2 + 1)}{2} \qquad \sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
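As a check on this reasoning, the null mean and variance of T can be computed by brute force for small samples: with no ties, every choice of $n_1$ ranks out of $1, \ldots, N$ is equally likely. The sketch below compares that enumeration with the standard closed forms $n_1(N + 1)/2$ and $n_1 n_2 (N + 1)/12$. The function names are hypothetical, and the approach is only practical for small sample sizes.

```python
from fractions import Fraction
from itertools import combinations


def null_moments_by_enumeration(n1, n2):
    """Exact mean and variance of T over all equally likely rank subsets (no ties)."""
    N = n1 + n2
    sums = [sum(c) for c in combinations(range(1, N + 1), n1)]
    count = len(sums)
    mean = Fraction(sum(sums), count)
    variance = Fraction(sum(s * s for s in sums), count) - mean ** 2
    return mean, variance


def null_moments_by_formula(n1, n2):
    """Closed-form null mean and variance of T, assuming no ties."""
    N = n1 + n2
    return Fraction(n1 * (N + 1), 2), Fraction(n1 * n2 * (N + 1), 12)


enumerated = null_moments_by_enumeration(3, 4)
formula = null_moments_by_formula(3, 4)
```

For $n_1 = 3$ and $n_2 = 4$ both approaches give a mean of 12 and a variance of 8, and nothing about the population shape enters the calculation.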
Intuitively, if $T$ is much smaller (or larger) than $\mu_T$, we have evidence that the null hypothesis is false and in fact the population distributions are not equal. The rejection region for the rank sum test specifies the size of the difference between $T$ and $\mu_T$ for the null hypothesis to be rejected. Because the distribution of $T$ under the null hypothesis does not depend on the shape of the population distributions, Table 5 provides the critical values for the test regardless of the shape of the population distribution. The Wilcoxon rank sum test is summarized here.
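To make "much smaller than the mean" concrete, one can compute an exact lower-tail probability for T by enumeration when the samples are small and there are no ties. This is only an illustration of the idea behind tabulated critical values, not a description of how Table 5 was built; the function name `lower_tail_probability` is an assumption.

```python
from fractions import Fraction
from itertools import combinations


def lower_tail_probability(t_observed, n1, n2):
    """P(T <= t_observed) under the null hypothesis, assuming no ties."""
    N = n1 + n2
    # Under the null, each set of n1 ranks out of 1..N is equally likely.
    sums = [sum(c) for c in combinations(range(1, N + 1), n1)]
    hits = sum(1 for s in sums if s <= t_observed)
    return Fraction(hits, len(sums))


# With n1 = 3 and n2 = 4 there are 35 equally likely rank subsets.
p_smallest = lower_tail_probability(6, 3, 4)   # only the ranks {1, 2, 3} sum to 6
p_next = lower_tail_probability(7, 3, 4)       # adds the ranks {1, 2, 4}
```

A small tail probability for the observed T indicates that it lies far from its null mean relative to what chance alone would produce.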
Wilcoxon Rank Sum Test*
$H_0$: The two populations are identical. $H_a$: ...