Dale Berger, Claremont Graduate University http://wise.cgu.edu
The purpose of this paper is to explain the logic and vocabulary of one-way analysis of variance (ANOVA). The null hypothesis tested by one-way ANOVA is that two or more population means are equal. The question is whether (H0) the population means are equal for all groups and the observed differences in sample means are due to random sampling variation, or (Ha) the observed differences between sample means are due to actual differences in the population means.
The logic used in ANOVA to compare means of multiple groups is similar to that used with the t-test to compare means of two independent groups. When one-way ANOVA is applied to the special case of two groups, it gives results identical to those of the t-test.
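The two-group equivalence can be checked numerically. The sketch below is only an illustration: the scores are hypothetical (they are not the data used in this paper), the function names are made up for the example, and the t statistic is the pooled-variance version that assumes equal population variances. Under those assumptions, the F statistic from one-way ANOVA should equal the square of t.

```python
from statistics import mean


def sum_of_squares(scores):
    """Sum of squared deviations of the scores around their own mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def pooled_t(a, b):
    """t statistic for two independent groups, assuming equal variances."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_of_squares(a) + sum_of_squares(b)) / df
    se = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / se


def one_way_f(*groups):
    """F statistic for one-way ANOVA: mean square between / mean square within."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum_of_squares(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores, invented for this illustration only.
group_a = [4, 6, 5, 7, 8]
group_b = [9, 7, 10, 8, 11]

t = pooled_t(group_a, group_b)
f = one_way_f(group_a, group_b)
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

With two groups the between-groups degrees of freedom equal 1, which is why F reduces to the square of t and both tests lead to the same p-value.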
Not surprisingly, the assumptions needed for the t-test are also needed for ANOVA. We need to assume:
1) random, independent sampling from the k populations;
2) normal population distributions;
3) equal variances within the k populations.
Assumption 1 is crucial for any inferential statistic. As with the t-test, Assumptions 2 and 3 can be relaxed when large samples are used, and Assumption 3 can be relaxed when the sample sizes are roughly the same for each group even for small samples. (If there are extreme outliers or errors in the data, we need to deal with them first.) As a first step, we will review the t-test for two independent groups, to prepare for an extension to ANOVA.
Review of the t-test for independent groups
Let us start with a small example. Suppose we wish to compare two training programs in terms of performance scores for people who have completed the training course. The table below shows the scores for six randomly selected graduates from each of two training programs. These (artificially) small samples show somewhat lower scores from the first program than from the second program. But, can these fluctuations be attributed to chance in the sampling process or is this compelling evidence of a real difference in the populations? The t-test for independent groups is designed to address just this question by testing the null hypothesis H0: $\mu_1 = \mu_2$. We will conduct a standard t-test for two independent groups, but will develop the logic in a way that can be extended easily to more than two groups.
            Program 1       Program 2
Mean        97              105
Variance    $s_1^2 = 20$    $s_2^2 = 16$

The mean of all 12 scores = Grand mean = 101
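As a quick check on the grand mean, it can be recovered from the group means alone by weighting each mean by its sample size. The helper name `grand_mean` is hypothetical and is used here only for illustration; the input values are the summary statistics from the table above.

```python
def grand_mean(group_means, group_sizes):
    """Mean of all scores, recovered from group means weighted by group size."""
    total = sum(m * n for m, n in zip(group_means, group_sizes))
    return total / sum(group_sizes)


# Summary values from the table: means 97 and 105, six scores per group.
overall = grand_mean([97, 105], [6, 6])
print(overall)  # 101.0
```

Because the two samples are the same size, the grand mean is simply the midpoint of the two group means. With unequal sample sizes the weights would matter.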
The first step is to check the data to make sure that the raw data are correctly assembled and that assumptions have not been violated in a way that makes the test inappropriate. In our example, a plot of the data shows that the sample distributions have roughly the same shape, and neither sample has extreme scores or extreme skew. The sample sizes are equal, so equality of population variances is of little concern. Note that in practice you would usually have much larger samples.
We assume that the variance is the same within the two populations (Assumption 3). An unbiased estimate of this common population variance can be calculated separately from each sample. The numerator of the variance formula is the sum of squared deviations around the sample mean, or simply the sum of squares for sample j (abbreviated as SSj). The denominator is the degrees of freedom for the population variance estimate from sample j (abbreviated as dfj).
Unbiased estimate of $\sigma_j^2$: $s_j^2 = \frac{SS_j}{df_j}$ [Formula 1]
For the first sample, $SS_1 = (102-97)^2 + \dots + (101-97)^2 = 100$, and for the second sample, $SS_2 = 80$. This leads to $s_1^2 = 100/5 = 20$ and $s_2^2 = 80/5 = 16$.
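The arithmetic of Formula 1 can be sketched in a few lines of Python. The first part uses only the sums of squares and sample size reported above. The second part applies the same formula to a small set of hypothetical raw scores (not the paper's data) and compares the result with the standard library's `statistics.variance`, which also divides by n - 1. The function names are illustrative assumptions.

```python
import math
from statistics import mean, variance


def sum_of_squares(scores):
    """SS_j: sum of squared deviations around the sample mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def unbiased_variance(scores):
    """Formula 1: SS_j divided by df_j, where df_j = n_j - 1."""
    return sum_of_squares(scores) / (len(scores) - 1)


# Part 1: summary values from the text (SS_1 = 100, SS_2 = 80, n = 6 per group).
ss = {"Program 1": 100, "Program 2": 80}
n = 6
estimates = {name: value / (n - 1) for name, value in ss.items()}
print(estimates)  # {'Program 1': 20.0, 'Program 2': 16.0}

# Part 2: hypothetical raw scores, invented for this illustration only.
demo_scores = [1, 3, 5, 7, 9]
by_formula = unbiased_variance(demo_scores)
by_library = variance(demo_scores)
print(by_formula, by_library)
```

The two results in the second part should agree, since both divide the sum of squares by the degrees of freedom rather than by the sample size.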
To pool two or more sample estimates of...