Analysis of Variance (ANOVA)

Recall that when we wanted to compare two population means, we used the 2-sample t procedures. Now let’s extend this to compare k ≥ 3 population means. As with the t-test, we can get a graphical idea of what is going on by looking at side-by-side boxplots. (See Example 12.3, p. 748, along with Figure 12.3, p. 749.)

1 Basic ANOVA concepts

1.1 The Setting

Generally, we are considering a quantitative response variable as it relates to one or more explanatory variables, usually categorical. Questions which fit this setting:

(i) Which academic department in the sciences gives out the lowest average grades? (Explanatory variable: department; Response variable: student GPAs for individual courses)
(ii) Which kind of promotional campaign leads to the greatest store income at Christmas time? (Explanatory variable: promotion type; Response variable: daily store income)
(iii) How do a person’s type of career and marital status relate to the total cost in annual claims that person is likely to make on health insurance? (Explanatory variables: career and marital status; Response variable: health insurance payouts)

Each value of the explanatory variable (or value-pair, if there is more than one explanatory variable) represents a population or group. In the Physicians’ Health Study of Example 3.3, p. 238, there are two factors (explanatory variables): aspirin (values are “taking it” or “not taking it”) and beta carotene (values again are “taking it” or “not taking it”). Together these divide the subjects into four groups corresponding to the four cells of Figure 3.1 (p. 239). Had the response variable for this study been quantitative (systolic blood pressure level, say) rather than categorical, it would have been an appropriate scenario in which to apply (2-way) ANOVA.

1.2 Hypotheses of ANOVA

These are always the same.

H0: The (population) means of all groups under consideration are equal.
Ha: The (population) means are not all equal. (Note: This is different from saying “they are all unequal”!)
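
In symbols, the two hypotheses might be written as follows. This is only a sketch: the notation μ1, ..., μk for the population means of the k groups is an assumption made here for illustration.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_a:\ \text{not all of } \mu_1, \ldots, \mu_k \text{ are equal}
```

For instance, with k = 3, population means of 5, 5 and 7 satisfy Ha even though two of the three means are equal.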

1.3 Basic Idea of ANOVA

Analysis of variance is a perfectly descriptive name for what is actually done to analyze sample data acquired to answer problems such as those described in Section 1.1. Take a look at Figures 12.2(a) and 12.2(b) (p. 746) in your text. The side-by-side boxplots in both figures reveal differences between samples taken from three populations. However, the variations depicted in 12.2(a) are much less convincing evidence that the three population means are different than the variations depicted in 12.2(b). The reason is that the ratio of variation between groups to variation within groups is much smaller for 12.2(a) than it is for 12.2(b).
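
To make the ratio idea concrete, here is a minimal Python sketch. It assumes the usual way of measuring the two kinds of variation (a between-group mean square divided by a within-group mean square), which this section has not yet defined. The function name `variation_ratio` and both data sets are made up for illustration and are not taken from the text's figures.

```python
from statistics import mean

def variation_ratio(groups):
    """Ratio of between-group variation to within-group variation.

    Assumption: variation is measured by mean squares, i.e. sums of
    squared deviations divided by their degrees of freedom.
    """
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between groups: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far each observation is from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Made-up samples; the group means are 10, 12 and 14 in both data sets.
wide = [[6, 10, 14], [8, 12, 16], [10, 14, 18]]    # large spread within groups
tight = [[9, 10, 11], [11, 12, 13], [13, 14, 15]]  # small spread within groups

print(variation_ratio(wide))   # 0.75
print(variation_ratio(tight))  # 12.0
```

Both data sets have identical sample means, yet the ratio is far larger for the tightly clustered samples. That is the sense in which one set of boxplots can be more convincing than another.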

Math 143 – ANOVA

1.4 Assumptions of ANOVA

Like so many of our inference procedures, ANOVA has some underlying assumptions which should be in place in order to make the results of calculations completely trustworthy. They include:

(i) Subjects are chosen via a simple random sample.
(ii) Within each group/population, the response variable is normally distributed.
(iii) While the population means may be different from one group to the next, the population standard deviation is the same for all groups.

Fortunately, ANOVA is somewhat robust (i.e., results remain fairly trustworthy despite mild violations of these assumptions). Assumptions (ii) and (iii) are close enough to being true if, after gathering SRS samples from each group, you:

(ii) look at normal quantile plots for each group and, in each case, see that the data points fall close to a line.
(iii) compute the standard deviations for each group sample, and see that the ratio of the largest to the smallest group sample s.d. is no more than two.
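
The rule of thumb for assumption (iii) is easy to check by machine. The following Python sketch uses only the standard library. The function names `sd_ratio` and `equal_sd_plausible` and the sample values are hypothetical, chosen purely for illustration.

```python
from statistics import stdev

def sd_ratio(groups):
    """Largest sample standard deviation divided by the smallest."""
    sds = [stdev(g) for g in groups]   # sample s.d. (n - 1 divisor) per group
    return max(sds) / min(sds)

def equal_sd_plausible(groups, limit=2.0):
    """True when the largest-to-smallest s.d. ratio is no more than `limit`."""
    return sd_ratio(groups) <= limit

# Made-up samples for illustration only.
similar_spread = [[9, 10, 11], [9, 10.5, 12]]   # s.d. 1 and 1.5
unequal_spread = [[9, 10, 11], [8, 12, 16]]     # s.d. 1 and 4

print(sd_ratio(similar_spread), equal_sd_plausible(similar_spread))
print(sd_ratio(unequal_spread), equal_sd_plausible(unequal_spread))
```

A ratio above two does not by itself rule out ANOVA. It signals that the equal standard deviation assumption deserves a closer look before the results are trusted.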

2 One-Way ANOVA

When there is just one explanatory variable, we refer to the analysis of variance as one-way ANOVA.

2.1 Notation

Here is a key to symbols you may see as you read through this...