Hypothesis tests may be performed on contingency tables in order to decide whether or not effects are present. Effects in a contingency table are defined as relationships between the row and column variables; that is, are the levels of the row variable differentially distributed over the levels of the column variable? Significance in this hypothesis test means that interpretation of the cell frequencies is warranted. Non-significance means that any differences in cell frequencies could be explained by chance. Hypothesis tests on contingency tables are based on a statistic called Chi-square. In this chapter contingency tables will first be reviewed, followed by a discussion of the Chi-squared statistic. The sampling distribution of the Chi-squared statistic will then be presented, followed by a discussion of the hypothesis test. A complete computational example will conclude the chapter.

REVIEW OF CONTINGENCY TABLES

Frequency tables of two variables presented simultaneously are called contingency tables. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variable as columns, then finding the joint or cell frequency for each cell. The cell frequencies are then summed across both rows and columns. The sums are placed in the margins, the values of which are called marginal frequencies. The value in the lower right-hand corner is the sum of either the row or the column marginal frequencies; both sums must equal N. For example, suppose that a researcher studied the relationship between having AIDS and sexual preference of individuals. The study resulted in the following data for thirty male subjects:

AIDS     N Y N N N Y N N N Y N N N Y N N N N N N N Y N Y Y N Y N Y N
SEXPREF  M B F F B F F F M F F F F B F F B F M F F M F B M F M F M F

with Y = "yes" and N = "no" for AIDS and F = "female", M = "male" and B = "both" for SEXPREF. The data file, with coding AIDS (1="Yes" and 2="No") and SEXPREF (1="Males", 2="Females", and 3="Both"), would appear as follows:
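The coding scheme and the tallying procedure described above can be sketched in Python. This is an illustration rather than the original data file: the variable names, the choice of SEXPREF for rows and AIDS for columns, and the reading of the first AIDS entry as "N" are all assumptions (that reading is consistent with the significance level of 0.022 reported for this table).

```python
from collections import Counter

# Raw observations for the thirty subjects, as listed in the text.
# ASSUMPTION: the first AIDS entry is read as "N".
aids = "N Y N N N Y N N N Y N N N Y N N N N N N N Y N Y Y N Y N Y N".split()
sexpref = "M B F F B F F F M F F F F B F F B F M F F M F B M F M F M F".split()

# Numeric codes given in the text.
AIDS_CODES = {"Y": 1, "N": 2}
SEXPREF_CODES = {"M": 1, "F": 2, "B": 3}

# One (AIDS, SEXPREF) pair of codes per subject: the rows of the data file.
records = [(AIDS_CODES[a], SEXPREF_CODES[s]) for a, s in zip(aids, sexpref)]

# Joint (cell) frequencies, then the marginal frequencies.
cells = Counter(records)
row_totals = Counter(s for _, s in records)  # SEXPREF levels as rows (assumed layout)
col_totals = Counter(a for a, _ in records)  # AIDS levels as columns (assumed layout)
n = len(records)

# Print the contingency table with its margins.
print("SEXPREF  AIDS=1  AIDS=2  Total")
for s in (1, 2, 3):
    print(f"{s:>7}  {cells[(1, s)]:>6}  {cells[(2, s)]:>6}  {row_totals[s]:>5}")
print(f"{'Total':>7}  {col_totals[1]:>6}  {col_totals[2]:>6}  {n:>5}")
```

The row and column marginal frequencies each sum to N = 30, which is the check on the lower right-hand corner described earlier.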

A contingency table and chi-square hypothesis test of independence could be generated using the following commands:

The resulting output tables are presented below:
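As a rough cross-check, the Pearson chi-square test of independence can be sketched by hand in Python using only the standard library. The cell counts below are tallied from the thirty observations listed earlier; the row/column layout and the reading of the first AIDS entry as "N" are assumptions, and this is not the output of the original statistical package.

```python
import math

# Observed cell frequencies tallied from the thirty observations
# (rows: SEXPREF = Males, Females, Both; columns: AIDS = Yes, No).
# ASSUMPTION: the first AIDS entry in the data listing is read as "N".
observed = [
    [4, 3],
    [2, 16],
    [3, 2],
]

# Marginal frequencies and the total number of observations.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells, where the
# expected frequency is row total * column total / N.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(col_totals) - 1)

# For exactly 2 degrees of freedom the chi-square upper-tail probability
# has the closed form exp(-x / 2); other df values need a library routine.
assert df == 2
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
```

Under these assumptions the sketch gives a chi-square of about 7.66 with 2 degrees of freedom and a significance level of about 0.022.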

The fact that the significance level under "Asymp. Sig" for the Pearson chi-square is 0.022, which is less than .05, indicates that the rows and columns of the contingency table are dependent. In general this means that it is worthwhile to interpret the cells in the contingency table. In this particular case it means that AIDS is not distributed similarly across the different levels of sexual preference. In other words, males who prefer other males or both males and females are more likely to have the syndrome than males who prefer females.

HYPOTHESIS TESTING WITH CONTINGENCY TABLES

The procedure used to test the significance of contingency tables is similar to that of all other hypothesis tests. That is, a statistic is computed and then compared to a model of what the world would look like if the experiment were repeated an infinite number of times when there were no effects. In this case the statistic computed is called the chi-square statistic. This statistic will be discussed first, followed by a discussion of its theoretical distribution. Finding critical values of chi-square and its interpretation will conclude the chapter.

COMPUTATION OF THE CHI-SQUARED STATISTIC

The first step in computing the Chi-squared statistic is the computation of the contingency table. The preceding table is reproduced below:

The next step in computing the Chi-squared statistic is the computation of the expected cell frequency for each cell. This is accomplished by multiplying the marginal frequencies for the row and column (row and column totals) of the desired cell and then dividing by the total number...
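In symbols, the standard expected-frequency rule for a contingency table can be written as below. The notation is an assumption introduced here for illustration: E_ij is the expected frequency of the cell in row i and column j, R_i is the marginal frequency of row i, C_j is the marginal frequency of column j, and N is the total number of observations.

```latex
E_{ij} = \frac{R_i \, C_j}{N}
\qquad\text{for example}\qquad
E = \frac{7 \times 9}{30} = 2.1
```

The worked value uses a row total of 7 (subjects coded M), a column total of 9 (subjects coded Y), and N = 30, as tallied from the thirty observations listed earlier with the first AIDS entry read as "N" (an assumption).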