Chi square test for independence of two attributes. Suppose N observations are considered and classified according to two characteristics, say A and B. We may be interested in testing whether the two characteristics are independent. In such a case, we can use the Chi square test for independence of two attributes. The example considered above, testing for independence of success in the English test vis-à-vis immigrant status, is a case fit for analysis using this test.

This lesson explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables. For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference. The sample problem at the end of the lesson considers this example.

When to Use Chi-Square Test for Independence

The test procedure described in this lesson is appropriate when the following conditions are met:
* The sampling method is simple random sampling.
* Each population is at least 10 times as large as its respective sample.
* The variables under study are each categorical.
* If sample data are displayed in a contingency table, the expected frequency count for each cell of the table is at least 5.

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.
H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not...

...CHI-SQUARE TEST (χ²):
Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.
The formula for calculating chi-square (χ²) is:
$\chi^2 = \sum \frac{(o - e)^2}{e}$
That is, chi-square is the sum, over all possible categories, of the squared difference between the observed (o) and the expected (e) data (the deviation, d), divided by the expected data.
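The formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original handout: the function name `chi_square_statistic` is made up here, and the female counts (12 observed, 10 expected) are inferred from the stated total of 20 offspring.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Offspring example from the text: 10 of 20 expected to be male, 8 observed.
# Category order: [male, female]; the female counts follow from the total of 20.
observed = [8, 12]
expected = [10, 10]

chi2 = chi_square_statistic(observed, expected)
print(chi2)
```

Both categories contribute to the sum, which is why the female counts are needed even though the text only quotes the male ones.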
INTERPRETATION OF CHI-SQUARE TEST
1. Determine degrees of freedom (DF). Degrees of freedom can be calculated as the number of categories in the problem minus 1.
2. Determine a relative standard to serve...

...Chi-square requires that you use numerical values, not percentages or ratios.
Then calculate χ² using this formula, as shown in Table B.1. Note that we get a value of 2.668 for χ². But what does this number mean? Here's how to interpret the χ² value:
1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is 1 degree of freedom.
2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis. The relative standard commonly used in biological research is p > 0.05. The p value is the probability of seeing a deviation of the observed from the expected at least this large if chance alone is acting (no other forces). Under this standard, the hypothesis is rejected only when the deviation is one that chance alone would produce 5% of the time or less.
3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of freedom, locate the value closest to your calculated chi-square in the table. Determine the closest p (probability) value associated with your chi-square and degrees of freedom. In this case (χ² = 2.668), the p value is about 0.10, which means that chance alone would produce a deviation at least this large about 10% of the time. Based on our standard p > 0.05, this is within the...
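The table lookup in step 3 can be cross-checked numerically. The sketch below is an illustration rather than the handout's own method: it relies on the fact that, for exactly 1 degree of freedom, the upper-tail probability of a chi-square variable equals erfc(sqrt(χ²/2)). The function name `chi_square_p_value_df1` is hypothetical.

```python
import math


def chi_square_p_value_df1(chi2):
    """Upper-tail probability P(X >= chi2) for a chi-square variable with 1 df."""
    if chi2 < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(chi2 / 2.0))


# Value quoted in the text for the green/yellow seed example (1 degree of freedom).
p_value = chi_square_p_value_df1(2.668)
print(round(p_value, 3))  # close to the table value of about 0.10

# Apply the p > 0.05 standard described in step 2.
within_standard = p_value > 0.05
print(within_standard)
```

The closed form only holds for 1 degree of freedom; other values of df need a table or a statistics library.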

...Independent). Results are shown below.
Voting Preferences
|              | Republican | Democrat | Independent | Row total |
| Male         | 200        | 150      | 50          | 400       |
| Female       | 250        | 300      | 50          | 600       |
| Column total | 450        | 450      | 100         | 1000      |
a) If you conduct a chi-square test of independence, what is the expected frequency count of male Independents?
b) If you conduct a chi-square test of independence, what is the expected frequency count of female Democrats?
c) If you conduct a chi-square test of independence, what is the observed count of female Independents?
d) If you conduct a chi-square test of independence, what is the expected frequency count of male Republicans?
e) If you conduct a chi-square test of independence, what is the observed count of male Independents?
f) If you conduct a chi-square test of independence, what is the expected frequency count of female Republicans?
g) The table represents a
(A) 2 by 3 table
(B) 3 by 3 table
(C) 4 by 4 table
(D) 4 by 3 table
h) The null hypothesis is
(A) Gender and voting preferences are independent
(B) Gender and voting preferences are not independent
i) The degrees of freedom is ___________________
j) The chi-square statistic by hand is...
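The quantities asked for in this exercise can be checked with a short script. This is a sketch under two standard assumptions that the excerpt does not spell out: the expected count of a cell under independence is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The variable names are chosen here for illustration.

```python
import math

# Observed counts from the voting table.
# Rows: Male, Female. Columns: Republican, Democrat, Independent.
observed = [
    [200, 150, 50],
    [250, 300, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (o - e)^2 / e over every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 2 degrees of freedom the upper-tail probability is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2) if df == 2 else None

# Condition from the lesson: every expected count should be at least 5.
expected_counts_ok = all(e >= 5 for row in expected for e in row)

print(expected)
print(round(chi2, 2), df, p_value, expected_counts_ok)
```

Printing `expected` alongside `observed` makes it easy to keep expected counts (computed from the margins) separate from observed counts (read straight from the table).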

...Chi-Square Test
For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is that the allele for...

...CHI-SQUARE AND TESTS OF CONTINGENCY TABLES
Hypothesis tests may be performed on contingency tables in order to decide whether or not effects are present. Effects in a contingency table are defined as relationships between the row and column variables; that is, whether the levels of the row variable are differentially distributed over the levels of the column variable. Significance in this hypothesis test means that interpretation of the cell frequencies is warranted. Non-significance means that any differences in cell frequencies could be explained by chance.
Hypothesis tests on contingency tables are based on a statistic called Chi-square. In this chapter contingency tables will first be reviewed, followed by a discussion of the Chi-squared statistic. The sampling distribution of the Chi-squared statistic will then be presented, preceded by a discussion of the hypothesis test. A complete computational example will conclude the chapter.
REVIEW OF CONTINGENCY TABLES
Frequency tables of two variables presented simultaneously are called contingency tables. Contingency tables are constructed by listing all the levels of one variable as rows in a table and the levels of the other variable as columns, then finding the joint or cell frequency for each cell. The cell frequencies are then summed across both rows and columns. The sums are placed in the margins, the values of which are called marginal...
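The construction described above can be sketched in Python. The observations below are made up purely for illustration (they are not data from the text), and the level names `A1`, `A2`, `B1`, `B2` are placeholders.

```python
from collections import Counter

# Hypothetical raw observations (illustrative only): one
# (row level, column level) pair per subject.
observations = [
    ("A1", "B1"), ("A1", "B2"), ("A2", "B1"),
    ("A2", "B2"), ("A2", "B2"), ("A1", "B1"),
]

# Joint (cell) frequency for each combination of levels.
cell_counts = Counter(observations)
row_levels = sorted({row for row, _ in observations})
col_levels = sorted({col for _, col in observations})

# Rows are levels of the first variable, columns are levels of the second.
table = [[cell_counts[(r, c)] for c in col_levels] for r in row_levels]

# Sum the cell frequencies across rows and down columns to fill the margins.
row_margins = [sum(row) for row in table]
col_margins = [sum(col) for col in zip(*table)]
total = sum(row_margins)

print(table, row_margins, col_margins, total)
```

Both sets of margins add up to the same grand total, which is a quick sanity check on any hand-built table.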

...2.3. The Chi-Square Distribution
One of the most important special cases of the gamma distribution is the chi-square distribution because the sum of the squares of independent normal random variables with mean zero and standard deviation one has a chi-square distribution. This section collects some basic properties of chi-square random variables, all of which are well known; see Hogg and Tanis [6].
A random variable X has a chi-square distribution with n degrees of freedom if it is a gamma random variable with parameters m = n/2 and $\beta = 2$, i.e. $X \sim \Gamma(n/2, 2)$. Therefore, its probability density function (pdf) has the form
(1) $f(t) = f(t; n) = \frac{t^{n/2 - 1} e^{-t/2}}{2^{n/2}\,\Gamma(n/2)}$ for $t > 0$
In this case we shall say X is a chi-square random variable with n degrees of freedom and write $X \sim \chi^2(n)$. Usually n is assumed to be an integer, but we only assume n > 0.
Proposition 1. If X has a gamma distribution with parameters m and $\beta$, then $2X/\beta$ has a chi-square distribution with 2m degrees of freedom.
Proof. By Proposition 5 in section 2.2 the random variable $2X/\beta$ has a gamma distribution with parameters m and 2, i.e. $2X/\beta \sim \Gamma(m, 2) = \Gamma((2m)/2, 2)$. The proposition follows from this.
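Proposition 1 can be checked numerically by comparing densities. The sketch below assumes the second gamma parameter is a scale parameter (so the gamma density is proportional to x^(m−1) e^(−x/scale)); the function names and the parameter values m = 1.5, scale = 3.0 are arbitrary choices for illustration.

```python
import math


def gamma_pdf(x, shape, scale):
    """Density at x > 0 of a gamma distribution with the given shape and scale."""
    log_pdf = (
        (shape - 1) * math.log(x)
        - x / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )
    return math.exp(log_pdf)


def chi_square_pdf(t, n):
    """Chi-square density with n degrees of freedom: gamma with shape n/2, scale 2."""
    return gamma_pdf(t, n / 2, 2.0)


def rescaled_gamma_pdf(y, shape, scale):
    """Density of Y = 2X/scale for X ~ gamma(shape, scale), by change of variables."""
    return gamma_pdf(y * scale / 2, shape, scale) * (scale / 2)


m, scale = 1.5, 3.0  # arbitrary illustrative values
checks = [
    math.isclose(rescaled_gamma_pdf(y, m, scale), chi_square_pdf(y, 2 * m))
    for y in (0.5, 1.0, 2.5, 7.0)
]
print(checks)
```

Agreement at a handful of points is only a spot check of the proposition, not a proof.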
Proposition 2. If X has a chi-square distribution with n degrees of freedom, then the...

...Chi-square tests
1. INTRODUCTION
1.1 χ² distribution and its properties
A chi-square (χ²) distribution is a set of density curves, with each curve described by its degrees of freedom (df). The distribution has the following properties:
- Area under the curve = 1
- All χ² values are non-negative. For df > 2 the curve begins at 0, increases to a peak, and decreases towards 0 as its asymptote; for df = 1 and df = 2 the curve has no interior peak and simply decreases
- The curve is skewed to the right, and as the degrees of freedom increase, the distribution approaches a normal distribution
Fig. 1 Graph of χ² distribution with differing degrees of freedom
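The properties listed above can be explored numerically. The following is a rough sketch, not a definitive implementation: it uses the standard chi-square density formula, a simple midpoint rule for the area, and a grid search for the peak. The function names, grid sizes, and the choice of df values are all illustrative assumptions.

```python
import math


def chi_square_pdf(t, df):
    """Chi-square density at t > 0 for the given degrees of freedom."""
    half = df / 2
    log_pdf = (
        (half - 1) * math.log(t)
        - t / 2
        - math.lgamma(half)
        - half * math.log(2)
    )
    return math.exp(log_pdf)


def area_under_curve(df, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the area under the density on (0, upper)."""
    width = upper / steps
    return sum(chi_square_pdf((i + 0.5) * width, df) for i in range(steps)) * width


def peak_location(df, upper=50.0, steps=50_000):
    """Grid search for the t value at which the density is highest."""
    width = upper / steps
    grid = [(i + 0.5) * width for i in range(steps)]
    return max(grid, key=lambda t: chi_square_pdf(t, df))


results = {df: (area_under_curve(df), peak_location(df)) for df in (3, 5, 10)}
for df, (area, peak) in results.items():
    print(df, round(area, 4), round(peak, 2))
```

For these df values the peak sits to the left of the distribution's mean (which equals df), which is one way to see the right skew.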
Each χ² value is computed by the formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$ (Equation 1)
where O = observed counts from the sample
and E = expected counts based on the hypothesized distribution
1.2 Types of χ² tests and their purpose
For a single population, to determine whether the observed distribution in the population conforms to a specific known distribution or a previously studied distribution, the χ² test for goodness-of-fit can be used.
An example of this usage: Mendel’s genetic model predicts that the phenotypes for two traits, each trait having a dominant and a recessive allele, will follow the ratio 9:3:3:1. A study done to confirm this uses the χ² test for goodness-of-fit to determine whether the observed population fits the theoretical model. We will discuss...

...THE CHI-SQUARE GOODNESS-OF-FIT TEST
The chi-square goodness-of-fit test is used to analyze probabilities of multinomial distribution trials along a single dimension. For example, if the variable being studied is economic class with three possible outcomes of lower income class, middle income class, and upper income class, the single dimension is economic class and the three possible outcomes are the three classes. On each trial, one and only one of the outcomes can occur. In other words, a family unit must be classified either as lower income class, middle income class, or upper income class and cannot be in more than one class. The chi-square goodness-of-fit test compares the expected, or theoretical, frequencies of categories from a population distribution to the observed, or actual, frequencies from a distribution to determine whether there is a difference between what was expected and what was observed. For example, airline industry officials might theorize that the ages of airline ticket purchasers are distributed in a particular way. To validate or reject this expected distribution, an actual sample of ticket purchaser ages can be gathered randomly, and the observed results can be compared to the expected results with the chi-square goodness-of-fit test. This test also can be used to determine whether the observed arrivals at teller windows at a bank are Poisson...