Individual Paper #3: Nonparametric Methods and the Chi-Square Distribution

Brief Summary:
I worked for a logistics company. My main responsibility was the storage and transportation of car parts between two areas about 1,400 miles apart. One of my jobs was to collect the goods from suppliers and arrange the trucks to deliver them. There are five truck drivers, and each of them is assigned deliveries on each weekday throughout the whole year. Before each delivery, we check the quality of the goods and make sure that none are damaged. On arrival at the destination, the staff check the goods again and record any damage that occurred in transit. At the end of every month, we pay compensation according to the number of damaged goods. To reduce the number of goods damaged during delivery, I want to identify why they are damaged. In this study, I want to find out whether some drivers are more prone than others to damaging the goods during delivery.

Variable to be measured:
Two variables are measured. The first is the driver (one of the five truck drivers), and the second is the condition of the goods after delivery (damaged or undamaged).

Determination of Population:
The population in this case is defined as all goods delivered from the Tianjin area to the Guangzhou area.

Statistical method:
To analyze the relationship between the two variables above, both of which are nominal, I decided to use the chi-squared test of a contingency table.

Sample Selection:
Delivery information is recorded in our computer system, including the delivery date, the name of the driver, the number of damaged goods, and so on. I extracted the data for the 52 weeks of the previous year and recorded them in the following table:

...Chi-square requires that you use numerical values, not percentages or ratios.
Then calculate χ² using this formula, as shown in Table B.1. Note that we get a value of 2.668 for χ². But what does this number mean? Here's how to interpret the χ² value:
1. Determine degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is 1 degree of freedom.
2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis. The relative standard commonly used in biological research is p > 0.05. The p value is the probability of seeing a deviation of the observed from the expected at least this large if chance alone is at work (no other forces acting). Under this standard, the hypothesis is accepted when p > 0.05 and rejected when a deviation this large would arise by chance alone 5% of the time or less.
3. Refer to a chi-square distribution table (Table B.2). Using the appropriate degrees of freedom, locate the value closest to your calculated chi-square in the table. Determine the closest p (probability) value associated with your chi-square and degrees of freedom. In this case (χ² = 2.668), the p value is about 0.10, which means that there is about a 10% probability of a deviation this large arising by chance alone. Based on our standard p > 0.05, this is...
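
The table lookup in step 3 can also be checked numerically. The following is a minimal sketch, assuming only the quoted value χ² = 2.668 and 1 degree of freedom; the helper name `chi2_sf_df1` is hypothetical, and the closed form it uses is valid for 1 degree of freedom only.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 1 df.

    With 1 degree of freedom, X is the square of a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


chi_square = 2.668  # value quoted in the text
p_value = chi2_sf_df1(chi_square)

# Compare the p value with the 0.05 standard from step 2.
print(f"p = {p_value:.3f}")
print("accept hypothesis" if p_value > 0.05 else "reject hypothesis")
```

The computed p value agrees with the "about 0.10" read from the table, so under the p > 0.05 standard the hypothesis is not rejected.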

...Chi-square test for independence of two attributes. Suppose N observations are considered and classified according to two characteristics, say A and B. We may be interested in testing whether the two characteristics are independent. In such a case, we can use the chi-square test for independence of two attributes.
The example considered above, testing for independence of success in the English test vis-à-vis immigrant status, is a case fit for analysis using this test.
This lesson explains how to conduct a chi-square test for independence. The test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.
For example, in an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference. The sample problem at the end of the lesson considers this example.
When to Use Chi-Square Test for Independence
The test procedure described in this lesson is appropriate when the following conditions are met:
* The sampling method is simple random sampling.
* Each population is at least 10 times as large as its respective sample.
* The variables...

...CHI-SQUARE TEST (χ²):
Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.
The formula for calculating chi-square (χ²) is:
χ² = Σ (o − e)² / e
That is, chi-square is the sum, over all possible categories, of the squared difference between the observed (o) and the expected (e) data (the deviation, d), divided by the expected data.
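
As a worked instance of the formula, here is a short sketch using the counts from the example above: 8 males observed out of 20 offspring (so 12 females), against 10 and 10 expected. The function name `chi_square_statistic` is an illustrative choice, not something from the source.

```python
def chi_square_statistic(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12]   # 8 males observed, so the other 12 of 20 are female
expected = [10, 10]  # 10 of 20 expected in each category

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.3f}")  # (8-10)^2/10 + (12-10)^2/10
```

Each category contributes 4/10, giving a statistic of 0.8 on 1 degree of freedom.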
INTERPRETATION OF CHI-SQUARE TEST
1. Determine degrees of freedom (DF). Degrees of freedom can be calculated as the number of categories in the problem minus 1.
2. Determine a relative standard to serve...

...2.3. The Chi-Square Distribution
One of the most important special cases of the gamma distribution is the chi-square distribution, because the sum of the squares of independent normal random variables with mean zero and standard deviation one has a chi-square distribution. This section collects some basic properties of chi-square random variables, all of which are well known; see Hogg and Tanis [6].
A random variable X has a chi-square distribution with n degrees of freedom if it is a gamma random variable with parameters m = n/2 and β = 2, i.e. X ~ Γ(n/2, 2). Therefore, its probability density function (pdf) has the form
(1) f(t) = f(t; n) = t^(n/2 − 1) e^(−t/2) / (2^(n/2) Γ(n/2)), t > 0.
In this case we shall say X is a chi-square random variable with n degrees of freedom and write X ~ χ²(n). Usually n is assumed to be an integer, but we only assume n > 0.
Proposition 1. If X has a gamma distribution with parameters m and β, then 2X/β has a chi-square distribution with 2m degrees of freedom.
Proof. By Proposition 5 in section 2.2, the random variable 2X/β has a gamma distribution with parameters m and 2, i.e. 2X/β ~ Γ(m, 2) = Γ((2m)/2, 2). The proposition follows from this. ...
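
The relationship between the two densities can be spot-checked numerically. This is a sketch under stated assumptions: the gamma distribution is taken in its shape/scale form, the scale parameter is called `beta` here (an assumed name), and the functions `gamma_pdf`, `chi2_pdf`, and `scaled_gamma_pdf` are illustrative helpers rather than anything defined in the source.

```python
import math


def gamma_pdf(t: float, m: float, beta: float) -> float:
    """Gamma density with shape m and scale beta, for t > 0."""
    return t ** (m - 1) * math.exp(-t / beta) / (math.gamma(m) * beta ** m)


def chi2_pdf(t: float, n: float) -> float:
    """Chi-square density with n > 0 degrees of freedom, for t > 0."""
    return t ** (n / 2 - 1) * math.exp(-t / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def scaled_gamma_pdf(y: float, m: float, beta: float) -> float:
    """Density of Y = 2X/beta when X is gamma(m, beta), by change of variables."""
    return gamma_pdf(beta * y / 2.0, m, beta) * beta / 2.0


# The chi-square density is the gamma density with shape n/2 and scale 2.
for n in (1, 2, 5.5):
    for t in (0.5, 1.0, 4.0):
        print(n, t, chi2_pdf(t, n), gamma_pdf(t, n / 2, 2.0))

# Proposition 1: 2X/beta has the chi-square density with 2m degrees of freedom.
print(scaled_gamma_pdf(3.0, 1.5, 0.7), chi2_pdf(3.0, 3.0))
```

Each printed pair should agree to floating-point precision, including the non-integer case n = 5.5, which the text allows by assuming only n > 0.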

...Chi-Square Test
Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.
The formula for calculating chi-square (χ²) is:
χ² = Σ (o − e)² / e
That is, chi-square is the sum, over all possible categories, of the squared difference between the observed (o) and the expected (e) data (the deviation, d), divided by the expected data.
For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is that the allele for...

...A chi-squared test, also referred to as chi-square test or χ² test, is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. Also considered a chi-squared test is a test in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-squared distribution as closely as desired by making the sample size large enough.
Some examples of chi-squared tests where the chi-squared distribution is only approximately valid:
* Pearson's chi-squared test, also known as the chi-squared goodness-of-fit test or chi-squared test for independence. When the chi-squared test is mentioned without any modifiers or without other precluding context, this test is usually meant (for an exact test used in place of χ², see Fisher's exact test).
* Yates's correction for continuity, also known as Yates's chi-squared test.
* Cochran–Mantel–Haenszel chi-squared test.
* McNemar's test, used in certain 2 × 2 tables with pairing.
* Tukey's test of additivity.
* The portmanteau test in time-series analysis, testing for the presence of autocorrelation...

...Independent). Results are shown below.
Voting Preferences
| Gender       | Republican | Democrat | Independent | Row total |
| Male         | 200        | 150      | 50          | 400       |
| Female       | 250        | 300      | 50          | 600       |
| Column total | 450        | 450      | 100         | 1000      |
a) If you conduct a chi-square test of independence, what is the expected frequency count of male Independents?
b) If you conduct a chi-square test of independence, what is the expected frequency count of female Democrats?
c) If you conduct a chi-square test of independence, what is the observed count of female Independents?
d) If you conduct a chi-square test of independence, what is the expected frequency count of male Republicans?
e) If you conduct a chi-square test of independence, what is the observed count of male Independents?
f) If you conduct a chi-square test of independence, what is the expected frequency count of female Republicans?
g) The table represents a
(A) 2 by 3 table
(B) 3 by 3 table
(C) 4 by 4 table
(D) 4 by 3 table
h) The null hypothesis is
(A) Gender and voting preferences are independent
(B) Gender and voting preferences are not independent
i) The degrees of freedom is ___________________
j) The chi-square statistic by hand is...
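
The quantities asked for in the questions above can be computed directly from the table. What follows is a minimal sketch, not the lesson's own solution: the helper `expected_counts` is a hypothetical name, each expected count is taken as (row total × column total) / grand total, and the p value uses a closed form that holds only for 2 degrees of freedom.

```python
import math

# Observed counts from the table; columns are Republican, Democrat, Independent.
observed = [
    [200, 150, 50],  # Male
    [250, 300, 50],  # Female
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)

# Chi-square statistic: sum of (o - e)^2 / e over every cell.
statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For exactly 2 degrees of freedom the upper-tail probability is exp(-x / 2).
p_value = math.exp(-statistic / 2)

print("expected:", expected)
print(f"chi-square = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
```

For this table the expected counts are 180, 180, and 40 for males and 270, 270, and 60 for females, and the statistic comes to roughly 16.2 on 2 degrees of freedom.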

...BUSINESS ADMINISTRATION
CHAPTER 14: CHI-SQUARE TESTING
STATISTICS FOR BUSINESS
TAs: Vo Vuong Van Anh, Le Phuoc Thien Thanh, and Le Nhat Ho
December 21, 2013
TABLE OF CONTENTS
• PART I: CHI-SQUARE TESTING FOR GOODNESS-OF-FIT.
• PART II: CHI-SQUARE TESTING FOR NORMAL DISTRIBUTION.
• PART III: CHI-SQUARE TESTING FOR INDEPENDENCE.
Hypothesis Testing Procedure for Chi-Square Testing
5 Steps to Perform a Chi-Square Test
STEP 01 State the null and alternative hypotheses.
STEP 02 Determine the expected counts (frequencies of occurrence of certain events expected under the null hypothesis) and the observed counts of data points falling in the different cells.
STEP 03 Compute the test statistic value (based on the difference between the observed and the expected counts; hint: set up a table).
STEP 04 Find the critical value at the predetermined significance level.
STEP 05 Draw a conclusion by comparing the test statistic value with the critical value.
PART I: CHI-SQUARE TESTING FOR GOODNESS-OF-FIT
• A goodness-of-fit chi-square test is a statistical test of how well our data support an assumption about the...