A note on basic statistics

Statistics is the practice or science of collecting and analyzing numerical data in large quantities. So there are two parts:

1. Collection of Data
2. Analysis of Data: understanding what the data says.

Steps in Statistics

To carry out any statistical operation, the following steps need to be followed, in the given order:

1. Sampling
2. Estimation
3. Hypothesis Generation
4. Testing
5. Regression
6. Prediction

Collection of Data: Sampling

Sampling is the process of selecting a limited number of data points from the entire population of such data. Selecting 10 apples from all the possible apples in the Universe is an example of sampling. A sample is a set of values (numerical or Boolean, that is, Yes or No) taken to answer a particular question.

The number of items selected (10 apples in this case) is known as the Sample Size. The sample is assumed to represent the entire universe, that is, the whole population. A convention is something that is commonly followed, an unwritten rule within a discipline of study. By statistical convention, n represents the size of the sample. Selecting is never easy, be it choosing ONE spouse from all the eligible people in the world or selecting a random sample. Random basically means selecting without applying personal judgment. Just select. Even so, we make a lot of mistakes (or errors) in selecting (or sampling).
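The apple example can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 numbered apples and the fixed seed are assumptions made here so that the sketch is repeatable, not part of the original example.

```python
import random

# Fixed seed (an assumption for repeatability; drop it for a fresh draw each run).
rng = random.Random(0)

# Hypothetical population: 1,000 apples, each identified by a label.
population = [f"apple_{i}" for i in range(1000)]

n = 10  # sample size, written n by convention

# Simple random sample without replacement: no judgment involved, just select.
sample = rng.sample(population, n)

print(f"Sample of size n = {len(sample)}:")
for apple in sample:
    print(" ", apple)
```

Because the draw is left to the random number generator, no apple is favoured by the person doing the selecting.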

Errors in Sampling: Systematic Error

These errors are also known as Non-Sampling Errors. Systematic errors result from decisions that bias the sample selection or the responses to the survey.

Bias is an inclination of outlook to present or hold a partial perspective at the expense of (possibly equally valid) alternatives. A Response to Survey is the answer given by a person while answering a survey.

Four common mistakes that lead to systematic error are:

1. Population Specification Error: This error is one of not understanding who you should be surveying. For example, suppose you want to know what people from Kanpur think about Biharis, but you conduct the survey in Patna. You are simply asking the wrong population.

2. Sample Frame Error: A frame error occurs when the wrong sub-population is specified from which the sample is drawn. A classic frame error occurred in predicting the 1936 US presidential election between Roosevelt (Democratic) and Landon (Republican). The sample frame used was drawn from car registrations and telephone directories. In 1936, car and telephone owners were largely Republican. While the results may have reflected the sample, the predictions were not accurate for the US as a whole, and they wrongly predicted a Republican victory. Gallup did not make this mistake, and the episode established his reputation.

3. Selection Error: Selection error results when respondents self-select their participation, so that only those who are interested respond. Selection error can be controlled by going to extra lengths to get participation. A typical case is a news website that asks a question on which people can choose to vote: the people who have a strong opinion, or something to gain from the result, are the ones who vote.

4. Non-Response Error: Non-response errors occur when non-respondents are different from those who respond. This may occur because either the potential respondents were not contacted (they did not check their e-mail) or they refused to respond (say, they were all grumpy or wary of strangers).
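A small simulation can make the sample frame error concrete. The sketch below is loosely modelled on the 1936 example, but every number in it is a made-up assumption for illustration (30% of voters own a car or telephone, owners favour candidate A 40/60, non-owners favour candidate A 70/30); none of these figures are historical data.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 voters, stored as (is_owner, votes_for_a).
# All proportions below are assumptions chosen for illustration only.
population = []
for _ in range(100_000):
    is_owner = rng.random() < 0.30           # owns a car or telephone
    p_votes_a = 0.40 if is_owner else 0.70   # assumed preference for candidate A
    population.append((is_owner, rng.random() < p_votes_a))

def share_for_a(voters):
    """Fraction of the given voters who prefer candidate A."""
    return sum(votes_a for _, votes_a in voters) / len(voters)

# Biased frame: only owners can be reached, as with car and telephone lists.
frame = [voter for voter in population if voter[0]]

biased_sample = rng.sample(frame, 1000)      # random, but from the wrong frame
fair_sample = rng.sample(population, 1000)   # random, from the whole population

true_share = share_for_a(population)
biased_estimate = share_for_a(biased_sample)
fair_estimate = share_for_a(fair_sample)

print(f"True share for A:          {true_share:.3f}")
print(f"Estimate from owner frame: {biased_estimate:.3f}")
print(f"Estimate from full frame:  {fair_estimate:.3f}")
```

Both samples are drawn at random and have the same size; the owner-frame estimate is off only because the frame itself excludes most of the population, which is why taking a larger sample from the same frame would not fix it.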

Errors in Sampling: Sampling Errors or Non-Systematic Errors

Sampling errors occur because of variation in the number or representativeness of the sample that responds. Two types of samples may be drawn. The first is a probability sample, where every person in the population has a known (and, in the simplest case, equal) probability of being selected. For example, if one boy is to be selected at random from 10 boys, each boy has a 1/10, or 10%, chance of selection. The second is a non-probability sample, where the probability of a person being selected is unknown unless you know the sample...
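The 10% figure in the probability-sample example can be checked with a quick simulation. This is a minimal sketch: the boys' labels, the seed, and the number of repetitions are assumptions introduced here for illustration.

```python
import random
from collections import Counter

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Ten hypothetical boys; exactly one is selected per trial.
boys = [f"boy_{i}" for i in range(1, 11)]

trials = 100_000
counts = Counter(rng.choice(boys) for _ in range(trials))

# Observed selection frequency for each boy; each should be close to 1/10.
frequencies = {boy: counts[boy] / trials for boy in boys}

for boy, freq in frequencies.items():
    print(f"{boy}: selected in {freq:.1%} of trials")
```

The frequencies will not be exactly 10%. The small differences from trial to trial are random variation of the kind that sampling error refers to.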