Drawing Conclusions Based on Samples
This chapter introduces how you can use data from a sample to draw conclusions about the larger population from which the sample was taken. Data often arise from a survey of individuals. For example, the management of a fast food chain might want to determine the total number of dollars that Baylor students spend each year eating in Waco fast food restaurants. The chain would also like to know the fast food preferences of Baylor students. Both pieces of information would help in estimating how successful a particular fast food restaurant might be if located near the Baylor campus.
The management of the fast food chain probably does not have the time or money to question every single Baylor student, but time and money to question 300 students might be available. Techniques exist for randomly choosing a representative sample of 300 students from the population of 14,000 Baylor students. An estimate of the total dollars spent per year by all 14,000 students can then be made from the amount spent by the sample of 300 students, for example by multiplying the average amount spent per sampled student by 14,000.
You would not expect the estimate to be perfect. There would be some difference between the estimate based on the sample and the unknown total amount spent by all 14,000 students. This difference is called sampling error. The word “error” does not mean that a mistake has been made. It simply means that you have based your findings on incomplete yet representative data. Procedures exist for estimating the maximum amount of sampling error, so you will be able to state with a large degree of confidence the most that your estimate (a statistic) is likely to differ from the true answer (a parameter).
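The estimation idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spending figures are invented (drawn from a gamma distribution chosen purely for convenience), and the names `population`, `sample`, and `estimated_total` are hypothetical. In a real survey the population values, and therefore the true total, would be unknown.

```python
import random

random.seed(42)  # fixed seed so the illustration is repeatable

POPULATION_SIZE = 14_000  # number of Baylor students (from the text)
SAMPLE_SIZE = 300         # number of students surveyed (from the text)

# ASSUMPTION: made-up yearly fast food spending in dollars for every student.
# Real spending data would not be known for the whole population.
population = [random.gammavariate(2.0, 150.0) for _ in range(POPULATION_SIZE)]

# Parameter: the true total spent by all 14,000 students (normally unknown).
true_total = sum(population)

# Simple random sample of 300 students, drawn without replacement.
sample = random.sample(population, SAMPLE_SIZE)

# Statistic: scale the sample average up to the whole population.
sample_mean = sum(sample) / SAMPLE_SIZE
estimated_total = sample_mean * POPULATION_SIZE

# Sampling error: the difference between the estimate and the true total.
sampling_error = estimated_total - true_total

print(f"Estimated total: ${estimated_total:,.0f}")
print(f"True total:      ${true_total:,.0f}")
print(f"Sampling error:  ${sampling_error:,.0f}")
```

Changing the seed selects a different sample of 300 students, and the estimate (and its sampling error) changes with it, while the true total computed from the same population values would stay fixed.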
Random variable - A characteristic of interest, such as the amount spent per year by a Baylor student eating at fast food restaurants in Waco.
Experimental unit - The entity on which a random variable is measured, such as a randomly selected Baylor student.
Population - An entire collection of individuals, objects, or measurements whose characteristics are of some interest. The entire Baylor student body would be considered a population.
Parameter - A characteristic of a population. The total or average amount spent per year by all Baylor students eating at fast food restaurants in Waco is an example of a parameter. A parameter is typically unknown but constant.
Sample - A portion or subset of a population.
Statistic - A characteristic of a sample. The total or average amount spent per year by 300 randomly selected Baylor students eating at fast food restaurants in Waco would be considered a statistic. The value of a statistic varies depending on the particular sample selected, but it is known once the sample is drawn. You use a statistic to provide an estimate of a parameter. It is the value of the parameter that you are really interested in, but you usually have to accept an estimate of the parameter based on the statistic.
Sampling error - The difference between a statistic and the parameter it estimates. An estimate of the maximum amount of sampling error is based on the distribution of possible values that the statistic can assume over theoretical repeated samplings. These distributions are referred to as sampling distributions. The sampling distributions of common statistics such as sample means and sample proportions are known.
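The idea of "theoretical repeated samplings" can be made concrete with a small simulation. The sketch below again assumes an invented population of spending values (a gamma distribution chosen only for illustration). It draws 1,000 different samples of 300 students and records the sample mean each time. The collection of those means approximates the sampling distribution of the sample mean. All variable names are hypothetical.

```python
import random
import statistics

random.seed(7)  # fixed seed so the illustration is repeatable

# ASSUMPTION: made-up yearly spending (dollars) for 14,000 students.
population = [random.gammavariate(2.0, 150.0) for _ in range(14_000)]

# Parameter: the true average amount spent per student.
population_mean = statistics.mean(population)

# Repeated sampling: 1,000 samples of 300 students, one sample mean each.
sample_means = [
    statistics.mean(random.sample(population, 300))
    for _ in range(1_000)
]

# Sampling error of each sample mean relative to the parameter.
errors = [m - population_mean for m in sample_means]

center = statistics.mean(sample_means)    # center of the sampling distribution
spread = statistics.stdev(sample_means)   # typical size of the sampling error
largest_error = max(abs(e) for e in errors)

print(f"Population mean:           {population_mean:.2f}")
print(f"Average of sample means:   {center:.2f}")
print(f"Std. dev. of sample means: {spread:.2f}")
print(f"Largest sampling error:    {largest_error:.2f}")
```

In practice only one sample is drawn, so the spread of the sampling distribution has to be estimated from theory rather than from repeated surveys. The simulation simply shows what that theory is describing.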
Before you can begin to assess the amount of sampling error in a given situation, some understanding of the most common statistical distribution is required. This distribution is referred to as the normal distribution or bell-shaped distribution. A graph of a normally distributed variable is given below. The graph is sometimes referred to as a normal curve.
A normally distributed variable is characterized by two...