Data Analysis: Analyzing Data - Inferential Statistics

Inferential statistics deal with drawing conclusions and, in some cases, making predictions about the properties of a population based on information obtained from a sample. While descriptive statistics describe the central tendency, dispersion, skew, and kurtosis of data, inferential statistics allow one to make broader statements about the relationships among variables. Inferential statistics are frequently used to address cause-and-effect questions and to make predictions. They are also used to investigate differences between and among groups. However, inferential statistics by themselves do not prove causality. Any claim of causality rests on a given theory, and it is vital that the theory be clearly stated before inferential statistics are applied. Otherwise, their use is little more than a fishing expedition. For example, suppose that statistical methods suggest that, on average, men are paid significantly more than women for full-time work. Several competing explanations may exist for this discrepancy. Inferential statistics can provide evidence that one explanation fits the data better than the others. However, any ultimate conclusions about actual causality must come from a theory supported by both the data and sound logic.

WHEN TO USE IT

HOW TO PREPARE IT

The following briefly introduces some common techniques of inferential statistics and is intended as a guide for determining when certain techniques may be appropriate. The appropriate technique generally depends on the kinds of variables involved, i.e., nominal, ordinal, or interval. For further information on and/or assistance with a given technique, refer to the books and in-house support listed in the Resources section at the end of this module.

+ Chi-square (χ²) tests are used to identify differences between groups when all variables are nominal, e.g., gender, ethnicity, salary group, political party affiliation, and so forth. Such tests are normally used with contingency tables, which group observations based on common characteristics. For example, suppose one wants to determine whether political party affiliation differs among ethnic groups. A contingency table that cross-classifies the sample by political party and ethnic group could be produced. A χ² test would indicate whether party affiliation differs significantly across the ethnic groups in the sample. As a rule, each cell in a contingency table should have an expected count of at least five observations. Where this is not possible, Fisher's exact test should replace the χ² test. Data analysis software will usually warn the user when a cell's expected count is below five.
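The χ² test and its Fisher's exact alternative can be sketched in plain Python so the example runs without a statistics package. This is a minimal illustration, not the manual's prescribed procedure: the function names (`chi2_sf`, `chi_square_independence`, `fisher_exact_2x2`) and the party-by-ethnicity counts are hypothetical. The χ² p-value uses the closed-form chi-square tail for integer degrees of freedom, and no continuity correction is applied (some packages, such as SciPy's `chi2_contingency`, apply one to 2 × 2 tables by default). The Fisher sketch handles only 2 × 2 tables.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + e^(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi_square_independence(table: list[list[int]]) -> tuple[float, int, float, float]:
    """Return (statistic, df, p_value, smallest expected cell count) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    min_expected = math.inf
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), min_expected


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table with the same margins that is no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-7))


if __name__ == "__main__":
    # Hypothetical counts: rows are ethnic groups, columns are party affiliations.
    counts = [[30, 20, 10],
              [15, 25, 20]]
    stat, df, p, min_expected = chi_square_independence(counts)
    print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.4f}")
    if min_expected < 5:
        print("Some expected counts are below five; consider Fisher's exact test.")
    print(f"Fisher 2x2 example p = {fisher_exact_2x2(3, 1, 1, 3):.4f}")
```

Returning the smallest expected count alongside the test result mirrors the "at least five" rule of thumb: the caller can decide whether the χ² approximation is trustworthy or whether an exact test is the better choice.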

+ Analysis of variance (ANOVA) permits comparison of the means of two or more populations when the variable of interest is measured on an interval scale. ANOVA does this by comparing the variation between the sample means with the variation within the samples in order to make inferences about the population means. ANOVA seeks to answer two basic questions:

Texas State Auditor's Office, Methodology Manual, rev. 5/95


Accountability Modules

  - Are the means of variables of interest different in different populations?
  - Are the differences in the mean values statistically significant?

For example, during the Welfare Reform audit, SAO staff wanted to test whether the incomes of job training program participants and nonparticipants differed significantly. ANOVA was used to make this comparison.
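A one-way ANOVA of the kind described above can be sketched in plain Python by computing the between-group and within-group variation and comparing them with an F statistic. This is a minimal sketch under stated assumptions: the function names (`one_way_anova`, `f_sf`, `betainc_reg`, `_betacf`) and the income figures are hypothetical and are not data from the Welfare Reform audit. The p-value comes from the upper tail of the F distribution, computed with the regularized incomplete beta function via a standard continued-fraction expansion; in practice a statistics package (for example SciPy's `scipy.stats.f_oneway`) would normally be used instead.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc_reg(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges quickly for this x.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_sf(f: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F >= f) for an F distribution with (d1, d2) df."""
    if f <= 0:
        return 1.0
    return betainc_reg(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups):
    """Return (F statistic, df_between, df_within, p_value) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each sample mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of observations around their own sample mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, f_sf(f_stat, df_between, df_within)


if __name__ == "__main__":
    # Hypothetical annual incomes (in dollars) for two groups.
    participants = [18200, 21500, 19800, 23100, 20400]
    nonparticipants = [16900, 17800, 19100, 16200, 18300]
    f_stat, df_b, df_w, p = one_way_anova([participants, nonparticipants])
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}, p = {p:.4f}")
```

A large F means the sample means are spread out relative to the scatter inside each sample; the p-value answers the second question above, namely whether that difference is statistically significant.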

+ Analysis of covariance (ACOVA) examines whether the means of an interval variable differ among groups once other interval variables (covariates) that move together with it are taken into account. Ideally, the variables would move independently of one another, regardless of their means. Unfortunately, in the real world, groups of observations usually differ on a number of dimensions, making simple analysis of variance tests problematic since...