Concepts and techniques for managing, editing, analyzing and interpreting data from epidemiologic studies.

Key concepts/expectations

This chapter contains a great deal of material and goes beyond what you are expected to learn for this course (i.e., for examination questions). However, statistical issues pervade epidemiologic studies, and you may find some of the material that follows of use as you read the literature. So if you find that you are getting lost and begin to wonder what points you are expected to learn, please refer to the following list of concepts we expect you to know:

- Need to edit data before serious analysis and to catch errors as soon as possible.
- Options for data cleaning – range checks, consistency checks – and what these can (and cannot) accomplish.
- What is meant by data coding and why it is carried out.
- Basic meaning of various terms used to characterize the mathematical attributes of different kinds of variables, i.e., nominal, dichotomous, categorical, ordinal, measurement, count, discrete, interval, ratio, continuous. Be able to recognize examples of different kinds of variables and advantages/disadvantages of treating them in different ways.
- What is meant by a “derived” variable and different types of derived variables.
- Objectives of statistical hypothesis tests (“significance” tests), the meaning of the outcomes from such tests, and how to interpret a p-value.
- What is a confidence interval and how it can be interpreted.
- Concepts of Type I error, Type II error, significance level, confidence level, statistical “power”, statistical precision, and the relationship among these concepts and sample size.

Computation of p-values, confidence intervals, power, or sample size will not be asked for on exams. Fisher’s exact test, asymptotic tests, z-tables, 1-sided vs. 2-sided tests, intracluster correlation, Bayesian versus frequentist approaches, meta-analysis, and interpretation of multiple significance tests are all purely for your edification and enjoyment, as far as EPID 168 is concerned, not for examinations.

In general, I encourage a nondogmatic approach to statistics (caveat: I am not a “licensed” statistician!).
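To make the data cleaning options in the list above concrete, here is a minimal sketch of a range check and a consistency check. It is an illustration only, not a procedure taken from this chapter: the records, the field names (age, sex, pregnant), and the 0-110 age limits are all hypothetical assumptions.

```python
# Hypothetical records; field names and values are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "sex": "F", "pregnant": "yes"},
    {"id": 2, "age": 210, "sex": "M", "pregnant": "no"},   # age out of range
    {"id": 3, "age": 45, "sex": "M", "pregnant": "yes"},   # contradictory pair
]

AGE_RANGE = (0, 110)  # assumed plausible limits for this illustration


def range_check(record):
    """Flag a single value that falls outside its allowable range."""
    low, high = AGE_RANGE
    age = record["age"]
    if not (low <= age <= high):
        return [f"age {age} outside {low}-{high}"]
    return []


def consistency_check(record):
    """Flag a combination of values that contradict each other."""
    if record["sex"] == "M" and record["pregnant"] == "yes":
        return ["sex is M but pregnant is yes"]
    return []


def find_problems(records):
    """Return {record id: list of messages} for records failing any check."""
    problems = {}
    for record in records:
        messages = range_check(record) + consistency_check(record)
        if messages:
            problems[record["id"]] = messages
    return problems


if __name__ == "__main__":
    for record_id, messages in find_problems(records).items():
        print(record_id, messages)
```

Checks like these can only flag values that are impossible or mutually contradictory. A value that is wrong but plausible (for example, an age of 43 keyed in as 34) passes both checks, which is one thing such checks cannot accomplish.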
_________________________________ www.epidemiolog.net © Victor J. Schoenbach 14. Data analysis and interpretation – 451 rev. 3/29/2004, 6/27/2004, 7/22/2004
Data analysis and interpretation
Epidemiologists often find data analysis the most enjoyable part of carrying out an epidemiologic study, since after all of the hard work and waiting they get the chance to find out the answers. If the data do not provide answers, that presents yet another opportunity for creativity! So analyzing the data and interpreting the results are the “reward” for the work of collecting the data.

Data do not, however, “speak for themselves”. They reveal what the analyst can detect. So when the new investigator, attempting to collect this reward, finds him/herself alone with the dataset and no idea how to proceed, the feeling may be one more of anxiety than of eager anticipation. As with most other aspects of a study, analysis and interpretation of the study should relate to the study objectives and research questions. One often-helpful strategy is to begin by imagining or even outlining the manuscript(s) to be written from the data.

The usual analysis approach is to begin with descriptive analyses, to explore and gain a “feel” for the data. The analyst then turns to address specific questions from the study aims or hypotheses, from findings and questions from studies reported in the literature, and from patterns suggested by the descriptive analyses. Before analysis begins in earnest, though, a considerable amount of preparatory work must usually be carried out.
Analysis - major objectives
1. Evaluate and enhance data quality
2. Describe the study population and its relationship to some presumed...