Seema S. Sonnad, PhD
Describing Data: Statistical and Graphical Methods
An important step in any analysis is to describe the data by using descriptive and graphic methods. The author provides an approach to the most commonly used numeric and graphic methods for describing data. Methods are presented for summarizing data numerically, including presentation of data in tables and calculation of statistics for central tendency, variability, and distribution. Methods are also presented for displaying data graphically, including line graphs, bar graphs, histograms, and frequency polygons. The description and graphing of study data result in better analysis and presentation of data.
Index terms: Data analysis; Statistical analysis
Published online before print: 10.1148/radiol.2253012154
Radiology 2002; 225:622–628
From the Department of Surgery, University of Michigan Medical Center, Ann Arbor. Received January 14, 2002; revision requested March 2; revision received May 20; accepted June 14. Address correspondence to the author, Department of Surgery, University of Pennsylvania Health System, 4 Silverstein, 3400 Spruce St, Philadelphia, PA 19104-4283 (e-mail: seema.sonnad@uphs.upenn.edu).
© RSNA, 2002
A primary goal of statistics is to collapse data into easily understandable summaries. These summaries may then be used to compare sets of numbers from different sources or to evaluate relationships among sets of numbers. Later articles in this series will discuss methods for comparing data and evaluating relationships. The focus of this article is on methods for summarizing and describing data both numerically and graphically. Options for constructing measures that describe the data are presented first, followed by methods for examining the data graphically. Although these techniques are not methodologically difficult, descriptive statistics are central to the process of organizing and summarizing anything that can be presented as numbers. Without an understanding of the key concepts surrounding the calculation of descriptive statistics, it is difficult to understand how to use data to make comparisons or draw inferences, topics that will be discussed extensively in future articles in this series.

In this article, five properties of a set of numbers will be discussed. (a) Location or central tendency: What is the central or most typical value seen in the data? (b) Variability: To what degree are the observations spread or dispersed? (c) Distribution: Given the center and the amount of spread, are there specific gaps or concentrations in how the data cluster? Are the data distributed symmetrically, or are they skewed? (d) Range: How extreme are the largest and smallest values of the observations? (e) Outliers: Are there any observations that do not fit into the overall pattern of the data or that change the interpretation of the location or variability of the overall data set?

The following tools are used to assess these properties: (a) summary statistics, including means, medians, modes, variances, ranges, quartiles, and tables; and (b) plots of the data, such as histograms, box plots, and other graphs.
Use of these tools is an essential first step to understand the data and make decisions about succeeding analytic steps. More specific definitions of these terms can be found in the Appendix.
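The summary statistics listed above can be computed with a few lines of code. The following sketch uses the Python standard-library statistics module (Python 3.8 or later); it is an illustration under stated assumptions, not a method taken from this article. The function name describe and the sample values are hypothetical. Flagging outliers as values lying more than 1.5 times the interquartile range below the first quartile or above the third quartile is an assumed convention (the one commonly used to draw box-plot whiskers); the article itself does not specify an outlier rule. Quartiles are computed with statistics.quantiles using its default "exclusive" method, so software that defines quartiles differently may report slightly different values.

```python
import statistics


def describe(values):
    """Summarize a list of numbers (hypothetical helper, Python 3.8+).

    Needs at least two values, because the sample variance and the
    quartiles are undefined for a single observation.
    """
    data = sorted(values)

    # Quartiles via the standard library's default "exclusive" method.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1

    # Assumed outlier rule: values more than 1.5 * IQR beyond Q1 or Q3.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    return {
        "n": len(data),
        # Location (central tendency)
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # every most-frequent value
        # Variability (sample statistics, divisor n - 1)
        "variance": statistics.variance(data),
        "sd": statistics.stdev(data),
        # Range: the extreme values and the distance between them
        "min": data[0],
        "max": data[-1],
        "range": data[-1] - data[0],
        # Distribution: the quartiles split the sorted data into four parts
        "quartiles": (q1, q2, q3),
        # Outliers under the assumed 1.5 * IQR rule
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical measurements, invented for illustration only.
sizes = [12, 15, 15, 17, 18, 20, 21, 22, 25, 60]
summary = describe(sizes)
print(summary["mean"], summary["median"], summary["outliers"])  # 22.5 19.0 [60]
```

For these hypothetical values, the mean (22.5) exceeds the median (19.0) because the single large value of 60 pulls the mean upward; a gap of this kind between the mean and the median is one simple hint that a distribution may be skewed.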
DESCRIPTIVE STATISTICS
Frequency Tables
One of the steps in organizing a set of numbers is counting how often each value occurs. For instance, one could review the prostate cancers diagnosed over a 2-year period and count how many were diagnosed at stage A, B, C, or D. Of 236 diagnosed cancers, 186 might be stage A, 42 stage B, six stage C, and two stage D. Because these numbers are easier to understand when presented as percentages, we say that 78.8% (186 of 236) are stage A, 17.8% (42 of 236) are stage B, 2.5% (six of 236) are stage C, and 0.8% (two of 236) are stage...