This topic covers:
• The concept and measures of central tendency for ungrouped and grouped data.
• The concept and measures of dispersion for ungrouped and grouped data.

Introduction

When we look at a distribution of data, we should consider three characteristics:
• Shape (chapters 2 and 4)
• Center / location (measures of central tendency)
• Spread (measures of dispersion)
With these characteristics, we can numerically describe the main features of a data set and describe the behaviour of the data in a much simpler form.

[Figure: centre/location, shape, and spread]

Central Tendency Measurement

A measure of central tendency gives the center of a histogram or a frequency distribution. Its purpose is to report a typical value that is representative of the data. Three common measures of central tendency:
• Mean (arithmetic mean)
• Median
• Mode

Other measures of central tendency:

• Trimmed mean
• Harmonic mean
• Geometric mean
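The three additional measures named above can be sketched in Python as follows. This is only an illustration: `harmonic_mean` and `geometric_mean` come from the standard `statistics` module (the latter needs Python 3.8 or later), while `trimmed_mean` is a hypothetical helper that assumes trimming by a fixed count `k` from each end. Other conventions, such as trimming by percentage, also exist. The data values are made up for the example.

```python
from statistics import geometric_mean, harmonic_mean


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations.

    Assumption: trimming is specified as a count per tail, not a percentage.
    """
    ranked = sorted(values)
    if k < 0 or 2 * k >= len(ranked):
        raise ValueError("k must leave at least one observation")
    kept = ranked[k:len(ranked) - k]
    return sum(kept) / len(kept)


# Illustrative data (not from the text); 100 acts as an extreme value.
data = [2, 4, 8, 16, 100]

print(trimmed_mean(data, 1))     # mean of 4, 8, 16 after dropping 2 and 100
print(harmonic_mean([40, 60]))   # reciprocal of the mean of reciprocals: 48.0
print(geometric_mean([2, 8]))    # square root of 2 * 8, approximately 4
```

With `k = 1` the extreme value 100 no longer influences the result, which is why the trimmed mean is counted among the robust means.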

CENTRAL TENDENCY

Scale type | Permissible measures of central tendency
Nominal | Mode
Ordinal | Median
Interval | Mean, Mode*, Median*
Ratio | All statistics are permitted, including geometric mean, harmonic mean, trimmed mean, and other robust means.

Central tendency for Ungrouped Data

Mean (Arithmetic mean)

The mean is the most frequently used measure of central tendency. The mean of a data set is the sum of the observations divided by the number of observations.

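Population Data

$\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$, where $N$ is the population size.

Sample Data

$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, where $n$ is the sample size.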
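As a minimal sketch of the definition above (sum of the observations divided by the number of observations), the calculation might look like this in Python. The function name `arithmetic_mean` is hypothetical, and the sample values are illustrative. The arithmetic is the same for population and sample data. Only the notation differs.

```python
from typing import Sequence


def arithmetic_mean(values: Sequence[float]) -> float:
    """Sum of the observations divided by the number of observations."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative sample of four exam grades.
grades = [88, 75, 95, 100]
print(arithmetic_mean(grades))  # (88 + 75 + 95 + 100) / 4 = 89.5
```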

Median

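The median is the value of the middle term in a data set that has been ranked in increasing order. Steps:
1) Rank the data in increasing order.
2) Determine the depth (position) of the median: depth = (n + 1) / 2, where n is the number of observations.
3) Determine the value of the median. If the depth is a whole number, the median is the value at that position; otherwise it is the average of the two values on either side of that position.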
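The three steps can be sketched in Python as below. The sketch assumes the usual depth rule, depth = (n + 1) / 2 with positions counted from 1, and averages the two neighbouring values when the depth is not a whole number. The function name `median_ungrouped` and the data are illustrative only.

```python
def median_ungrouped(values):
    """Median of ungrouped data, following the rank / depth / value steps."""
    ranked = sorted(values)              # Step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")

    depth = (n + 1) / 2                  # Step 2: 1-based position of the median

    # Step 3: read off the value at that position.
    if depth.is_integer():               # odd n: a single middle term
        return ranked[int(depth) - 1]
    lower = ranked[int(depth) - 1]       # even n: the two middle terms
    upper = ranked[int(depth)]
    return (lower + upper) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))  # depth 3 in 1, 3, 5, 7, 9 gives 5
print(median_ungrouped([7, 3, 9, 1]))     # depth 2.5 in 1, 3, 7, 9 gives 5.0
```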

Mode

The mode of a data set is its most frequently occurring value. The mode need not be unique:
• No mode: a data set with each value occurring only once (e.g. 3,4,5,6,1,2,7,8).
• Unimodal: a data set with only one value occurring with the highest frequency (e.g. 3,4,5,5,1,2,7,8).
• Bimodal: a data set with two values that occur with the same (highest) frequency (e.g. 3,3,5,5,3,2,5,8).
• Multimodal: a data set in which more than two values occur with the same (highest) frequency.
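The classification above can be checked with a short Python sketch that counts how often each value occurs. The helper names `find_modes` and `describe_modality` are hypothetical. Following the text, a data set in which every value occurs only once is treated as having no mode.

```python
from collections import Counter


def find_modes(values):
    """Return all values that share the highest frequency (empty if no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                     # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def describe_modality(values):
    """Label a data set as no mode, unimodal, bimodal, or multimodal."""
    k = len(find_modes(values))
    if k == 0:
        return "no mode"
    if k == 1:
        return "unimodal"
    if k == 2:
        return "bimodal"
    return "multimodal"


print(find_modes([3, 4, 5, 6, 1, 2, 7, 8]))         # []
print(find_modes([3, 4, 5, 5, 1, 2, 7, 8]))         # [5]
print(describe_modality([3, 3, 5, 5, 3, 2, 5, 8]))  # bimodal
```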

Mean
• Advantages: unique; considers every observation in the calculation
• Disadvantages: sensitive to outliers
• Interpretation: center of gravity
• When to use: when the data are quantitative and the frequency distribution is roughly symmetric

Median
• Advantages: unique; resistant to outliers
• Disadvantages: difficult to handle theoretically
• Interpretation: divides the bottom 50% of the data from the top 50%
• When to use: when the frequency distribution is skewed left or right

Mode
• Advantages: can be used for both qualitative and quantitative data
• Disadvantages: not unique; some data sets have no mode
• Interpretation: most frequent observation
• When to use: when the most frequent observation is the desired measure of central tendency or the data are qualitative

Class Activity 1

Select an appropriate measure of center (mean, median, or mode) for each of the following situations:
• A student takes four exams in a biology class. His grades are 88, 75, 95, and 100. Answer: Mean
• The National Association of REALTORS publishes data on the resale prices of U.S. homes. Answer: Median
• The marathon had two categories of official finishers: male and female, of which there were 10894 and 6655, respectively. Answer: Mode

The issue of Outliers

• The arithmetic mean is the most preferred measure, BUT it is easily distorted when there are outliers in the data.
OUTLIERS: An observation (or a set of observations) that is numerically distant from the rest of the data.
• To minimise such distortion, other statistics that are resistant to outliers (robust statistics) are needed.


• Example: Consider the following 10 observations:
30 171 184 201 212 250 265 270 272 289

1. Compute the median.
2. Compute the mean.
3. Can you spot the difference? Why do such results occur?
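One way to explore this example is the short Python sketch below, which uses the standard `statistics` module. Recomputing both measures with the observation 30 left out is an added illustration, not part of the original exercise. It is meant only to show how much each measure moves when the distant value is dropped.

```python
from statistics import mean, median

observations = [30, 171, 184, 201, 212, 250, 265, 270, 272, 289]

# Same data with the distant value 30 left out (illustration only).
without_low = [x for x in observations if x != 30]

print(median(observations))          # average of 212 and 250: 231.0
print(mean(observations))            # 2144 / 10 = 214.4

print(median(without_low))           # middle of the 9 remaining values: 250
print(round(mean(without_low), 2))   # 2114 / 9, about 234.89
```

The single low value pulls the mean below the median. Each measure can then be compared with and without that observation to see how strongly it reacts.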

Weighted Mean

Sometimes, certain data values have a higher importance or...