Wednesday 5-6pm

WEEK 3 BES PASS

Descriptive Statistics

Population - the set of all possible observations.
Sample - a portion of a population.
We often use information about a sample to make an inference (conclusion) about the population.

Parameter - describes a characteristic of the population, e.g. the population variance.
Statistic - describes a characteristic of a sample, e.g. the sample variance.

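Frequency Distribution and Histograms

Class - a category (interval) of data values. Classes are mutually exclusive, so each observation falls into exactly one class.
Frequency distribution - a grouping of data into classes, showing the number of observations in each class.
Relative frequency distribution - expresses the number of observations in each class as a percentage of the total number of observations.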
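A minimal sketch of building a frequency and relative frequency distribution. The data values and the class width of 5 are assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical data and class width, chosen only for illustration.
data = [2, 3, 3, 6, 8, 9, 14, 16, 17, 20]
width = 5

# Assign each observation to exactly one class [lower, lower + width).
counts = Counter((x // width) * width for x in data)

# Frequency and relative frequency (as a percentage) per class.
table = [
    (lower, lower + width, counts[lower], 100 * counts[lower] / len(data))
    for lower in sorted(counts)
]

for lower, upper, freq, rel in table:
    print(f"{lower:>2} to <{upper:<2}  frequency={freq}  relative={rel:.0f}%")
```

Because the classes are mutually exclusive, the frequencies add up to the number of observations and the relative frequencies add up to 100%.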

Shapes of Distributions and Histograms

A histogram is symmetrical if one half of the histogram is a mirror image of the other.
Non-symmetrical distributions are said to be “skewed”.

a) Skewed to the right (Positively skewed) Mode < Median < Mean

b) Skewed to the left (Negatively skewed) Mode > Median > Mean

c) Symmetric Distribution Mode = Median = Mean
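A quick check of the right-skewed ordering on a small data set. The values are hypothetical, and the ordering is a typical pattern for skewed data rather than a guarantee for every data set.

```python
import statistics

# Hypothetical right-skewed data: one large value stretches the right tail.
data = [1, 2, 2, 3, 4, 10]

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle of the ordered data
mean = statistics.mean(data)      # pulled towards the long tail

print(mode, median, round(mean, 2))
```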

Measures of Central Tendency: The Mean, Mode and Median

The mean is the average of the scores:

Population mean: μ = Σ xi / N

Sample mean: x̄ = Σ xi / n

The mode is the value that has the highest frequency.
The median is the middle value of the data ordered from lowest to highest.
The median and the mode are less sensitive to outliers than the mean.
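A small sketch of how one outlier affects the three measures. The scores are hypothetical, and `summary` is a helper name introduced here for illustration.

```python
import statistics

# Hypothetical scores; the second list adds a single outlier (100).
scores = [2, 3, 3, 6, 8]
with_outlier = scores + [100]

def summary(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values), statistics.median(values), statistics.mode(values))

print(summary(scores))        # mean 4.4, median 3, mode 3
print(summary(with_outlier))  # mean rises to about 20.3; median 4.5, mode 3
```

The mean moves a long way towards the outlier, while the median shifts only slightly and the mode does not change.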

Quartiles and Percentiles, including the Median

Percentile values divide the data (arranged in ascending order) into 100 equal parts. They are a measure of relative standing: p% of the data is less than the pth percentile, and (100 – p)% of the data is greater than the pth percentile.

BES PASS S1 10


Omkar & Yaying


The location of the pth percentile in the ordered data is

L = p x (n + 1) / 100

The median is the 50th percentile.

The inter-quartile range is the difference between the 75th percentile (upper quartile) and the 25th percentile (lower quartile).

E.g. 2, 3, 3, 6, 8, 9, 14, 16, 17, 20 (n = 10)
o Lower quartile: L = (25 x 11) / 100 = 2.75; obtain the data value ¾ of the way between the 2nd and 3rd scores = 3
o Median: L = (50 x 11) / 100 = 5.5; obtain the data value ½ of the way between the 5th and 6th scores = 8.5
o Upper quartile: L = (75 x 11) / 100 = 8.25; obtain the data value ¼ of the way between the 8th and 9th scores = 16.25
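The location rule from this example can be sketched as a short function. The function name `percentile` and the handling of locations that fall before the first or after the last score are assumptions made here; the notes do not cover those cases.

```python
def percentile(data, p):
    """pth percentile using the location rule L = p * (n + 1) / 100."""
    ordered = sorted(data)
    n = len(ordered)
    location = p * (n + 1) / 100       # 1-based position, may be fractional
    lower = int(location)              # whole part: position of the lower score
    fraction = location - lower        # how far to move towards the next score
    if lower < 1:                      # assumption: clamp to the first score
        return ordered[0]
    if lower >= n:                     # assumption: clamp to the last score
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

scores = [2, 3, 3, 6, 8, 9, 14, 16, 17, 20]
print(percentile(scores, 25), percentile(scores, 50), percentile(scores, 75))
```

For the ten scores above this gives 3, 8.5 and 16.25, matching the worked values.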

PRACTICE QUESTION #1:

1. A firm has 45 employees. The data shows the number of weeks of vacation the employees take annually:

Weeks        2   3   4   8
Employees   19  14   8   4

a) Calculate the first quartile for the number of weeks of vacation. Answer: 2
b) Find the median and the third (upper) quartile. Answer: 3, 4
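One way to check these answers is to expand the frequency table into 45 individual observations and use Python's `statistics.quantiles`. Its `method="exclusive"` option locates quartiles at p x (n + 1) / 100, which is assumed here to be the rule intended for this question.

```python
import statistics

# Frequency table from the practice question: weeks of vacation -> employees.
frequencies = {2: 19, 3: 14, 4: 8, 8: 4}

# Expand the table into one observation per employee (45 values in total).
weeks = [w for w, count in frequencies.items() for _ in range(count)]

# method="exclusive" places the quartiles at positions p * (n + 1) / 100.
q1, median, q3 = statistics.quantiles(weeks, n=4, method="exclusive")
print(len(weeks), q1, median, q3)
```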

Measures of Dispersion

Measures of dispersion tell us about the variability in a set of data:

Range = largest score – smallest score
Inter-quartile range (IQR) = 3rd quartile – 1st quartile. This represents the spread of the middle 50% of the data set.
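A short sketch of both measures on an illustrative data set. The quartiles are located with the p x (n + 1) / 100 rule via `method="exclusive"`, which is an assumption about the convention in use.

```python
import statistics

scores = [2, 3, 3, 6, 8, 9, 14, 16, 17, 20]  # illustrative data

data_range = max(scores) - min(scores)

# Quartiles located with the (n + 1) rule (method="exclusive").
q1, _, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, iqr)
```

The range depends only on the two extreme scores, whereas the IQR ignores the lowest and highest quarters of the data.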

Variance and Standard Deviation (SD): both are based on deviations from the mean.

For a population, the variance is given by σ² = Σ(Xi – μ)² / N = (Σ Xi² – Nμ²) / N

For a sample, the variance is given by s² = Σ(xi – x̄)² / (n – 1) = (Σ xi² – n x̄²) / (n – 1)
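A minimal sketch comparing the shortcut (sum of squares) form of the population and sample variance with the standard-library functions. The data values are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(data)
mean = sum(data) / n
sum_sq = sum(x * x for x in data)

# Shortcut forms: (sum of squares - n * mean^2) / divisor.
pop_var = (sum_sq - n * mean ** 2) / n           # divide by N for a population
sample_var = (sum_sq - n * mean ** 2) / (n - 1)  # divide by n - 1 for a sample

# statistics.pvariance / statistics.variance compute the same two quantities.
print(pop_var, statistics.pvariance(data))
print(sample_var, statistics.variance(data))
print(math.sqrt(pop_var))  # population standard deviation
```

The only difference between the two formulas is the divisor, so the sample variance is always slightly larger than the population variance computed from the same numbers.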

In both cases, the standard deviation is found by taking the square root of the variance.

E.g. Given the following income data: $23,000 $36,500 $47,200 $20,200 $61,300:
o Sample mean = (23,000 + 36,500 + 47,200 + 20,200 + 61,300) / 5 = $37,640
o Sample variance = (23,000² + 36,500² + 47,200² + 20,200² + 61,300² – 5 x 37,640²) / (5 – 1) = 292,743,000 (in dollars squared)
o Sample standard deviation = √292,743,000 = $17,109.73
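The income example can be reproduced in a few lines. This is a sketch using the shortcut formula, with `statistics.variance` as a cross-check.

```python
import math
import statistics

incomes = [23_000, 36_500, 47_200, 20_200, 61_300]
n = len(incomes)

mean = sum(incomes) / n
# Shortcut formula: (sum of squares - n * mean^2) / (n - 1).
variance = (sum(x * x for x in incomes) - n * mean ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))
# Cross-check against the standard library's sample variance.
print(statistics.variance(incomes))
```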


The Coefficient of Variation (CV): a relative measure of dispersion. It should be used to compare the spread of data sets that:
o Are measured in different units
o Have means which differ significantly, even though they are measured in the same units
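A minimal sketch of comparing two data sets with the CV. It assumes the usual definition, the standard deviation divided by the mean and expressed as a percentage. The height data and the function name `coefficient_of_variation` are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Sample CV as a percentage: 100 * s / mean (assumes a non-zero mean)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical data sets measured in different units.
incomes_dollars = [23_000, 36_500, 47_200, 20_200, 61_300]
heights_cm = [160, 165, 170, 175, 180]

print(round(coefficient_of_variation(incomes_dollars), 1))
print(round(coefficient_of_variation(heights_cm), 1))
```

Because the CV has no units, the two results can be compared directly even though one data set is in dollars and the other in centimetres.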

Population...