The aim of this investigation is to determine whether there is evidence that accident rates differ between the two operations, brick and tile.
This type of question involves analysing the data for a difference, and so an equal variance test, a normality test and a Mann-Whitney test will be undertaken. The meaning and significance of these tests will be explained and justified later in the report.
The null hypothesis (Ho) for this investigation is that there is no genuine difference in accident rates between the two operations. This hypothesis will be retained if the p-value for the test is greater than 0.05; otherwise it will be rejected and the alternative hypothesis (H1) will be accepted.
The p-value is the estimated probability of rejecting the null hypothesis (Ho) of a study question when that hypothesis is true (http://www.statsdirect.com/help/basics/pval.htm, 2009). A value of 0.05 is used as the main reference point for retaining or rejecting the null hypothesis: a p-value greater than 0.05 gives insufficient evidence against the null hypothesis, and the further the p-value falls below 0.05, the stronger the evidence against it, as shown in table 1.
Table 1: Showing the range of p-values, action to take in that case and the strength of evidence it suggests.
p-value| Action| Strength of evidence against Ho|
>0.05| Retain Ho| Insufficient |
≤0.05| Reject Ho, Accept H1| Some |
≤0.01| Reject Ho, Accept H1| Strong |
≤0.001| Reject Ho, Accept H1| Very strong |
As can be seen from table 1, when the p-value is equal to or smaller than 0.05, Ho must be rejected and H1 accepted.
The alternative hypothesis (H1) for this investigation is that there is a genuine difference in accident rates between the two operations.
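The decision rule in table 1 can be sketched as a small function. This is only an illustration of the table; the function name `interpret_p_value` and its `alpha` parameter are assumptions introduced here and are not part of the original analysis.

```python
def interpret_p_value(p, alpha=0.05):
    """Return (action, strength of evidence against Ho) following table 1.

    The name and signature are illustrative assumptions; the thresholds
    (0.05, 0.01, 0.001) are the ones listed in table 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p > alpha:
        return ("Retain Ho", "Insufficient")
    # Check the strictest threshold first so each p-value gets one label.
    if p <= 0.001:
        return ("Reject Ho, Accept H1", "Very strong")
    if p <= 0.01:
        return ("Reject Ho, Accept H1", "Strong")
    return ("Reject Ho, Accept H1", "Some")


if __name__ == "__main__":
    for p in (0.2, 0.05, 0.01, 0.0005):
        print(p, interpret_p_value(p))
```

Because the rows of table 1 overlap (a p-value of 0.0005 is also smaller than 0.05), the sketch reports only the strongest category that applies.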
The analysis began with descriptive statistics for the data, as shown below.
Table 2: Showing the descriptive statistics for the brick and tile data.
Variable| N| N*| Mean| SE Mean| StDev| Minimum| Q1| Median| Q3| Maximum|
Tile| 16| 0| 412| 127| 508| 56| 124| 227| 390| 2017|
Brick| 12| 0| 1075| 417| 1443| 121| 328| 550| 1453| 5218|
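As a consistency check on table 2, the standard error of the mean (SE Mean) can be recomputed from the reported standard deviation and sample size, using SE = StDev / sqrt(N). The sketch below uses only the summary values printed in the table, since the raw data is not reproduced here; the helper name `standard_error` is an assumption introduced for illustration.

```python
import math

# Summary values copied from table 2: (N, StDev, reported SE Mean).
table2 = {
    "Tile": (16, 508, 127),
    "Brick": (12, 1443, 417),
}


def standard_error(stdev, n):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev / math.sqrt(n)


if __name__ == "__main__":
    for name, (n, stdev, reported) in table2.items():
        se = standard_error(stdev, n)
        print(f"{name}: SE Mean = {se:.1f} (table reports {reported})")
```

Both recomputed values round to the figures given in the table, which suggests the summary statistics are internally consistent.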
The purpose of the descriptive statistics is to summarise the data, using measurements such as the mean and the median. The mean is the total of all the values divided by the number of values in the sample (the average), and the median is the middle value when the data is put in numerical order (Dytham, 1999). It can be seen from the descriptive statistics that the mean and median values are not similar for either the tile or the brick data; in both cases the mean is nearly double the median. For example, for the brick data the mean is 1075 and the median is 550. This suggests the data is not symmetrically spread (normally distributed) and may be skewed to one side. However, this is not easy to see from numeric data alone, and so a box plot has been produced, as shown below.
[Boxplot annotations: outlier; third quartile of the data (Q3); first quartile of the data (Q1); median value (middle value of the data); mean value (average)]
Figure 1: Boxplot showing the accident rates per 100,000 workers/year/plant in tile and brick operations.
A box plot takes the whole range of the data into consideration, and so it corresponds to the values given in the descriptive statistics, as shown by the annotations above. It can be seen from the boxplot how far the median line is from the mean value, and so the data is skewed for both brick and tile operations. In addition, the median line lies towards the lower end of the box in both cases, with the mean above it, and so the data is right (positively) skewed. This could be due to most of the accident rates for each operation being lower than 1000 accidents per 100,000 workers/year/plant, with a small number of much higher values.
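The direction of the skew and the presence of outliers can also be checked numerically from the summary values in table 2. The sketch below makes two assumptions that are not stated in the report: it uses Pearson's median skewness coefficient, 3 × (mean − median) / StDev, as a rough indicator of skew, and it assumes the boxplot marks outliers with the common 1.5 × IQR rule. The function names are illustrative.

```python
# Summary values copied from table 2.
summary = {
    "Tile": {"mean": 412, "median": 227, "stdev": 508,
             "q1": 124, "q3": 390, "max": 2017},
    "Brick": {"mean": 1075, "median": 550, "stdev": 1443,
              "q1": 328, "q3": 1453, "max": 5218},
}


def pearson_median_skewness(mean, median, stdev):
    """Pearson's median skewness; positive means a longer upper tail."""
    return 3 * (mean - median) / stdev


def upper_fence(q1, q3, k=1.5):
    """Upper outlier limit under the (assumed) k * IQR boxplot rule."""
    return q3 + k * (q3 - q1)


if __name__ == "__main__":
    for name, s in summary.items():
        skew = pearson_median_skewness(s["mean"], s["median"], s["stdev"])
        fence = upper_fence(s["q1"], s["q3"])
        has_outlier = s["max"] > fence
        print(f"{name}: skewness = {skew:.2f}, upper fence = {fence}, "
              f"maximum beyond fence = {has_outlier}")
```

Under these assumptions the coefficient is positive for both operations, meaning the longer tail lies towards the higher accident rates, and the maximum in each group lies beyond the upper fence, which is consistent with an outlier being marked on the boxplot.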
To check whether the data is normally distributed and whether the two groups have equal variances, two more tests were carried out: the equal variance test and the...