Statistics 101 Report
The Kentucky Milk Case Study
Figure 1: X as a Data Object
X is a data frame, as shown by the R output in Figure 1 above. It contains 274 observations of 11 variables: the number of observations is the number of rows, and the number of variables is the number of columns.
2b)
Figure 2: Creating a sub-data frame from X
Figure 3: Sub-data frame from X
Figure 2 shows a screenshot of the commands entered in R to create a sub-data frame from X containing the observations of the 7 selected variables. Figure 3 shows the resulting sub-data frame.
2c)
Figure 4: Missing values of WWBID
The variable WWBID has 36 missing values, and the fraction of missing values out of the total number of cases is 0.1291.
2d)
The percentage of cases in the dataset that contain one or more missing values, as calculated in R, is 13.19%, which can also be seen in Figure 4.
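The two quantities above can be illustrated with a minimal Python sketch (the report itself used R). It assumes each row of the data frame is a dict and that a missing value is stored as None, which plays the role of R's NA; the toy rows are made up for illustration and are not the case-study data.

```python
# Toy rows standing in for the data frame X; values are made up, and None
# plays the role of R's NA (missing value).
rows = [
    {"WWBID": 0.13, "LFWBID": 0.12, "LFCBID": 0.14},
    {"WWBID": None, "LFWBID": 0.12, "LFCBID": 0.15},
    {"WWBID": 0.14, "LFWBID": 0.13, "LFCBID": None},
    {"WWBID": 0.13, "LFWBID": 0.12, "LFCBID": 0.14},
]


def missing_fraction(data, column):
    """Fraction of observations whose value in `column` is missing."""
    missing = sum(1 for row in data if row[column] is None)
    return missing / len(data)


def percent_rows_with_missing(data):
    """Percentage of observations with at least one missing value,
    analogous to 100 * mean(!complete.cases(X)) in R."""
    incomplete = sum(1 for row in data if any(v is None for v in row.values()))
    return 100 * incomplete / len(data)


print(missing_fraction(rows, "WWBID"))   # 0.25
print(percent_rows_with_missing(rows))   # 50.0
```

The first function answers "how much of one variable is missing", while the second counts a case as incomplete if any of its variables is missing, which is why the two numbers differ.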
Figure 5: Bid price variable for markets Tri-County & Surround in 1984
Figure 6: Bid price variable for markets Tri-County & Surround in 1985
Figure 7: Bid price variable for markets Tri-County & Surround in 1986
Figure 8: Bid price variable for markets Tri-County & Surround in 1987
Figure 9: Bid price variable for markets Tri-County & Surround in 1988
Figures 5-9 above show the box plots obtained in R for Tri-County and Surround for the years 1984 to 1988, using the combined data for Meyer and Trauth. Potential outliers appear in every year. Surround has potential outliers in all 5 years, whereas Tri-County has them only in 1985 and 1988. The potential outliers in Surround are all maximum values of the bid price variable, while Tri-County's potential outlier in 1988 is the minimum value of the bid price variable.
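R's boxplot() marks a point as a potential outlier when it lies more than 1.5 times the hinge spread below the lower hinge or above the upper hinge, with the hinges taken from fivenum(). The sketch below reproduces that rule in Python for a single market and year; the bid prices in the example are invented for illustration.

```python
import math


def fivenum(values):
    """Tukey's five-number summary, following the definition of R's fivenum()."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    # 1-based positions of the minimum, lower hinge, median, upper hinge, maximum
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


def boxplot_outliers(values, coef=1.5):
    """Values beyond coef * (hinge spread) from the hinges, as in R's boxplot.stats()."""
    _, lower_hinge, _, upper_hinge, _ = fivenum(values)
    spread = upper_hinge - lower_hinge
    low, high = lower_hinge - coef * spread, upper_hinge + coef * spread
    return [v for v in values if v < low or v > high]


# Made-up bid prices for one market and year
bids = [0.128, 0.130, 0.131, 0.132, 0.133, 0.134, 0.135, 0.160]
print(boxplot_outliers(bids))  # [0.16]
```

A value flagged this way is the maximum or minimum of its group whenever it is the only outlier on that side, which matches how the outliers are described above.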
Figure 10: Missing Value for Tri-County in 1987
As shown in Figure 10, the missing value for the number of same vendors for Tri-County in 1987 is 12.
3b)
Figure 11: Data frame created from info in Table 2
Figure 12: Incumbency Rate for Tri-County
Figure 13: Incumbency Rate for Surround
Figures 12 and 13 are bar graphs of the yearly incumbency rates of Tri-County and Surround, respectively.
3d)
We observe a general increase in the incumbency rates. The rate is highest in 1987 and lowest in 1986. There are signs of collusive behavior in 1985 and 1987.
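One way to compute a yearly incumbency rate is sketched below. It assumes the rate is the share of school districts awarded to the same vendor as in the previous year, among districts that awarded contracts in both years. This definition, the district labels and the example winners are assumptions for illustration; only the vendor names Meyer and Trauth come from this report.

```python
def incumbency_rate(prev_winners, curr_winners):
    """Share of districts (present in both years) won by the same vendor again.

    Both arguments map a district name to the vendor that won it that year.
    """
    common = [d for d in curr_winners if d in prev_winners]
    if not common:
        return float("nan")  # rate undefined when no district appears in both years
    same_vendor = sum(1 for d in common if curr_winners[d] == prev_winners[d])
    return same_vendor / len(common)


# Hypothetical district-level winners for two consecutive years
winners_1986 = {"District A": "Meyer", "District B": "Trauth", "District C": "Meyer"}
winners_1987 = {"District A": "Meyer", "District B": "Meyer", "District C": "Meyer",
                "District D": "Trauth"}
print(incumbency_rate(winners_1986, winners_1987))
```

Districts that appear in only one of the two years (District D here) are left out, so the rate is not diluted by new contracts that had no previous winner.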
Number of missing bid prices in WWBID = 20
Number of missing bid prices in LFWBID = 1
Number of missing bid prices in LFCBID = 36
Figure 14: Missing Values
LFWBID should be analysed further because it has the fewest missing bid prices, so an analysis based on it loses the fewest observations to missing data.
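The comparison behind this choice, together with the per-variable missing percentages, can be sketched in Python as below. The toy rows are invented, None stands for a missing bid, and picking the variable with the fewest missing values mirrors the reasoning above rather than any rule taken from the case study.

```python
# Made-up rows; None marks a missing bid price
rows = [
    {"WWBID": 0.13, "LFWBID": 0.12, "LFCBID": None},
    {"WWBID": None, "LFWBID": 0.12, "LFCBID": None},
    {"WWBID": 0.14, "LFWBID": 0.13, "LFCBID": 0.15},
    {"WWBID": None, "LFWBID": None, "LFCBID": None},
    {"WWBID": 0.13, "LFWBID": 0.12, "LFCBID": 0.14},
]
BID_COLUMNS = ["WWBID", "LFWBID", "LFCBID"]


def missing_counts(data, columns):
    """Number of missing values in each column."""
    return {c: sum(1 for row in data if row[c] is None) for c in columns}


def missing_percentages(data, columns):
    """Percentage of missing values in each column."""
    counts = missing_counts(data, columns)
    return {c: 100 * counts[c] / len(data) for c in columns}


def fewest_missing(data, columns):
    """Column with the fewest missing values (the first one listed wins ties)."""
    counts = missing_counts(data, columns)
    return min(columns, key=counts.get)


print(missing_counts(rows, BID_COLUMNS))       # {'WWBID': 2, 'LFWBID': 1, 'LFCBID': 3}
print(missing_percentages(rows, BID_COLUMNS))  # {'WWBID': 40.0, 'LFWBID': 20.0, 'LFCBID': 60.0}
print(fewest_missing(rows, BID_COLUMNS))       # LFWBID
```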
Figure 15: Percentage of Missing Values in Each of the 3 Bid Price Variables
4b)
Table 1: Mean and Standard Deviation of Bid Prices for Tri-County
Table 2: Mean and Standard Deviation of Bid Prices for Surround
Figure 16: Mean and Standard Deviations of Bid Prices for Surround and Tri-County from 1983 to 1988, obtained from R
Based on these values, there is a sudden drop in the standard deviation for Tri-County in 1986 and 1987, as highlighted in orange above. There is a sudden decrease in the standard deviation for Surround in 1984, 1985 and 1988, as highlighted in green above....
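Per-year means and standard deviations like those in Tables 1 and 2 can be sketched in Python as follows (the report computed them in R). The records are invented, and treating any year-on-year decrease as a "drop" is a simplification; the report judged sudden drops by inspecting the values.

```python
from collections import defaultdict
from statistics import mean, stdev


def yearly_mean_sd(records):
    """Mean and sample standard deviation of bid prices per year.

    `records` is an iterable of (year, bid) pairs; None bids are skipped.
    statistics.stdev divides by n - 1, like R's sd(). Years with fewer than
    two bids are left out because their standard deviation is undefined.
    """
    by_year = defaultdict(list)
    for year, bid in records:
        if bid is not None:
            by_year[year].append(bid)
    return {year: (mean(bids), stdev(bids))
            for year, bids in sorted(by_year.items()) if len(bids) >= 2}


def years_with_sd_drop(summary):
    """Years whose standard deviation is lower than the previous listed year's."""
    years = sorted(summary)
    return [curr for prev, curr in zip(years, years[1:])
            if summary[curr][1] < summary[prev][1]]


# Made-up (year, bid) pairs for one market
records = [(1984, 1.0), (1984, 3.0), (1985, 2.0), (1985, 2.5),
           (1986, 0.0), (1986, 4.0), (1986, None)]
summary = yearly_mean_sd(records)
print(years_with_sd_drop(summary))  # [1985]
```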