Statistics 101 Report
The Kentucky Milk Case Study
 

Preliminary Analysis
2a)
Figure 1: X as a Data Object
X is a data frame, as shown by the R output above in Figure 1. It contains 274 observations of 11 variables: the number of observations is the number of rows, and the number of variables is the number of columns.
2b)
Figure 2: Creating a subdata frame from X
Figure 3: Subdata frame from X
Figure 2 shows a screenshot of the commands entered into R to create a subdata frame of X containing the observations of the 7 selected variables. Figure 3 shows the resulting subdata frame.
2c)
Figure 4: Missing values of WWBID
The variable WWBID has 36 missing values, and the fraction of missing values out of the total number of cases is 1291.
2d)
The percentage of cases in the dataset that contain one or more missing values, as calculated in R, is 13.19%; this can also be seen in Figure 4.
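The two missing-value figures in 2c and 2d measure different things: 2c counts missing entries in one column, while 2d counts cases (rows) with at least one missing entry in any column. A minimal Python sketch of both calculations is below, using made-up rows and `None` as a stand-in for R's `NA`; in R the equivalents would be along the lines of `sum(is.na(X$WWBID))`, `mean(is.na(X$WWBID))` and `100 * mean(!complete.cases(X))`. This is an illustration, not the code used for the report.

```python
def column_missing(rows, column):
    """Count and fraction of missing entries (None stands in for R's NA)
    in a single column of a list-of-dicts data set."""
    n_missing = sum(1 for row in rows if row[column] is None)
    return n_missing, n_missing / len(rows)


def percent_incomplete(rows):
    """Percentage of cases (rows) with at least one missing value in any column."""
    incomplete = sum(1 for row in rows if any(v is None for v in row.values()))
    return 100 * incomplete / len(rows)


# Made-up rows for illustration only.
rows = [
    {"WWBID": 0.12, "LFWBID": 0.13},
    {"WWBID": None, "LFWBID": 0.14},
    {"WWBID": None, "LFWBID": None},
    {"WWBID": 0.11, "LFWBID": 0.12},
]
print(column_missing(rows, "WWBID"))   # (2, 0.5)
print(column_missing(rows, "LFWBID"))  # (1, 0.25)
print(percent_incomplete(rows))        # 50.0
```

The third row is missing in both columns but is counted only once by `percent_incomplete`, which is why the row-wise percentage (50%) is smaller than the sum of the column-wise fractions (75%).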
2e)
Figure 5: Bid price variable for markets TriCounty & Surround in 1984
Figure 6: Bid price variable for markets TriCounty & Surround in 1985
Figure 7: Bid price variable for markets TriCounty & Surround in 1986
Figure 8: Bid price variable for markets TriCounty & Surround in 1987
Figure 9: Bid price variable for markets TriCounty & Surround in 1988
Figures 5 to 9 above show the box plots obtained in R for TriCounty and Surround for the years 1984 to 1988, using the combined data for Meyer and Trauth. Potential outliers appear in every year. In Surround, potential outliers are present in all 5 years, whereas in TriCounty they appear only in 1985 and 1988. All of the potential outliers in Surround lie at the upper end of the bid price distribution (they are the maximum bid prices). In contrast, TriCounty has a potential outlier in 1988 that is the minimum value of the bid price variable.
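By default, R's `boxplot()` draws a point individually when it lies more than 1.5 times the interquartile range beyond the hinges; the hinges come from Tukey's five-number summary (`fivenum()`), as documented for `boxplot.stats()`. The Python sketch below reimplements that rule to show what "potential outlier" means in Figures 5 to 9. The bid prices are made up for illustration, and the function names are our own.

```python
import math


def fivenum(values):
    """Tukey's five-number summary, following the definition used by R's fivenum()."""
    x = sorted(values)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the order statistics at floor(d) and ceil(d); depths are 1-based.
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


def boxplot_outliers(values, coef=1.5):
    """Values more than coef * (upper hinge - lower hinge) beyond the hinges."""
    _, lower_hinge, _, upper_hinge, _ = fivenum(values)
    spread = coef * (upper_hinge - lower_hinge)
    return [v for v in values
            if v < lower_hinge - spread or v > upper_hinge + spread]


# Made-up bid prices for one market and year.
bids = [0.120, 0.122, 0.125, 0.126, 0.128, 0.130, 0.160]
print(boxplot_outliers(bids))  # [0.16]
```

Here the hinges are 0.1235 and 0.129, so any bid above about 0.137 or below about 0.115 is flagged; only the 0.160 bid qualifies, and it sits at the upper end, like the Surround outliers.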
Incumbency Rates
3a)
Figure 10: Missing Value for TriCounty in 1987
As shown in Figure 10, the missing value for the number of same vendors in TriCounty in 1987 is 12.
3b)
Figure 11: Data frame created from the information in Table 2
3c)
Figure 12: Incumbency Rate for TriCounty
Figure 13: Incumbency Rate for Surround
Figures 12 and 13 are bar graphs of the yearly incumbency rates for TriCounty and Surround, respectively.
3d)
We observe a general increase in the incumbency rates. The rate is highest in 1987 and lowest in 1986. There are signs of possible collusive behavior in 1985 and 1987.
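The report does not spell out how the incumbency rate was computed. One common definition, assumed here, is the share of contracts whose winning vendor in a given year is the same vendor that won that contract the previous year. The Python sketch below computes that share from hypothetical district-to-vendor assignments; the district labels and assignments are invented for illustration, and only the vendor names Meyer and Trauth come from the report.

```python
def incumbency_rates(winners):
    """winners maps year -> {district: winning vendor}.

    Returns year -> share of districts (present in both years) whose winning
    vendor matches the winner in the previous year available in the data.
    """
    rates = {}
    years = sorted(winners)
    for prev, curr in zip(years, years[1:]):
        shared = winners[prev].keys() & winners[curr].keys()
        if not shared:
            continue  # no district appears in both years
        same = sum(1 for d in shared if winners[prev][d] == winners[curr][d])
        rates[curr] = same / len(shared)
    return rates


# Hypothetical districts and assignments, for illustration only.
winners = {
    1984: {"A": "Meyer", "B": "Trauth", "C": "Meyer", "D": "Trauth"},
    1985: {"A": "Meyer", "B": "Trauth", "C": "Trauth", "D": "Trauth"},
    1986: {"A": "Meyer", "B": "Trauth", "C": "Trauth", "D": "Trauth"},
}
print(incumbency_rates(winners))  # {1985: 0.75, 1986: 1.0}
```

Under this definition, a rate near 1 means almost every contract stayed with its previous vendor, which is the pattern that raises the question of collusion.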
4a)
Number of missing bid prices in WWBID = 20
Number of missing bid prices in LFWBID = 1
Number of missing bid prices in LFCBID = 36
Figure 14: Missing Values
LFWBID should be analysed further because it has the fewest missing bid prices, so an analysis based on it uses nearly all of the cases and should give more reliable results.
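The comparison in 4a amounts to converting each missing count to a percentage of all cases and choosing the variable with the fewest missing values. The Python sketch below uses the counts reported above; the function name is our own, and using 274 as the denominator assumes that Figure 15 is based on all observations from 2a.

```python
def missing_report(missing_counts, n_cases):
    """Return ({variable: percent missing}, variable with the fewest missing values)."""
    percents = {name: 100 * count / n_cases
                for name, count in missing_counts.items()}
    best = min(missing_counts, key=missing_counts.get)
    return percents, best


# Counts from 4a; n_cases = 274 is an assumption (all observations from 2a).
counts = {"WWBID": 20, "LFWBID": 1, "LFCBID": 36}
percents, best = missing_report(counts, 274)
for name, pct in percents.items():
    print(f"{name}: {pct:.2f}% missing")
print("Fewest missing:", best)  # Fewest missing: LFWBID
```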
Figure 15: Percentage of missing values in each of the 3 bid price variables
4b)
Table 1: Mean and standard deviation of bid prices for TriCounty
Year    Mean     Standard Deviation
1983    0.111    0.0125
1984    0.124    0.0147
1985    0.120    0.0145
1986    0.139    0.00533
1987    0.142    0.00702
1988    0.118    0.0129
Table 2: Mean and standard deviation of bid prices for Surround
Year    Mean     Standard Deviation
1983    0.124    0.0164
1984    0.134    0.00636
1985    0.137    0.00537
1986    0.118    0.0118
1987    0.117    0.0124
1988    0.145    0.00618
Figure 16: Mean and standard deviations obtained from R for bid prices of Surround and TriCounty from 1983 to 1988
Based on these values, the standard deviation for TriCounty drops sharply in 1986 and 1987 (0.00533 and 0.00702, compared with 0.0125 to 0.0147 in the other years), as highlighted in orange above. The standard deviation for Surround is similarly low in 1984, 1985 and 1988 (0.00636, 0.00537 and 0.00618, compared with 0.0118 to 0.0164 in the other years), as highlighted in green above....
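Tables 1 and 2 report one mean and one standard deviation per market and year. The Python sketch below shows that grouping on made-up rows. It assumes the summarized variable is LFWBID (the variable recommended in 4a; the report does not state which bid variable the tables use), hypothetical column names `MARKET` and `YEAR`, and `None` for missing bids. Like R's `sd()`, it uses the sample standard deviation with an n - 1 denominator.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_bids(rows, bid="LFWBID"):
    """Return {(market, year): (mean, sample sd)} for one bid variable,
    dropping missing bids first (similar to na.rm = TRUE in R)."""
    groups = defaultdict(list)
    for row in rows:
        if row[bid] is not None:
            groups[(row["MARKET"], row["YEAR"])].append(row[bid])
    summary = {}
    for key in sorted(groups):
        values = groups[key]
        # stdev() needs at least two points; R's sd() returns NA for one value.
        sd = stdev(values) if len(values) > 1 else float("nan")
        summary[key] = (mean(values), sd)
    return summary


# Made-up rows for illustration only.
rows = [
    {"MARKET": "TriCounty", "YEAR": 1986, "LFWBID": 0.135},
    {"MARKET": "TriCounty", "YEAR": 1986, "LFWBID": 0.145},
    {"MARKET": "TriCounty", "YEAR": 1986, "LFWBID": None},
    {"MARKET": "Surround", "YEAR": 1986, "LFWBID": 0.120},
]
for (market, year), (m, sd) in summarize_bids(rows).items():
    print(f"{market} {year}: mean = {m:.3f}, sd = {sd:.4f}")
```

A very small standard deviation within a market and year, such as TriCounty in 1986 and 1987, means the bids were tightly clustered around the mean.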