Homework 1 Solutions
I. An insurance agency is examining the dollar amount of claims from clients who have homeowners insurance. For the 900 people who filed claims, the five-number summary of the amount is: ($8800, $8850, $8900, $9100, $9940).
(a) Would the histogram displaying the data for the 900 claims be nearly bell-shaped? If so, explain how the summary indicates this. If not, determine if the data is skewed left or skewed right, and explain how the summary indicates this.
Ans: The histogram would look skewed to the right. The median is much closer to the 1st quartile (a gap of $50) than it is to the 3rd quartile (a gap of $200), and the maximum is much farther from the median (a gap of $1040) than the minimum is (a gap of $100).
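The comparison of gaps can be checked with a short Python sketch. The function names `skew_direction` and `inner_fences` are hypothetical helpers introduced here for illustration, and the skew label is only a rough heuristic based on the five-number summary, not a formal test. The second helper applies the usual 1.5 × IQR rule for boxplot outliers to the same summary.

```python
# Five-number summary from the problem: (min, Q1, median, Q3, max).
minimum, q1, median, q3, maximum = 8800, 8850, 8900, 9100, 9940


def skew_direction(minimum, q1, median, q3, maximum):
    """Rough skew label from a five-number summary (heuristic only)."""
    lower_spread = (median - q1) + (median - minimum)
    upper_spread = (q3 - median) + (maximum - median)
    if upper_spread > lower_spread:
        return "right"
    if upper_spread < lower_spread:
        return "left"
    return "roughly symmetric"


def inner_fences(q1, q3):
    """Lower and upper inner fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


print(median - q1, q3 - median)            # 50 200
print(median - minimum, maximum - median)  # 100 1040
print(skew_direction(minimum, q1, median, q3, maximum))  # right

low_fence, high_fence = inner_fences(q1, q3)
print(low_fence, high_fence)                       # 8475.0 9475.0
print(maximum > high_fence, minimum < low_fence)   # True False
```

The upper gaps are several times larger than the lower gaps, which is what a long right tail looks like in a five-number summary.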
(b) Would the boxplot for the data indicate any outliers? Explain why or why not.
Ans: IQR = 9100 - 8850 = 250. 1.5 × IQR = 375. The inner fences will be 8850 - 375 = 8475 and 9100 + 375 = 9475 respectively. With a maximum of 9940, there is at least one high outlier. (There will be no low outliers because the minimum is 8800, which is within the inner fences.)
II. A drug manufacturer has hundreds of sales representatives all over the United States. A histogram for yearly sales totals for each representative is roughly bell shaped and symmetric except for 4 high outliers corresponding to representatives in Boston, MA. Their sales totals are at least $60,000 greater than the next highest total. One analyst suggests dropping these 4 totals from the data to get a better summary of the sales across all regions of the country.
(a) If the outliers were to be dropped, which measure of central tendency of the data set would be affected the most: the mean, the median, or the mode? Explain why.
Ans: The median is based on the order of the data, so dropping the high values will most likely not have much effect on this measure. The mode is the most frequent data value, and it is unlikely that any of these 4 outliers would represent the mode. The mean depends on the size of the data points. It is pulled to the right away from the median by the high values. Eliminating these values will bring it back in line with the median, so it is going to be the value that is affected the most.
(b) The high outliers are dropped from the data and the mean is determined to be $70,000 with a standard deviation of $8,000. For future analysis, management would like to be able to identify sales amounts that are “unusually low”, which they defined as being among the lowest 2.5% of all sales amounts. Using the Empirical Rule for this data, what amount should be considered the cut-off for sales amounts being classified as unusually low?
Ans: By the Empirical Rule, about 95% of the amounts fall within 2 standard deviations of the mean, so roughly 2.5% of the amounts will fall below Mean − 2SD. In this case: 70000 − 2 × 8000 = $54,000.
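As a rough check on this cut-off, the sketch below compares the Empirical Rule value with an exact normal calculation. It assumes the sales totals are exactly normal with the stated mean and standard deviation, which the problem only says holds approximately. The variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 70_000, 8_000

# Empirical Rule: about 95% of values lie within 2 SDs of the mean,
# so about 2.5% lie below mean - 2 * sd.
cutoff = mean - 2 * sd
print(cutoff)  # 54000

# Assumption: sales are exactly normal with these parameters.
sales = NormalDist(mu=mean, sigma=sd)
share_below = sales.cdf(cutoff)       # exact share below the cut-off
exact_cutoff = sales.inv_cdf(0.025)   # exact 2.5th percentile

print(round(share_below, 4))  # 0.0228
print(round(exact_cutoff))    # 54320
```

The exact 2.5th percentile sits about $320 above the Empirical Rule value, because the rule rounds 1.96 standard deviations up to 2.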
ISOM 111 L11, Fall 2010
III. In order to control costs, a company wishes to study the amount of money its sales force spends entertaining clients. The following is a random sample of six entertainment expenses (dinner costs for four people) from expense reports submitted by members of the sales force: $157, $132, $109, $145, $125, $139.
(a) Calculate the sample mean x̄ and sample variance s².
Ans: x̄ = (157 + 132 + 109 + 145 + 125 + 139) / 6 = 807 / 6 = 134.5
s² = [(157 − 134.5)² + (132 − 134.5)² + (109 − 134.5)² + (145 − 134.5)² + (125 − 134.5)² + (139 − 134.5)²] / (6 − 1) = 1383.5 / 5 = 276.7
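The same arithmetic can be reproduced with Python's standard `statistics` module, as a quick check rather than part of the required solution. The six values are the ones that appear in the sum above. `statistics.variance` divides by n − 1, which matches the sample variance used here.

```python
from statistics import mean, variance

# The six dinner expenses (in dollars) from the sample.
expenses = [157, 132, 109, 145, 125, 139]

x_bar = mean(expenses)          # sample mean
s_squared = variance(expenses)  # sample variance, divisor n - 1

print(x_bar)                # 134.5
print(round(s_squared, 1))  # 276.7
```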
(b) Assuming that the distribution of entertainment expenses is approximately normally distributed, construct intervals containing approximately 68%, 95%, and 99.7% of all entertainment expenses by the sales force.
Ans: By the empirical rule, the intervals can be calculated as follows:
s = √276.7 = 16.63
68%: [x̄ ± s] = [134.5 ± 16.63] = [117.87, 151.13]
95%: [x̄ ± 2s] = [134.5 ± 2 × 16.63] = [101.24, 167.76]
99.7%: [x̄ ± 3s] = [134.5 ± 3 × 16.63] = [84.61, 184.39]
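The three intervals can be reproduced with a few lines of Python. This sketch rounds s to two decimals before multiplying, as the solution does. Carrying the unrounded value of s instead would shift the wider endpoints by about a cent. The helper name `empirical_interval` is hypothetical.

```python
from math import sqrt

x_bar, s_squared = 134.5, 276.7

# Round s to two decimals first, as in the worked solution.
s = round(sqrt(s_squared), 2)  # 16.63


def empirical_interval(center, spread, k):
    """Interval center ± k * spread, rounded to cents."""
    return round(center - k * spread, 2), round(center + k * spread, 2)


for k, coverage in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    print(coverage, empirical_interval(x_bar, s, k))
# 68% (117.87, 151.13)
# 95% (101.24, 167.76)
# 99.7% (84.61, 184.39)
```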
(c) If a member of the sales force submits an entertainment expense of $190, should this expense be considered unusually high (and...