Regression Analysis: A Complete Example
This section works out an example that includes all the topics we have discussed so far in this chapter.
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)    Monthly Auto Insurance Premium ($)
            5                              64
            2                              87
           12                              50
            9                              71
           15                              44
            6                              56
           25                              42
           16                              60
a. Does the insurance premium depend on the driving experience or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute SSxx, SSyy, and SSxy.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of a and b calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate r and r² and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
h. Compute the standard deviation of errors.
i. Construct a 90% confidence interval for B.
j. Test at the 5% significance level whether B is negative.
k. Using α = .05, test whether ρ is different from zero.
Solution
a. Based on theory and intuition, we expect the insurance premium to depend on driving experience. Consequently, the insurance premium is a dependent variable and driving experience is an independent variable in the regression model. A new driver is considered a high risk by the insurance companies, and he or she has to pay a higher premium for auto insurance. On average, the insurance premium is expected to decrease with an increase in the years of driving experience. Therefore, we expect a negative relationship between these two variables. In other words, both the population correlation coefficient ρ and the population regression slope B are expected to be negative.
b. Table 13.5 shows the calculation of Σx, Σy, Σxy, Σx², and Σy².

Table 13.5

Experience    Premium
    x            y          xy        x²        y²
    5           64         320        25      4096
    2           87         174         4      7569
   12           50         600       144      2500
    9           71         639        81      5041
   15           44         660       225      1936
    6           56         336        36      3136
   25           42        1050       625      1764
   16           60         960       256      3600
Σx = 90     Σy = 474   Σxy = 4739   Σx² = 1396   Σy² = 29,642
The values of x̄ and ȳ are

    x̄ = Σx / n = 90 / 8 = 11.25
    ȳ = Σy / n = 474 / 8 = 59.25
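The values of SSxy, SSxx, and SSyy are computed as follows:

    SSxy = Σxy − (Σx)(Σy) / n = 4739 − (90)(474) / 8 = −593.5000
    SSxx = Σx² − (Σx)² / n = 1396 − (90)² / 8 = 383.5000
    SSyy = Σy² − (Σy)² / n = 29,642 − (474)² / 8 = 1557.5000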
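The shortcut formulas for SSxy, SSxx, and SSyy can be checked with a short program. The following is a minimal sketch in Python, not part of the original example; the function name `sums_of_squares` and the variable names are our own choices for illustration.

```python
# Driving experience (years) and monthly premium ($) for the eight sampled drivers.
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]


def sums_of_squares(x, y):
    """Return (SSxx, SSyy, SSxy) using the shortcut formulas.

    The function name is a hypothetical helper chosen for this sketch.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)

    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


ss_xx, ss_yy, ss_xy = sums_of_squares(experience, premium)
print(ss_xx, ss_yy, ss_xy)  # 383.5 1557.5 -593.5
```

The negative sign of SSxy already signals the negative relationship anticipated in part a.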
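c. To find the regression line, we calculate a and b as follows:

    b = SSxy / SSxx = −593.5000 / 383.5000 = −1.5476
    a = ȳ − b x̄ = 59.25 − (−1.5476)(11.25) = 76.6605

Thus, our estimated regression line ŷ = a + bx is

    ŷ = 76.6605 − 1.5476x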
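The calculation of a and b can be reproduced directly from the raw data. The sketch below is an illustration only: it uses the deviation form of the sums of squares, which is algebraically equivalent to the shortcut formulas, and the name `least_squares_line` is a hypothetical helper. Because the program carries b at full precision instead of rounding it to −1.5476 first, its intercept can differ from the hand-computed 76.6605 in the fourth decimal place.

```python
# Driving experience (years) and monthly premium ($) for the eight sampled drivers.
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]


def least_squares_line(x, y):
    """Return (a, b) for the least squares line y-hat = a + b*x.

    Hypothetical helper for this sketch; it uses sums of deviations
    from the means rather than the shortcut formulas.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)

    b = ss_xy / ss_xx        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b


a, b = least_squares_line(experience, premium)
print(f"y-hat = {a:.4f} + ({b:.4f})x")
```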
d. The value of a = 76.6605 gives the value of ŷ for x = 0; that is, it gives the monthly auto insurance premium for a driver with no driving experience. However, as mentioned earlier in this chapter, we should not attach much importance to this statement because the sample contains drivers with only two or more years of experience. The value of b gives the change in ŷ due to a change of one unit in x. Thus, b = −1.5476 indicates that, on average, for every extra year of driving experience, the monthly auto insurance premium decreases by $1.55. Note that when b is negative, y decreases as x increases.
e. Figure 13.21 shows the scatter diagram and the regression line for the data on eight auto drivers. Note that the regression line slopes downward from left to right. This result is consistent with the negative relationship we anticipated between driving experience and insurance premium.
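The interpretation of a and b in part d can be illustrated numerically. The sketch below assumes the rounded estimates a = 76.6605 and b = −1.5476 from part c; the function name `predicted_premium` is our own and serves only as an illustration.

```python
a = 76.6605   # y-intercept from part c (rounded)
b = -1.5476   # slope from part c (rounded)


def predicted_premium(years):
    """Value of y-hat on the estimated regression line (hypothetical helper)."""
    return a + b * years


# At x = 0 the line returns the intercept a.
intercept_value = predicted_premium(0)

# One extra year of experience changes y-hat by exactly b,
# whichever starting value of x is used.
change_per_year = predicted_premium(6) - predicted_premium(5)

print(f"{intercept_value:.4f}")   # 76.6605
print(f"{change_per_year:.2f}")   # -1.55
```

Keep in mind that x = 0 lies outside the range of the sample data, so the first value is an extrapolation.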
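Figure 13.21 Scatter diagram and the regression line.

f. The values of r and r² are computed as follows:

    r = SSxy / √(SSxx · SSyy) = −593.5000 / √((383.5000)(1557.5000)) = −.77
    r² = b · SSxy / SSyy = (−1.5476)(−593.5000) / 1557.5000 = .59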
The value of r = −.77 indicates that driving experience and the monthly auto insurance premium are negatively related. The (linear) relationship is strong but not very strong. The value of r² = .59 states that 59% of the total variation in insurance premiums is explained by years of driving experience and 41% is not. The low value of r²...
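The values of r and r² in part f can be verified from the raw data. The sketch below is an illustration under our own naming (`linear_correlation` is a hypothetical helper). It obtains r² by squaring r, which gives the same value as b · SSxy / SSyy because b = SSxy / SSxx.

```python
import math

# Driving experience (years) and monthly premium ($) for the eight sampled drivers.
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]


def linear_correlation(x, y):
    """Return the linear correlation coefficient r (hypothetical helper)."""
    n = len(x)
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    ss_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = linear_correlation(experience, premium)
r_squared = r ** 2   # coefficient of determination

print(round(r, 2), round(r_squared, 2))  # -0.77 0.59
```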