Fall 2006

Regression: Testing Assumptions

December 4, 2006

Linearity

The linearity of the regression mean can be examined visually by plotting the residuals against any of the independent variables or against the predicted values. Chart 1 shows a residual plot that reveals no departures from the assumption of linearity, while Chart 2 reveals nonlinearity.

[Chart 1 and Chart 2: residuals (vertical axis) plotted against predicted values (horizontal axis).]

In both cases the presumed model is the simple linear regression $y_i = \alpha + \beta x_i + \varepsilon_i$ with $n = 30$. A statistical test for linearity can be constructed by adding powers of the fitted values to the regression model and then testing the hypothesis that the coefficients of the added terms are all equal to zero; rejecting this hypothesis is evidence of nonlinearity. This is Ramsey's RESET test. The multiple regression model can be written $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$. Least squares estimates of the model parameters are obtained, and powers of the predicted values $\hat{y}_i$ are added to form an augmented model:

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + \gamma_3 \hat{y}_i^4 + \varepsilon_i$$

The null hypothesis to be tested is $H_0: \gamma_1 = \gamma_2 = \gamma_3 = 0$, which is tested with the usual F-statistic comparing the restricted (original) and unrestricted (augmented) models, referred to the F-distribution with $[3,\ n - (k + 3) - 1]$ degrees of freedom. For the example shown in Chart 1, SSE for the restricted model equals 0.65 (df = 28) and SSE for the unrestricted model equals 0.50 (df = 25); the RESET F-statistic equals 2.52 and the p-value of the test is 0.08, so the hypothesis of linearity would not be rejected at the 5% level. For the example in Chart 2, SSE for the original and augmented models equals 0.82 (df = 28) and 0.50 (df = 25), respectively; the RESET F-statistic equals 5.44 and the p-value of the test is 0.01, so the hypothesis of linearity would be rejected at the 5% level. In both cases the 5% critical value of F(3, 25) is 2.99.
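The RESET procedure can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are available; `ols_sse`, `f_from_sse`, and `reset_test` are hypothetical helper names, and the simulated data are illustrative only, not the data behind the charts. The statistic is $F = [(SSE_R - SSE_U)/q] / [SSE_U/(n - k - q - 1)]$ with $q = 3$ added powers. Plugging in the rounded SSE values for Chart 1 gives $(0.15/3)/(0.50/25) = 2.50$; the reported 2.52 presumably reflects unrounded SSEs.

```python
import numpy as np
from scipy import stats


def ols_sse(X, y):
    """Least-squares fit of y on the columns of X; returns (fitted values, SSE)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return fitted, float(resid @ resid)


def f_from_sse(sse_r, sse_u, q, df_u):
    """F-statistic for q restrictions from restricted and unrestricted SSE."""
    return ((sse_r - sse_u) / q) / (sse_u / df_u)


def reset_test(x, y, powers=(2, 3, 4)):
    """RESET: add powers of the fitted values and F-test that their
    coefficients are jointly zero. x is (n,) or (n, k); an intercept is added."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, k = x.shape
    X = np.column_stack([np.ones(n), x])                  # original (restricted) model
    fitted, sse_r = ols_sse(X, y)
    extra = [fitted ** p for p in powers]                 # yhat^2, yhat^3, yhat^4
    _, sse_u = ols_sse(np.column_stack([X] + extra), y)   # augmented model
    q = len(powers)
    df_u = n - (k + q) - 1                                # n - (k + 3) - 1 when q = 3
    F = f_from_sse(sse_r, sse_u, q, df_u)
    return F, stats.f.sf(F, q, df_u), (q, df_u)


# Arithmetic check with the SSE values reported for Chart 1 (q = 3, df = 25)
print(round(f_from_sse(0.65, 0.50, 3, 25), 2))   # 2.5
print(round(stats.f.ppf(0.95, 3, 25), 2))        # 2.99, the 5% critical value

# Illustrative simulated data (n = 30), one linear and one curved mean
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y_linear = 1.0 + 2.0 * x + rng.normal(0.0, 0.15, 30)
y_curved = 1.0 + 2.0 * x + 4.0 * x ** 2 + rng.normal(0.0, 0.05, 30)
print(reset_test(x, y_linear))
print(reset_test(x, y_curved))
```

Raising fitted values to the fourth power can make the augmented design matrix poorly conditioned, which is why the sketch uses `np.linalg.lstsq` (an SVD-based solver) rather than forming the normal equations directly.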

Homoskedasticity (Constant Variance)

The constancy of the variance of the dependent variable (the error variance) can be examined from plots of the residuals against any of the independent variables or against the predicted values. Chart 3 shows residuals plotted against predicted values, with no clear departure from the homoskedasticity assumption. Chart 4 shows evidence of variance increasing with the mean.

[Chart 3 and Chart 4: residuals (vertical axis) plotted against predicted values (horizontal axis).]

Both are presumed to be regressions on two independent variables, modeled as $y_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$ with $n = 30$. A test of homoskedasticity due to White is performed by obtaining the least squares residuals $e_i$ and regressing the squared residuals on the original independent variables plus all their squares and cross-products (interactions). For the examples displayed in Charts 3 and 4, this amounts to the auxiliary regression:

$$e_i^2 = \gamma_0 + \gamma_1 x_{i1} + \gamma_2 x_{i2} + \gamma_3 x_{i1}^2 + \gamma_4 x_{i1} x_{i2} + \gamma_5 x_{i2}^2 + \varepsilon_i$$

The test of homoskedasticity is performed by testing the hypothesis of no regression, i.e., $H_0: \gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = 0$. Instead of the F-test, an alternative test refers the statistic $nR^2$ to the chi-square distribution with (in this case) 5 degrees of freedom, one for each slope coefficient in the auxiliary regression; here $n$ is the sample size and $R^2$ is the coefficient of determination of the auxiliary regression. For the example of Chart 3, the White statistic equals 5.24 and the p-value is 0.39. For the example of Chart 4, the test statistic equals 10.70 and the p-value is 0.06, so the null hypothesis would be rejected at the 10% level but not at the 5% level. In both cases the 5% critical value is 11.07.
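White's test for the two-regressor case can be sketched as follows. This is a minimal illustration, assuming NumPy and SciPy are available; `white_test` is a hypothetical helper name, and the simulated data (error standard deviation growing with $x_{i1}$) are illustrative, not the handout's data. It fits the original model by least squares, runs the auxiliary regression of $e_i^2$ on the five terms above, and refers $nR^2$ to chi-square with 5 degrees of freedom.

```python
import numpy as np
from scipy import stats


def white_test(x1, x2, y):
    """White's test with two regressors: regress squared OLS residuals on
    x1, x2, x1^2, x1*x2, x2^2 and refer n*R^2 to chi-square(5)."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    n = len(y)
    X = np.column_stack([np.ones(n), x1, x2])             # original model
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                              # squared residuals
    Z = np.column_stack([np.ones(n), x1, x2, x1 ** 2, x1 * x2, x2 ** 2])
    gamma, *_ = np.linalg.lstsq(Z, e2, rcond=None)        # auxiliary regression
    u = e2 - Z @ gamma
    r2 = 1.0 - (u @ u) / np.sum((e2 - e2.mean()) ** 2)    # R^2 of auxiliary fit
    df = Z.shape[1] - 1                                   # 5 slope coefficients
    stat = n * r2
    return stat, stats.chi2.sf(stat, df), df


# Tail probabilities for the statistics reported for Charts 3 and 4
print(round(stats.chi2.sf(5.24, 5), 2))    # 0.39
print(round(stats.chi2.sf(10.70, 5), 2))   # 0.06
print(round(stats.chi2.ppf(0.95, 5), 2))   # 11.07, the 5% critical value

# Illustrative simulated data (n = 30) with error s.d. increasing in x1
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(0.0, 1.0, (2, 30))
y = 1.0 + x1 + x2 + 0.1 * (1.0 + 5.0 * x1) * rng.normal(size=30)
print(white_test(x1, x2, y))
```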

Normality

Plotting the empirical distribution of residuals...