Regression Project Report
Tianao Zhang 12/5/2011
The data set I received contains 6578 data values, arranged in 13 columns and 506 rows. All of the explanatory variables, as well as the dependent variable, are continuous; there are no categorical variables. My goal is to build a regression model that predicts either the average value of Y or a particular value of Y for a given X. The analysis addresses four questions:
1. Do the regression assumptions (constant variance, normality, independence and the correct functional form) hold for the model? I test them by residual analysis.
2. Is there any relationship among the explanatory variables? I test for multicollinearity.
3. What are the confidence interval for the average Y and the prediction interval for a particular Y?
4. How useful is the model, and how strong is the relationship between X and Y? To answer this I consider several statistics:
   i. Multiple coefficient of determination (R² and adjusted R²)
   ii. Durbin-Watson test (DWT)
   iii. F ratio
   iv. VIF value
   v. p-value
Method of analysis
1. Find the important variables
Use “Stepwise” to eliminate unimportant independent variables (Analyze > Fit Model > Stepwise). After running “Stepwise”, JMP indicates that column 3 and column 7 should be deleted; the remaining columns have a strong relationship with the dependent variable.
2. Check the VIF values
A VIF greater than 10 for any variable indicates multicollinearity in the model. No VIF exceeds 10, so there is no strong correlation among the independent variables, and I keep all of the selected independent variables in the model.
3. Build the model with the selected variables
I get the model
y = 668.1274 - 0.108416*X1 + 0.0458433*X2 + 2.7188168*X4 - 17.37683*X5 + 3.8015829*X6 - 1.492708*X8 + 0.2996025*X9 - 0.011777*X10 - 0.946554*X11 + 0.0092905*X12 - 0.52255*X13
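The VIF check in step 2 can be sketched outside JMP as follows. This is a minimal illustration, not JMP's implementation: each predictor is regressed on the other predictors, and VIF = 1 / (1 - R²) of that auxiliary regression. The helper names (`solve_linear_system`, `r_squared`, `vif`) and the demo columns are hypothetical and are not taken from the project data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations update both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, target):
    """R^2 from regressing `target` on `columns` plus an intercept."""
    n = len(target)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(design[i][p] * design[i][q] for i in range(n))
            for q in range(k)] for p in range(k)]
    xty = [sum(design[i][p] * target[i] for i in range(n)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(target) / n
    sse = sum((y - f) ** 2 for y, f in zip(target, fitted))
    sst = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - sse / sst


def vif(columns):
    """Variance inflation factor of each column against all the others."""
    result = []
    for j, col in enumerate(columns):
        others = columns[:j] + columns[j + 1:]
        result.append(1.0 / (1.0 - r_squared(others, col)))
    return result


# Hypothetical demo data (not the project data set).
demo_columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
]
for index, value in enumerate(vif(demo_columns), start=1):
    flag = "multicollinearity suspected" if value > 10 else "ok"
    print(f"demo column {index}: VIF = {value:.3f} ({flag})")
```

The sketch assumes no column is constant and no column is an exact linear combination of the others; either case makes a denominator zero. The threshold of 10 is the one used in step 2.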
4. Check for violations of the regression assumptions
According to the Durbin-Watson test, the assumption of independence is violated, so this model is not adequate and I need to build a new one.
5. Build a new model by transforming the independent variables
Let X* = ∆X = X(n+1) - X(n), find the related variables and return to Step 1. The resulting adjusted R² is not acceptable, so this new model is still not good enough.
6. Build a new model by transforming the dependent variable
Let Y* = ∆Y = Y(n+1) - Y(n), find the related variables and return to Step 1. The adjusted R² is again too low to be accepted.
7. Build a new model by transforming both the dependent and the independent variables
Let X* = ∆X = X(n+1) - X(n) and Y* = ∆Y = Y(n+1) - Y(n), find the related variables and return to Step 1. “Stepwise” deletes two columns (X2 and X3). There is no sign of multicollinearity.
8. Check for violations of the regression assumptions
I. Check the Durbin-Watson test. The value is 2.67, which is acceptable.
II. Check the “Residual by predicted plot”.
III. Check the “Residual by row plot”.
IV. Check the residual distribution to see whether it is normally distributed.
There is no significant violation, so the assumptions hold.
9. Check the influence of outliers
The Cook’s D values show that no influential outliers exist.
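The differencing transform and the Durbin-Watson check used in the steps above can be sketched as follows. This is an illustration only, with hypothetical function names; the report's own values come from JMP. The Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals. It lies between 0 and 4, and values near 2 suggest no first-order autocorrelation.

```python
def first_differences(values):
    """Return [v[1] - v[0], v[2] - v[1], ...], the delta transform of a column."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in row order."""
    numerator = sum((later - earlier) ** 2
                    for earlier, later in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals (not taken from the project data).
trending = [1.0, 1.2, 0.9, 1.1, -1.0, -1.1, -0.9]      # neighbours are alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]   # neighbours flip sign
print(durbin_watson(trending))     # well below 2: positive autocorrelation
print(durbin_watson(alternating))  # well above 2: negative autocorrelation
print(first_differences([10.0, 12.0, 11.0, 15.0]))
```

A differenced column has one fewer value than the original, so a model fitted on ∆X and ∆Y uses one fewer row than the model on the raw data.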
The final model Y* is
∆Y = -0.041005*∆X1 - 1.60423*∆X4 - 19.45753*∆X5 + 5.0317852*∆X6 - 0.044223*∆X7 - 1.251392*∆X8 + 0.2991917*∆X9 - 0.024105*∆X10 - 0.369488*∆X11 + 0.010828*∆X12 - 0.266692*∆X13
where ∆X = X(n+1) - X(n) and ∆Y = Y(n+1) - Y(n).
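As a sketch of how the final model could be applied, the code below evaluates ∆Y from two consecutive rows of X values and then recovers the next Y as Y(n) + ∆Y, which follows from the definition of ∆Y. The coefficients are copied from the final model; the function names and the example rows are hypothetical.

```python
# Coefficients of the final model, keyed by column index (X1, X4, ..., X13).
FINAL_COEFFS = {
    1: -0.041005, 4: -1.60423, 5: -19.45753, 6: 5.0317852,
    7: -0.044223, 8: -1.251392, 9: 0.2991917, 10: -0.024105,
    11: -0.369488, 12: 0.010828, 13: -0.266692,
}


def predict_delta_y(row_now, row_next):
    """Predicted change in Y between two consecutive rows.

    Each row is a dict mapping column index -> X value.
    """
    return sum(coeff * (row_next[j] - row_now[j])
               for j, coeff in FINAL_COEFFS.items())


def predict_next_y(y_now, row_now, row_next):
    """Predicted Y(n+1), using Y(n+1) = Y(n) + delta Y."""
    return y_now + predict_delta_y(row_now, row_next)


# Hypothetical rows: only X6 changes, by one unit.
row_n = {j: 0.0 for j in FINAL_COEFFS}
row_n_plus_1 = dict(row_n)
row_n_plus_1[6] = 1.0
print(predict_delta_y(row_n, row_n_plus_1))
print(predict_next_y(100.0, row_n, row_n_plus_1))
```

Because the model has no intercept, two identical consecutive rows give a predicted change of zero.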
1. We can use “Stepwise” as a preliminary tool to identify which independent variables have a strong relationship with the dependent variable.

SSE         DFE   RMSE        RSquare   RSquare Adj   Cp   p   AICc       BIC
3202503.2   494   80.515837   0.7406    0.7348                 5891.676   5945.882

Step History
Step   Parameter   Action
1      Column 13   Entered
2      Column 6    Entered
3      Column 11   Entered
4      Column 8    Entered
5      Column 5    Entered
6      Column 4    Entered
7      Column 12   Entered
8      Column 2    Entered
9      Column 1    Entered

Current Estimates
Lock: X
Entered: X X X X X X X X X X X X
Parameter: Intercept, Column 1, Column 2, Column 3, Column 4, Column 5, Column 6, Column 7, Column 8, Column 9, Column 10, Column 11...
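The summary statistics in the stepwise output are linked by standard identities, which gives a quick consistency check: RMSE = sqrt(SSE / DFE), and adjusted R² = 1 - (1 - R²)(n - 1) / DFE. The sketch below uses the reported SSE, DFE and RSquare together with the 506 rows stated in the introduction; the function names are hypothetical.

```python
import math

# Values reported in the stepwise output and the introduction.
SSE = 3202503.2
DFE = 494
R_SQUARE = 0.7406
N_ROWS = 506


def rmse(sse, dfe):
    """Root mean square error from the error sum of squares."""
    return math.sqrt(sse / dfe)


def adjusted_r_squared(r2, n, dfe):
    """Adjusted R^2, penalising R^2 for the number of fitted parameters."""
    return 1.0 - (1.0 - r2) * (n - 1) / dfe


print(round(rmse(SSE, DFE), 6))
print(round(adjusted_r_squared(R_SQUARE, N_ROWS, DFE), 4))
```

Both results agree with the reported RMSE (80.515837) and RSquare Adj (0.7348) to the printed precision. A DFE of 494 with 506 rows also implies 12 estimated parameters, that is, the intercept plus the 11 variables in the first model.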