R2 is a statistic that gives some information about the goodness of fit of a model. In regression, R2, the coefficient of determination, measures how well the regression line approximates the real data points. An R2 of 1.0 indicates that the regression line fits the data perfectly.

Adjusted R2 is a modification of R2 that adjusts for the number of explanatory terms in the model. Unlike R2, adjusted R2 increases only if a new term improves the model more than would be expected by chance. Adjusted R2 can be negative and is always less than or equal to R2. It does not have the same interpretation as R2, so care must be taken in interpreting and reporting it. Adjusted R2 is particularly useful in the feature selection stage of model building. It is not always more informative than R2, however: it is more useful only if R2 is calculated from a sample rather than from the entire population. For example, if our unit of analysis is a county and we have data for every county in a state, then adjusted R2 will not yield any more useful information than R2.
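As a rough illustration of the two statistics, the sketch below computes R2 as 1 - SS_res / SS_tot and adjusted R2 with the usual correction 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory terms. The function names and the small data set are made up for this example and are not part of the original notes.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    # Residual sum of squares: observed values versus fitted values.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total sum of squares: observed values versus their mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R2 for n observations and p explanatory terms (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed values and fitted values from a one-predictor model.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n=len(y), p=1)

# A weak fit with many terms relative to n gives a negative adjusted R2.
weak = adjusted_r_squared(0.1, n=10, p=5)
```

Here `adj` comes out slightly below `r2`, and `weak` is negative, which matches the two properties stated above.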

2. How does testing the significance of the entire multiple regression model differ from testing the contribution of each independent variable?

When testing the significance of the entire multiple regression model, we are testing the joint effect of all the regressors (predictors) together. When testing the contribution of each independent variable, by contrast, we are testing the effect of that specific variable on the dependent variable, given the other variables in the model.
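One common way to make this distinction concrete, assuming an ordinary least squares model with an intercept, is that the overall test uses an F statistic built from R2, while each individual test uses a t statistic built from one coefficient and its standard error. The sketch below only computes the two statistics; the function names and input numbers are hypothetical, and turning the statistics into p-values would additionally require the F and t distributions.

```python
def overall_f_statistic(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero (n observations)."""
    # Explained variation per regressor over unexplained variation per
    # residual degree of freedom.
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def coefficient_t_statistic(beta_hat, std_err):
    """t statistic for H0: this single coefficient is zero."""
    return beta_hat / std_err


# Hypothetical model: R2 = 0.5 with 2 regressors and 23 observations.
f_stat = overall_f_statistic(0.5, n=23, k=2)

# Hypothetical estimate and standard error for one regressor.
t_stat = coefficient_t_statistic(1.5, 0.5)
```

The F statistic has k and n - k - 1 degrees of freedom and answers whether the regressors matter jointly; each t statistic has n - k - 1 degrees of freedom and answers whether one regressor matters given the others.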

3. Why and how do you use dummy variables?

The use of dummy variables allows you to include categorical independent variables as part of the regression model. If a given categorical independent variable has two categories, then you need only one dummy variable to represent the two categories.
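A minimal sketch of this coding scheme follows, assuming the usual convention that one category is chosen as the baseline and every other category gets its own 0/1 column. The helper `dummy_encode` and the sample values are invented for illustration only.

```python
def dummy_encode(values, baseline):
    """Return one 0/1 column per category other than the baseline."""
    # The baseline category is represented by all dummy columns being 0.
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical two-category variable: a single dummy column is enough.
sex = ["male", "female", "female", "male"]
sex_dummies = dummy_encode(sex, baseline="male")

# Hypothetical three-category variable: two dummy columns are needed.
region = ["north", "south", "west", "south"]
region_dummies = dummy_encode(region, baseline="north")
```

With two categories the result holds a single column, and in general a variable with c categories needs c - 1 dummy columns when the model includes an intercept.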

For example, if the dummy variable...

(1)