The Ordinary Least Squares method of estimation can easily be extended to models involving two or more explanatory variables, though the algebra becomes progressively more complex. In fact, when dealing with the general regression problem with a large number of variables, we use matrix algebra, but that is beyond the scope of this course.

We illustrate the case of two explanatory variables, $X_1$ and $X_2$, with $Y$ the dependent variable. We therefore have the model

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i,$$

where $u_i \sim N(0, \sigma^2)$.

We look for estimators $\hat\alpha$, $\hat\beta_1$, $\hat\beta_2$ that minimise the sum of squared errors,

$$S = \sum_{i=1}^{n} \left( Y_i - \alpha - \beta_1 X_{1i} - \beta_2 X_{2i} \right)^2.$$

Differentiating, and setting the partial derivatives to zero, we get

$$\frac{\partial S}{\partial \alpha} = -2 \sum_i \left( Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} \right) = 0 \quad (1)$$

$$\frac{\partial S}{\partial \beta_1} = -2 \sum_i X_{1i} \left( Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} \right) = 0 \quad (2)$$

$$\frac{\partial S}{\partial \beta_2} = -2 \sum_i X_{2i} \left( Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} \right) = 0 \quad (3)$$

These three equations are called the “normal equations”. They can be simplified as follows. Equation (1) can be written as

$$\sum_i Y_i = n\hat\alpha + \hat\beta_1 \sum_i X_{1i} + \hat\beta_2 \sum_i X_{2i},$$

or

$$\bar Y = \hat\alpha + \hat\beta_1 \bar X_1 + \hat\beta_2 \bar X_2, \quad (4)$$

where the bar over $Y$, $X_1$ and $X_2$ indicates a sample mean. Equation (2) can be written as

$$\sum_i X_{1i} Y_i = \hat\alpha \sum_i X_{1i} + \hat\beta_1 \sum_i X_{1i}^2 + \hat\beta_2 \sum_i X_{1i} X_{2i}.$$

Substituting in the value of $\hat\alpha$ from (4), we get

$$\sum_i (X_{1i} - \bar X_1)(Y_i - \bar Y) = \hat\beta_1 \sum_i (X_{1i} - \bar X_1)^2 + \hat\beta_2 \sum_i (X_{1i} - \bar X_1)(X_{2i} - \bar X_2). \quad (5)$$

A similar equation results from (3) and (4). We can simplify these equations using the following notation. Let us define:

$$S_{11} = \sum_i (X_{1i} - \bar X_1)^2, \qquad S_{22} = \sum_i (X_{2i} - \bar X_2)^2, \qquad S_{12} = \sum_i (X_{1i} - \bar X_1)(X_{2i} - \bar X_2),$$

$$S_{1Y} = \sum_i (X_{1i} - \bar X_1)(Y_i - \bar Y), \qquad S_{2Y} = \sum_i (X_{2i} - \bar X_2)(Y_i - \bar Y).$$

Equation (5) can then be written

$$S_{1Y} = \hat\beta_1 S_{11} + \hat\beta_2 S_{12}. \quad (6)$$

Similarly, the equation derived from (3) becomes

$$S_{2Y} = \hat\beta_1 S_{12} + \hat\beta_2 S_{22}. \quad (7)$$

We can solve these two equations to get

$$\hat\beta_1 = \frac{S_{22} S_{1Y} - S_{12} S_{2Y}}{\Delta}$$

and

$$\hat\beta_2 = \frac{S_{11} S_{2Y} - S_{12} S_{1Y}}{\Delta},$$

where $\Delta = S_{11} S_{22} - S_{12}^2$. We may then obtain $\hat\alpha$ from equation (4):

$$\hat\alpha = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2.$$
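As a concrete check on the closed-form solution, the following Python sketch computes $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_2$ from the sums of squares defined above and compares them with a library least-squares fit. The data are simulated purely for illustration; variable names follow the notation of this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)               # correlated regressors
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

# Deviations from sample means
d1, d2, dy = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()

# Sums of squares and cross-products, as defined in the text
S11, S22 = np.sum(d1**2), np.sum(d2**2)
S12 = np.sum(d1 * d2)
S1Y, S2Y = np.sum(d1 * dy), np.sum(d2 * dy)

delta = S11 * S22 - S12**2                       # Δ
b1 = (S22 * S1Y - S12 * S2Y) / delta             # solution of (6)-(7)
b2 = (S11 * S2Y - S12 * S1Y) / delta
a = y.mean() - b1 * x1.mean() - b2 * x2.mean()   # intercept from (4)

# Cross-check against a general least-squares solver
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose([a, b1, b2], coef))            # True
```

Both routes solve the same minimisation problem, so the estimates agree to machine precision.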

We can calculate the RSS, ESS and TSS from these estimators in the same way as for simple regression, that is

$$ESS = \hat\beta_1 S_{1Y} + \hat\beta_2 S_{2Y},$$

$$TSS = \sum_i (Y_i - \bar Y)^2,$$

$$RSS = TSS - ESS,$$

and the coefficient of multiple determination is

$$R^2 = ESS / TSS.$$

That is, R2 is the proportion of the variation in Y explained by the regression.
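The decomposition can be illustrated numerically. The sketch below (simulated data, for illustration only) fits the two-regressor model and computes TSS, RSS, ESS and $R^2$ directly from their definitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Least-squares fit with an intercept column
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

TSS = np.sum((y - y.mean())**2)   # total variation in Y
RSS = np.sum(resid**2)            # residual (unexplained) variation
ESS = TSS - RSS                   # variation explained by the regression
R2 = ESS / TSS
print(round(R2, 3))
```

With this signal-to-noise ratio most of the variation in $Y$ is explained, so $R^2$ comes out close to one.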

The variances of our estimators are given by

$$\operatorname{Var}(\hat\beta_1) = \frac{\sigma^2 S_{22}}{\Delta} = \frac{\sigma^2}{S_{11}(1 - r_{12}^2)}$$

and

$$\operatorname{Var}(\hat\beta_2) = \frac{\sigma^2 S_{11}}{\Delta} = \frac{\sigma^2}{S_{22}(1 - r_{12}^2)},$$

where $r_{12}^2$ is the squared correlation coefficient between $X_1$ and $X_2$. Thus, the greater the correlation between the two explanatory variables, the greater the variance of the estimators, i.e. the less precisely we can estimate the individual coefficients.
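This effect of correlated regressors can be seen numerically. Using the relation $\operatorname{Var}(\hat\beta_1) = \sigma^2 / \bigl(S_{11}(1 - r_{12}^2)\bigr)$, the sketch below (simulated data, illustrative only) evaluates the variance for increasingly correlated $X_1$ and $X_2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
sigma2 = 1.0   # assumed error variance

results = {}
for rho in (0.0, 0.5, 0.9, 0.99):
    # Generate X1, X2 with (population) correlation rho
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    d1, d2 = x1 - x1.mean(), x2 - x2.mean()
    S11, S22 = np.sum(d1**2), np.sum(d2**2)
    S12 = np.sum(d1 * d2)
    r12_sq = S12**2 / (S11 * S22)
    # Var(beta1-hat) = sigma^2 / (S11 (1 - r12^2))
    var_b1 = sigma2 / (S11 * (1 - r12_sq))
    results[rho] = var_b1
    print(f"rho = {rho:.2f}: r12^2 = {r12_sq:.3f}, Var(b1) = {var_b1:.5f}")
```

As $r_{12}^2 \to 1$ the factor $1/(1 - r_{12}^2)$ blows up, which is the multicollinearity problem in numerical form.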
