The Ordinary Least Squares method of estimation can easily be extended to models involving two or more explanatory variables, though the algebra becomes progressively more complex. In fact, when dealing with the general regression problem with a large number of variables, we use matrix algebra, but that is beyond the scope of this course.
We illustrate the case of two explanatory variables, X1 and X2, with Y the dependent variable. We therefore have the model
$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
where $u_i \sim N(0, \sigma^2)$.
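We look for estimators $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_2$ so as to minimise the sum of squared errors,
$S = \sum_{i=1}^{n} (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2$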
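Differentiating, and setting the partial derivatives to zero, we get
$\partial S/\partial\hat\alpha = -2\sum (Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$   (1)
$\partial S/\partial\hat\beta_1 = -2\sum X_{1i}(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$   (2)
$\partial S/\partial\hat\beta_2 = -2\sum X_{2i}(Y_i - \hat\alpha - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}) = 0$   (3)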
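These three equations are called the "normal equations". They can be simplified as follows. Equation (1) can be written as
$\sum Y_i = n\hat\alpha + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i}$
or
$\hat\alpha = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$   (4)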
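where the bar over Y, X1 and X2 indicates the sample mean. Equation (2) can be written as
$\sum X_{1i} Y_i = \hat\alpha \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$
Substituting in the value of $\hat\alpha$ from (4), we get
$\sum X_{1i} Y_i - n\bar X_1 \bar Y = \hat\beta_1 (\sum X_{1i}^2 - n\bar X_1^2) + \hat\beta_2 (\sum X_{1i} X_{2i} - n\bar X_1 \bar X_2)$   (5)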
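A similar equation results from (3) and (4). We can simplify this equation using the following notation. Let us define:
$S_{11} = \sum X_{1i}^2 - n\bar X_1^2$, $S_{22} = \sum X_{2i}^2 - n\bar X_2^2$, $S_{12} = \sum X_{1i} X_{2i} - n\bar X_1 \bar X_2$
$S_{1Y} = \sum X_{1i} Y_i - n\bar X_1 \bar Y$, $S_{2Y} = \sum X_{2i} Y_i - n\bar X_2 \bar Y$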
Equation (5) can then be written
$S_{1Y} = \hat\beta_1 S_{11} + \hat\beta_2 S_{12}$   (6)
Similarly, equation (3) becomes
$S_{2Y} = \hat\beta_1 S_{12} + \hat\beta_2 S_{22}$   (7)
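We can solve these two equations to get:
$\hat\beta_1 = (S_{22} S_{1Y} - S_{12} S_{2Y})/\Delta$
and
$\hat\beta_2 = (S_{11} S_{2Y} - S_{12} S_{1Y})/\Delta$
where $\Delta = S_{11} S_{22} - S_{12}^2$. We may then obtain $\hat\alpha$ from equation (4).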
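A minimal Python sketch of the two-regressor formulas above, assuming the data are held in plain lists. The function name `ols_two_regressors` and the sample data are hypothetical; the data are made up so that Y = 1 + 2X1 + 3X2 holds exactly, which makes the estimates easy to check. The sketch computes the S quantities, solves equations (6) and (7), and recovers $\hat\alpha$ from equation (4).

```python
def ols_two_regressors(y, x1, x2):
    """Estimate alpha, beta1, beta2 in Y = alpha + beta1*X1 + beta2*X2 + u
    by solving normal equations (6) and (7) in S notation (hypothetical helper)."""
    n = len(y)
    ybar, x1bar, x2bar = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Corrected sums of squares and cross-products
    s11 = sum(a * a for a in x1) - n * x1bar ** 2
    s22 = sum(b * b for b in x2) - n * x2bar ** 2
    s12 = sum(a * b for a, b in zip(x1, x2)) - n * x1bar * x2bar
    s1y = sum(a * c for a, c in zip(x1, y)) - n * x1bar * ybar
    s2y = sum(b * c for b, c in zip(x2, y)) - n * x2bar * ybar
    delta = s11 * s22 - s12 ** 2  # zero when X1 and X2 are perfectly collinear
    if delta == 0:
        raise ValueError("X1 and X2 are perfectly collinear")
    beta1 = (s22 * s1y - s12 * s2y) / delta
    beta2 = (s11 * s2y - s12 * s1y) / delta
    alpha = ybar - beta1 * x1bar - beta2 * x2bar  # equation (4)
    return alpha, beta1, beta2


# Made-up data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
print(ols_two_regressors(y, x1, x2))  # approximately (1.0, 2.0, 3.0)
```

With real data the error term makes the fit inexact, but the same three steps apply.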
We can calculate the ESS, TSS and RSS from these estimators in the same way as for simple regression, that is
$ESS = \hat\beta_1 S_{1Y} + \hat\beta_2 S_{2Y}$
$TSS = \sum (Y_i - \bar Y)^2$
$RSS = TSS - ESS = \sum \hat u_i^2$
And the coefficient of multiple determination is
$R^2 = ESS/TSS$
That is, R² is the proportion of the variation in Y explained by the regression.
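The decomposition can be checked numerically. This sketch (the function name `r_squared` is hypothetical) takes observed values and fitted values from any least squares fit that includes an intercept, and returns ESS, TSS, RSS and R². Because the calculation is the same as for simple regression, the illustrative fitted values come from the simple line ŷ = 2.2 + 0.6x on made-up data.

```python
def r_squared(y, y_hat):
    """Return (ESS, TSS, RSS, R^2) from observed y and fitted y_hat.
    Assumes y_hat comes from an OLS fit with an intercept, so TSS = ESS + RSS."""
    n = len(y)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)                 # total variation
    rss = sum((v - f) ** 2 for v, f in zip(y, y_hat))     # unexplained variation
    ess = tss - rss                                       # explained variation
    return ess, tss, rss, ess / tss


# Made-up data; y_fit is the least squares line y-hat = 2.2 + 0.6x for these points
y_obs = [2, 4, 5, 4, 5]
y_fit = [2.2 + 0.6 * x for x in [1, 2, 3, 4, 5]]
ess, tss, rss, r2 = r_squared(y_obs, y_fit)
print(ess, tss, rss, r2)
```

Here TSS = 6 and RSS = 2.4, so about 60% of the variation in y is explained.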
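The variances of our estimators are given by
$Var(\hat\beta_1) = \sigma^2 / [S_{11}(1 - r_{12}^2)]$
and
$Var(\hat\beta_2) = \sigma^2 / [S_{22}(1 - r_{12}^2)]$
where $r_{12}^2 = S_{12}^2/(S_{11} S_{22})$ is the squared correlation coefficient between X1 and X2. Thus, the greater the correlation between the two explanatory variables, the greater the variance in the estimators, i.e. the...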
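To see how correlation between the regressors inflates these variances, the sketch below evaluates both formulas for a given σ². The function name and data are hypothetical, and in practice σ² is unknown and would be replaced by an estimate such as RSS/(n − 3). X1 is held fixed while X2 is first uncorrelated and then highly correlated with it.

```python
def beta_variances(x1, x2, sigma2):
    """Var(beta1_hat), Var(beta2_hat) and r12^2 for a given error variance sigma2.
    S11 = sum(X1^2) - n*mean^2 equals sum((X1 - mean)^2), used here directly."""
    n = len(x1)
    x1bar, x2bar = sum(x1) / n, sum(x2) / n
    s11 = sum((a - x1bar) ** 2 for a in x1)
    s22 = sum((b - x2bar) ** 2 for b in x2)
    s12 = sum((a - x1bar) * (b - x2bar) for a, b in zip(x1, x2))
    r12_sq = s12 ** 2 / (s11 * s22)          # squared correlation of X1 and X2
    var_b1 = sigma2 / (s11 * (1 - r12_sq))
    var_b2 = sigma2 / (s22 * (1 - r12_sq))
    return var_b1, var_b2, r12_sq


x1 = [1, 2, 3, 4, 5]
x2_uncorrelated = [1, -1, 0, -1, 1]   # r12 = 0
x2_correlated = [1, 2, 3, 4, 6]       # r12^2 is about 0.97
for label, x2 in (("uncorrelated", x2_uncorrelated), ("correlated", x2_correlated)):
    v1, v2, r12_sq = beta_variances(x1, x2, sigma2=1.0)
    print(f"{label}: r12^2={r12_sq:.3f}, Var(b1)={v1:.3f}, Var(b2)={v2:.3f}")
```

With σ² = 1, Var(β̂1) is 0.1 when the regressors are uncorrelated and 3.7 when r12² ≈ 0.97, a 37-fold increase.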
...Multiple regression analysis of the exchange rate and its determinant factors
Regression Analysis: USD versus GDP Growth, FER, FDI Growth, Interest Rate, Money Supply, Terms Of Trade
The regression equation is
USD = 41.5 − 1.95 GDP Growth + 0.000943 FER − 0.139 FDI Growth + 0.048 Differential Interest Rate + 0.000067 Money Supply + 0.166 Terms of Trade − 0.000097 External Debt
Predictor                  T      P
Constant                   2.32   0.039
GDP Growth                 3.43   0.005
FER                        1.01   0.332
FDI Growth                 1.55   0.146
Differential Int Rate      0.11   0.913
Money Supply               0.89   0.393
Terms of Trade             0.35   0.731
External Debt              0.73   0.479
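Where,

T is the t statistic: the estimated coefficient divided by its standard error. It measures the strength of a predictor relative to the sampling error in its estimate, which makes it more informative than the raw regression coefficient.
P is the p-value, a probability between 0 and 1. It is the probability of obtaining a t statistic at least as extreme as the one observed if the true coefficient were zero, that is, if the predictor had no real relationship with the dependent variable.
A p-value of .05 means that, if the predictor truly had no effect, a t statistic this large would arise by chance only 5% of the time. It does not mean that there is a 95% chance that the relationship is real; a p-value below the chosen significance level (commonly .05) is simply taken as evidence against the hypothesis of no relationship. ...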
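As a small illustration of reading the table, the sketch below flags predictors whose p-value falls below a 5% significance level. The P values are copied from the output above; the 0.05 cut-off is the conventional choice and is an assumption, not something the output itself specifies.

```python
# P values copied from the regression output above
p_values = {
    "Constant": 0.039,
    "GDP Growth": 0.005,
    "FER": 0.332,
    "FDI Growth": 0.146,
    "Differential Int Rate": 0.913,
    "Money Supply": 0.393,
    "Terms of Trade": 0.731,
    "External Debt": 0.479,
}
alpha = 0.05  # assumed significance level
significant = [name for name, p in p_values.items() if p < alpha]
print(significant)  # ['Constant', 'GDP Growth']
```

At this level only GDP Growth (besides the constant) is a statistically significant predictor of the exchange rate in this model.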
...REGRESSION ANALYSIS
Correlation only indicates the degree and direction of the relationship between two variables. It does not necessarily imply a cause-and-effect relationship. Even when there are grounds to believe that a causal relationship exists, correlation does not tell us which variable is the cause and which is the effect. For example, the demand for a commodity and its price will generally be found to be correlated, but correlation will not answer the question of whether demand depends on price or vice versa.
The dictionary meaning of ‘regression’ is the act of returning or going back. The term ‘regression’ was first used by Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
“Regression is the measure of the average relationship between two or more variables in terms of the original units of data.”
The line of regression is the line that gives the best estimate of the value of one variable for any specific values of the other variables.
For two variables in regression analysis, there are two regression lines: one is the regression of x on y and the other is the regression of y on x.
These two regression lines show the average relationship between the two variables. The regression line of y on x gives the most probable value of y for given value of...
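The two regression lines can be computed side by side. In this sketch (the function name and data are hypothetical) the slope of y on x is SSxy/SSxx and the slope of x on y is SSxy/SSyy. Their product equals r², so the two lines coincide only when r = ±1.

```python
def regression_lines(x, y):
    """Slopes of the regression of y on x and of x on y, plus r^2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ss_xx = sum((a - xbar) ** 2 for a in x)
    ss_yy = sum((b - ybar) ** 2 for b in y)
    ss_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b_yx = ss_xy / ss_xx   # y on x: y = ybar + b_yx * (x - xbar)
    b_xy = ss_xy / ss_yy   # x on y: x = xbar + b_xy * (y - ybar)
    r_sq = ss_xy ** 2 / (ss_xx * ss_yy)
    return b_yx, b_xy, r_sq


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy, r_sq = regression_lines(x, y)
print(b_yx, b_xy, r_sq)  # 0.6 1.0 0.6
```

Both lines pass through the point of means (x̄, ȳ) but have different slopes unless the correlation is perfect.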
...
Logistic regression
In statistics, logistic regression, or logit regression, is a type of probabilistic statistical classification model.[1] It is also used to predict a binary response from a binary predictor, and more generally to predict the outcome of a categorical dependent variable (i.e., a class label) based on one or more predictor variables (features). That is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function. Frequently "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary, that is, the number of available categories is two, while problems with more than two categories are referred to as multinomial logistic regression or, if the multiple categories are ordered, as ordered logistic regression.
Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by using probability scores as the predicted values of the dependent variable.[2] As such it treats the same set of problems as does probit regression, using similar techniques.
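A minimal sketch of fitting a binary logistic regression with one predictor by maximum likelihood, using Newton-Raphson iterations. This is one common way to obtain the estimates, not necessarily the method any particular package uses. The function names and data are hypothetical, and the data are chosen so that the two classes overlap; with perfectly separated classes the maximum likelihood estimates do not exist.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(x, y, iters=25):
    """Maximum-likelihood b0, b1 in P(Y=1 | x) = logistic(b0 + b1*x) via Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian) with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: one continuous predictor, binary response
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, logistic(b0 + b1 * 2.5))
```

These data are symmetric about x = 2.5, so the fitted probability there is 0.5, and the positive slope means the probability of Y = 1 rises with x.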
Fields and examples of...
...Regression Analysis: A Complete Example
This section works out an example that includes all the topics we have discussed so far in this chapter.
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.
Driving Experience (years)    Monthly Auto Insurance Premium ($)
5                             64
2                             87
12                            50
9                             71
15                            44
6                             56
25                            42
16                            60
a. Does the insurance premium depend on the driving experience, or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute SSxx, SSyy, and SSxy.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of a and b calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate r and r² and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
h. Compute the standard deviation of errors.
i. Construct a 90% confidence interval for B.
j. Test at the 5% significance level whether B is negative.
k. Using α = .05, test whether ρ is different from zero.
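Parts b, c, f, g and h can be checked with a few lines of Python. This sketch assumes that the premium is the dependent variable and driving experience the independent variable. It uses the shortcut formulas SSxx = Σx² − (Σx)²/n, SSyy = Σy² − (Σy)²/n and SSxy = Σxy − (Σx)(Σy)/n, and the standard deviation of errors s_e = √((SSyy − b·SSxy)/(n − 2)).

```python
import math

# Data from the table above: driving experience x (years), monthly premium y ($)
x = [5, 2, 12, 9, 15, 6, 25, 16]
y = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(x)

ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

b = ss_xy / ss_xx                              # slope
a = sum(y) / n - b * sum(x) / n                # intercept
r = ss_xy / math.sqrt(ss_xx * ss_yy)           # correlation coefficient
s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))  # standard deviation of errors
y_at_10 = a + b * 10                           # predicted premium at 10 years

print(f"SSxx={ss_xx}, SSyy={ss_yy}, SSxy={ss_xy}")
print(f"y-hat = {a:.4f} + ({b:.4f})x")
print(f"r={r:.4f}, r^2={r * r:.4f}, s_e={s_e:.4f}, prediction at x=10: {y_at_10:.2f}")
```

For these data it gives SSxx = 383.5, SSyy = 1557.5 and SSxy = −593.5, a fitted line of about ŷ = 76.66 − 1.5476x, r ≈ −0.77, and a predicted premium of about $61.18 for a driver with 10 years of experience.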
Solution a. Based on theory and intuition, we...
13 Multiple regression
In this chapter I will briefly outline how to use SPSS for Windows to run multiple regression analyses. This is a very simplified outline. It is important that you do more reading on multiple regression before using it in your own research. A good reference is Chapter 5 in Tabachnick and Fidell (2001), which covers the underlying theory, the different types of multiple regression analyses and the assumptions that you need to check.
Multiple regression is not just one technique but a family of techniques that can be used to explore the relationship between one continuous dependent variable and a number of independent variables or predictors (usually continuous). Multiple regression is based on correlation (covered in Chapter 11), but allows a more sophisticated exploration of the interrelationship among a set of variables. This makes it ideal for the investigation of more complex real-life, rather than laboratory-based, research questions. However, you cannot just throw variables into a multiple regression and hope that, magically, answers will appear. You should have a sound theoretical or conceptual reason for the analysis and, in particular, the order of...
...Topic 4. Multiple regression
Aims
• Explain the meaning of the partial regression coefficient, and calculate and interpret multiple regression models
• Derive and interpret the multiple coefficient of determination R² and explain its relationship with the adjusted R²
• Apply interval estimation and tests of significance to individual partial regression coefficients
• Test the significance of the whole model (F-test)
Introduction
• The basic multiple regression model is a simple extension of the bivariate equation.
• By adding extra independent variables, we are creating a multi-dimensional space, where the model is fitted in a space of the appropriate dimension.
• For instance, if there are two independent variables, we are fitting the points to a ‘plane in space’.
• Visualizing this in more dimensions is a good trick.
Model specification – scalar version
• The basic linear model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$
• If bivariate regression can be described as a line on a plane, multiple regression represents a k-dimensional object in a (k+1)-dimensional space.
Matrix version
• We can use a different type of mathematical structure to describe the regression model, frequently called matrix or linear algebra.
• The multiple...
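In matrix form the normal equations become (X′X)b = X′y, where X has a column of 1s for the intercept and one column per regressor. The sketch below is a hypothetical pure-Python helper, not any package's API: it builds X′X and X′y and solves them by Gaussian elimination with partial pivoting. Statistical software typically uses more numerically robust decompositions. The made-up data satisfy Y = 0.5 + 1.5X1 − 2X2 + 4X3 exactly.

```python
def ols_matrix(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y.
    X is a list of rows; include a leading 1 in each row for the intercept."""
    k = len(X[0])
    # Build X'X (k x k) and X'y (length k)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (aug[i][k] - sum(aug[i][j] * b[j] for j in range(i + 1, k))) / aug[i][i]
    return b


# Made-up data with k = 3 regressors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
x3 = [1, 0, 0, 1, 1, 0]
y = [2.0, 1.5, -3.0, 4.5, 0.0, -0.5]
X = [[1, a, b, c] for a, b, c in zip(x1, x2, x3)]
print(ols_matrix(X, y))  # approximately [0.5, 1.5, -2.0, 4.0]
```

The same function handles any number of regressors, which is why the matrix formulation is preferred once k grows beyond two.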
...associated with a β1 change in Y.
(iii) The interpretation of the slope coefficient in the model ln(Yi) = β0 + β1 ln(Xi) + ui is as follows:
(a) a 1% change in X is associated with a β1 % change in Y.
(b) a change in X by one unit is associated with a β1 change in Y.
(c) a change in X by one unit is associated with a 100β1 % change in Y.
(d) a 1% change in X is associated with a change in Y of 0.01β1 .
(iv) To decide whether Yi = β0 + β1X + ui or ln(Yi) = β0 + β1X + ui fits the data better, you cannot consult the regression R² because
(a) ln(Y) may be negative for 0 < Y < 1.
(b) the TSS are not measured in the same units between the two models.
(c) the slope no longer indicates the effect of a unit change of X on Y in the log-linear model.
(d) the regression R2 can be greater than one in the second model.
(v) The exponential function
(a) is the inverse of the natural logarithm function.
(b) does not play an important role in modeling nonlinear regression functions in econometrics.
(c) can be written as exp(e^x).
(d) is e^x, where e is 3.1415...
(vi) The following are properties of the logarithm function with the exception of
(a) ln(1/x) = −ln(x).
(b) ln(a + x) = ln(a) + ln(x).
(c) ln(ax) = ln(a) + ln(x).
(d) ln(x^a) = a ln(x).
(vii) In the log-log model, the slope coefficient indicates
(a) the effect that a unit change in X has on Y.
(b) the elasticity of Y with respect to X.
(c) ∆Y/∆X.
(d) (∆Y/∆X) × (Y/X).
(viii) In the...
...Topic 8: Multiple Regression Answer
a.
[Scatterplot: Game Attendance vs. Team Win/Loss %]
There appears to be a positive linear relationship between team win/loss percentage and
game attendance. There appears to be a positive linear relationship between opponent win/loss percentage and game attendance.
There appears to be a positive linear relationship between games played and game
attendance. There does not appear to be any relationship between temperature and game attendance.
b.
                      Game Attendance  Team Win/Loss %  Opponent Win/Loss %  Games Played  Temperature
Game Attendance       1
Team Win/Loss %       0.848748849      1
Opponent Win/Loss %   0.414250332      0.286749997      1
Games Played          0.599214835      0.577958172      0.403593506          1
Temperature           0.476186226      0.330096097      0.446949168          0.550083219   1
No alpha level was specified, so students will select their own. We have selected .05. Critical t = ±2.1448 (df = 16 − 2 = 14).
t for game attendance and team win/loss % = $0.8487/\sqrt{(1 - 0.8487^2)/(16 - 2)} = 6.0043$
t for game attendance and opponent win/loss % = $0.4143/\sqrt{(1 - 0.4143^2)/(16 - 2)} = 1.7032$
t for game attendance and games played = $0.5992/\sqrt{(1 - 0.5992^2)/(16 - 2)} = 2.8004$
t for game attendance and temperature = $0.4762/\sqrt{(1 - (-0.4762)^2)/(16 - 2)} = 2.0263$
There is a significant relationship between game attendance and team win/loss %, and between game attendance and games played. Therefore a multiple regression model...
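The four t statistics can be reproduced as follows. The formula t = r/√((1 − r²)/(n − 2)), n = 16 and the critical value 2.1448 are taken from the text. The temperature correlation is entered as −0.4762, as the squared term in the text suggests. This does not affect the two-tailed decision, because only |t| is compared with the critical value.

```python
import math


def corr_t(r, n):
    """t statistic for H0: rho = 0, t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / math.sqrt((1 - r * r) / (n - 2))


n = 16
critical_t = 2.1448  # two-tailed, alpha = .05, df = n - 2 = 14 (value from the text)
correlations = {
    "Team Win/Loss %": 0.8487,
    "Opponent Win/Loss %": 0.4143,
    "Games Played": 0.5992,
    "Temperature": -0.4762,  # sign assumed from the formula in the text
}
for name, r in correlations.items():
    t = corr_t(r, n)
    verdict = "significant" if abs(t) > critical_t else "not significant"
    print(f"{name}: t = {t:.4f} ({verdict})")
```

Only team win/loss % and games played exceed the critical value, matching the conclusion above.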