Regression analysis is a tool companies commonly use to make predictions from selected variables. Despite its popularity, several limitations can arise when building a regression model, and each can skew the results.

The Number of Variables:
The first limitation that we noticed in our regression model is the number of observations relative to the number of variables. The more companies available for comparison, the greater the chance that the model will be significant. A common guideline is to have 10 to 20 times as many observations (companies) as variables (STATSOFT). Our regression was at the low end of this range. We believe that with at least 20 to 30 more companies, our regression line could have been more accurate than what we concluded. Reaching even that number was difficult, because many of the clothing companies we considered belonged to a parent company, which grouped several of our first choices under a single firm. In addition, many clothing companies are privately held, which further limited our choices. Given our limited number of companies, the three-variable model was a better choice than the four- or five-variable models, which would have required still more companies to produce an accurate regression line.

Multicollinearity:
Multicollinearity is a limitation that is very difficult to avoid. It occurs when two or more of the x (predictor) variables are related to one another. When multicollinearity is present, it can seriously degrade the quality and stability of the final model (UNESCO.ORG). It was not a large problem in most of our categories, but we did find severe multicollinearity between SG&A and operating income. Because of this, we determined that the five-variable model would not work for our regression.
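One common screen for this problem is to look at the correlation between candidate predictors, or at the variance inflation factor (VIF) derived from it. The sketch below is illustrative only: the numbers are made up, not the SG&A and operating-income figures from our data.

```python
import numpy as np

# Hypothetical predictor values (NOT the study's data), chosen to move
# together the way SG&A and operating income did in our model.
sga = np.array([4.1, 5.2, 6.0, 7.3, 8.1, 9.4, 10.2, 11.5])
op_income = np.array([2.0, 2.6, 3.1, 3.7, 4.0, 4.8, 5.1, 5.9])

# Pairwise correlation between the two x variables.
r = np.corrcoef(sga, op_income)[0, 1]

# With one other predictor, VIF = 1 / (1 - r^2); values above roughly
# 5-10 are a common warning sign of multicollinearity.
vif = 1.0 / (1.0 - r**2)
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

A correlation this close to 1 means the two predictors carry nearly the same information, which is why dropping one of them (here, abandoning the five-variable model) is a standard remedy.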

Outliers:
When comparing a variety of companies, one discovers differences in size from one company to another, even...

...for a new house or automobile is very much affected by the interest rates charged by banks.
Regression analysis is one such causal method. It is not limited to locating the straight line of best fit.
Types:-
1. Simple (or Bivariate) Regression Analysis:
Deals with a single independent variable that determines the value of a dependent variable.
Ft+1 = f(xt), where Ft+1 is the forecast for the next period and xt is the value of the economic indicator in the current period.
This indicates the future demand is a function of the value of the economic indicator at the
present time.
Demand Function: D=a+bP, where b is negative.
If we assume a linear relation between D and P, there may still be some random variation around this relation.
Sum of Squared Errors (SSE): a measure of predictive accuracy. The smaller the value of SSE, the more accurate the regression equation.
EXAMPLE:-
Following data on the demand for sewing machines manufactured by Taylor and Son
Co. have been compiled for the past 10 years.
YEAR | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 | 1977 | 1978 | 1979 | 1980 |
DEMAND (in 1000 Units) | 58 | 65 | 73 | 76 | 78 | 87 | 88 | 93 | 99 | 106 |
1. Single variable linear regression
Year = x where x = 1, 2, 3... 10
Demand = y
D = y + ε, where D is the actual demand and y is the demand predicted by the regression line.
ε = D − y
To find out whether this is the line of best fit, we must check that the sum of squared errors is at a minimum.
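The least-squares line and its SSE for the demand data above can be computed with a short script (a sketch using NumPy; no new data is introduced):

```python
import numpy as np

# Demand for sewing machines, years 1971-1980 coded x = 1..10.
x = np.arange(1, 11)
demand = np.array([58, 65, 73, 76, 78, 87, 88, 93, 99, 106])  # in 1000 units

# Fit D = a + b*x by least squares; polyfit returns [slope, intercept].
b, a = np.polyfit(x, demand, 1)
predicted = a + b * x
sse = np.sum((demand - predicted) ** 2)  # sum of squared errors

print(f"fitted line: D = {a:.2f} + {b:.2f} x")
print(f"SSE = {sse:.2f}")
```

Least squares chooses a and b precisely so that this SSE is the minimum over all possible straight lines.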
2. Nonlinear Regression Analysis...

...
Logistic regression
In statistics, logistic regression, or logit regression, is a type of probabilistic statistical classification model.[1] It is used for predicting the outcome of a categorical dependent variable (i.e., a class label) based on one or more predictor variables (features); that is, it is used in estimating the parameters of a qualitative response model. The probabilities describing the possible outcomes of a single trial are modeled, as a function of the explanatory (predictor) variables, using a logistic function. Frequently (and subsequently in this article) "logistic regression" is used to refer specifically to the problem in which the dependent variable is binary, that is, the number of available categories is two, while problems with more than two categories are referred to as multinomial logistic regression or, if the multiple categories are ordered, as ordered logistic regression.
Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables, which are usually (but not necessarily) continuous, by using probability scores as the predicted values of the dependent variable.[2] As such it treats the same set of problems as does probit regression, using similar techniques.
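A minimal sketch of the logistic function itself, with an illustrative (not estimated) intercept and slope:

```python
import math

def logistic(z):
    # Maps any real-valued score to a probability strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted binary model: P(y = 1 | x) = logistic(b0 + b1*x).
b0, b1 = -4.0, 0.8   # illustrative coefficients, not from real data
x = 6.0
p = logistic(b0 + b1 * x)
print(f"P(y=1 | x={x}) = {p:.3f}")
```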
Fields and examples of applications...

...Regression Analysis: A Complete Example
This section works out an example that includes all the topics we have discussed so far in this chapter.
A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.
Driving Experience (years) | Monthly Auto Insurance Premium
5 | $64
2 | $87
12 | $50
9 | $71
15 | $44
6 | $56
25 | $42
16 | $60
a. Does the insurance premium depend on the driving experience or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?
b. Compute SSxx, SSyy, and SSxy.
c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.
d. Interpret the meaning of the values of a and b calculated in part c.
e. Plot the scatter diagram and the regression line.
f. Calculate r and r2 and explain what they mean.
g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
h. Compute the standard deviation of errors.
i. Construct a 90% confidence interval for B.
j. Test at the 5% significance level whether B is negative.
k. Using α = .05, test whether ρ is different from zero.
Solution a. Based on theory and intuition, we...
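The hand calculations in parts b, c, f, and g can be cross-checked with a short script; it reuses only the eight data pairs from the table above:

```python
import math

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)
n = len(x)

sum_x, sum_y = sum(x), sum(y)
ss_xx = sum(v * v for v in x) - sum_x**2 / n
ss_yy = sum(v * v for v in y) - sum_y**2 / n
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

b = ss_xy / ss_xx              # slope
a = sum_y / n - b * sum_x / n  # y-intercept
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSxx = {ss_xx}, SSyy = {ss_yy}, SSxy = {ss_xy}")
print(f"premium = {a:.2f} + ({b:.4f}) * experience")
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"predicted premium at 10 years: ${a + b * 10:.2f}")
```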

...Applied Linear Regression Notes set 1
Jamie DeCoster
Department of Psychology
University of Alabama
348 Gordon Palmer Hall
Box 870348
Tuscaloosa, AL 35487-0348
Phone: (205) 348-4431
Fax: (205) 348-8648
September 26, 2006
Textbook references refer to Cohen, Cohen, West, & Aiken’s (2003) Applied Multiple Regression/Correlation
Analysis for the Behavioral Sciences. I would like to thank Angie Maitner and Anne-Marie Leistico for
comments made on earlier versions of these notes. If you wish to cite the contents of this document, the
APA reference for them would be:
DeCoster, J. (2006). Applied Linear Regression Notes set 1. Retrieved (month, day, and year you
downloaded this file, without the parentheses) from http://www.stat-help.com/notes.html
For future versions of these notes or help with data analysis visit
http://www.stat-help.com
ALL RIGHTS TO THIS DOCUMENT ARE RESERVED
Contents
1 Introduction and Review
2 Bivariate Correlation and Regression
3 Multiple Correlation and Regression
4 Regression Assumptions and Basic Diagnostics
5 Sequential Regression, Stepwise Regression, and Analysis of IV Sets
6 Dealing with Nonlinear Relationships
7 Interactions Among Continuous IVs
8 Regression with Categorical IVs
9 Interactions involving Categorical IVs...

Determinants of Production and Consumption
Determinants of Industry Production (Supply)
Supply is the amount of output that producers are willing and able to sell at a given price, all other factors being held constant.
The following are the determinants of supply:
Price (P), Numbers of Producers (NP), Taxes (T)
Model Specification
Model specification means specifying the form of the equation, or regression relation, that expresses the relationship between the independent variables and the dependent variable. Normally the specific functional form of the regression relation to be estimated is chosen to depict the true supply relationship as closely as possible.
The table presented below gives the hypothetical quantity supplied for a particular product (Qs) of a particular place given its price per kilo (P/kl), the Numbers of producers (NP), and tax per kilo (T/kl) for the period 2002 to 2011. (The quantity Supplied is expressed as kilo in millions)
Table
|Year |Qs |P/kl |NP |T/kl |
|2002 |21.4 |23 |39 |1.15 |
|2003 |23.9 |25 |41 |1.25 |
|2004...
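In code, the specification Qs = b0 + b1(P) + b2(NP) + b3(T) + e amounts to building a design matrix and solving by least squares. The sketch below is illustrative only: the table is truncated here, so it uses just the two visible rows, which leaves the system underdetermined; in practice all ten years of data would be included.

```python
import numpy as np

# Only the 2002 and 2003 rows are visible above; append the remaining
# table rows for a real estimation.
P  = np.array([23.0, 25.0])    # price per kilo
NP = np.array([39.0, 41.0])    # number of producers
T  = np.array([1.15, 1.25])    # tax per kilo
Qs = np.array([21.4, 23.9])    # quantity supplied (million kilos)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones_like(P), P, NP, T])
beta, *_ = np.linalg.lstsq(X, Qs, rcond=None)
print("coefficients (b0, b1, b2, b3):", beta)
```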

...associated with a β1 change in Y.
(iii) The interpretation of the slope coefficient in the model ln(Yi ) = β0 + β1 ln(Xi ) + ui is as
follows:
(a) a 1% change in X is associated with a β1 % change in Y.
(b) a change in X by one unit is associated with a β1 change in Y.
(c) a change in X by one unit is associated with a 100β1 % change in Y.
(d) a 1% change in X is associated with a change in Y of 0.01β1 .
(iv) To decide whether Yi = β0 + β1 X + ui or ln(Yi ) = β0 + β1 X + ui fits the data better, you
cannot consult the regression R2 because
(a) ln(Y) may be negative for 0 < Y < 1.
(b) the TSS are not measured in the same units between the two models.
(c) the slope no longer indicates the effect of a unit change of X on Y in the log-linear
model.
(d) the regression R2 can be greater than one in the second model.
(v) The exponential function
(a) is the inverse of the natural logarithm function.
(b) does not play an important role in modeling nonlinear regression functions in econometrics.
(c) can be written as exp(e^x).
(d) is e^x, where e is 3.1415...
(vi) The following are properties of the logarithm function with the exception of
(a) ln(1/x) = −ln(x).
(b) ln(a + x) = ln(a) + ln(x).
(c) ln(ax) = ln(a) + ln(x).
(d) ln(x^a) = a ln(x).
(vii) In the log-log model, the slope coefficient indicates
(a) the effect that a unit change in X has on Y.
(b) the elasticity of Y with respect to X.
(c) ∆Y/∆X.
(d) (∆Y/∆X) × (Y/X).
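A quick numerical check of answer (b) to question (vii): if Y follows an exact power law in X, the log-log slope recovers the elasticity (the data below is generated inside the script purely for illustration):

```python
import numpy as np

# Exact power law Y = 2 * X**1.7, so the elasticity of Y w.r.t. X is 1.7.
X = np.linspace(1.0, 10.0, 50)
Y = 2.0 * X**1.7

# Regress ln(Y) on ln(X); the slope is the elasticity.
b1, b0 = np.polyfit(np.log(X), np.log(Y), 1)
print(f"estimated elasticity: {b1:.4f}")
```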
(viii) In the...

...
Regression Analysis
Basic Concepts & Methodology
1. Introduction
Regression analysis is by far the most popular technique in business and economics for
seeking to explain variations in some quantity in terms of variations in other quantities, or to
develop forecasts of the future based on data from the past. For example, suppose we are
interested in the monthly sales of retail outlets across the UK. An initial data analysis would
summarise the variability in terms of a mean and standard deviation, but the variation from
outlet to outlet could be very large for a variety of reasons. The size of the local market, the
size of the shop, the level of competition, the level of advertising, and so on would all influence
the sales volume from outlet to outlet. This is where regression analysis can be useful. A
regression analysis would seek to model the influence of these factors on the level of sales. In
statistical terms we would be seeking to regress the variation in sales (the dependent
variable) upon several explanatory variables such as advertising, size, and so on.
From a forecasting point of view we can use regression analysis to develop predictions. If we
were asked to make a forecast for the monthly sales of a proposed new outlet in, say, Oxford,
we could simply compute the average outlet sales and put this forward as our prediction, i.e.
ignoring the specific characteristics of the Oxford...

...
EXERCISE 27 SIMPLE LINEAR REGRESSION
STATISTICAL TECHNIQUE IN REVIEW
Linear regression provides a means to estimate or predict the value of a dependent variable based on the value of one or more independent variables. The regression equation is a mathematical expression of a causal proposition emerging from a theoretical framework. The linkage between the theoretical statement and the equation is made prior to data collection and analysis. Linear regression is a statistical method of estimating the expected value of one variable, y, given the value of another variable, x. The term simple linear regression refers to the use of one independent variable, x, to predict one dependent variable, y.
The regression line is usually plotted on a graph, with the horizontal axis representing x (the independent or predictor variable) and the vertical axis representing y (the dependent or predicted variable) (see Figure 27-1). The value represented by the letter a is referred to as the y intercept, the point where the regression line crosses or intercepts the y-axis. At this point on the regression line, x = 0. The value represented by the letter b is referred to as the slope, or the coefficient of x. The slope determines the direction and angle of the regression line within the graph....