Correlation indicates only the degree and direction of the relationship between two variables. It does not necessarily imply a cause-and-effect relationship. Even when there are grounds to believe that a causal relationship exists, correlation does not tell us which variable is the cause and which is the effect. For example, the demand for a commodity and its price will generally be found to be correlated, but correlation cannot answer the question of whether demand depends on price or vice versa.
The dictionary meaning of ‘regression’ is the act of returning or going back. The term ‘regression’ was first used by Francis Galton in 1877 while studying the relationship between the heights of fathers and sons.
“Regression is the measure of the average relationship between two or more variables in terms of the original units of data.”
The line of regression is the line that gives the best estimate of the value of one variable for any specified value of the other variable.
In regression analysis with two variables, there are two regression lines: one for the regression of x on y and the other for the regression of y on x. These two regression lines show the average relationship between the two variables. The regression line of y on x gives the most probable value of y for a given value of x, and the regression line of x on y gives the most probable value of x for a given value of y. For perfect correlation, positive or negative, i.e. for r = ±1, the two lines coincide, so there is only one straight line. If r = 0, i.e. the two variables are uncorrelated, the two lines cut each other at a right angle; in this case they are parallel to the x-axis and the y-axis respectively.
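The two limiting cases can be checked from the standard least-squares forms of the two lines. The sketch below assumes notation that does not appear in the text above: sample means \bar{x}, \bar{y} and standard deviations \sigma_x, \sigma_y.

```latex
\begin{align*}
\text{$y$ on $x$:}\quad & y - \bar{y} = r\,\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}) \\
\text{$x$ on $y$:}\quad & x - \bar{x} = r\,\frac{\sigma_x}{\sigma_y}\,(y - \bar{y}) \\[4pt]
r = \pm 1:\quad & \text{the second line rearranges to }
  y - \bar{y} = \pm\frac{\sigma_y}{\sigma_x}\,(x - \bar{x}),
  \text{ which is the first line} \\
r = 0:\quad & y = \bar{y} \ \text{(parallel to the $x$-axis)}, \qquad
  x = \bar{x} \ \text{(parallel to the $y$-axis)}
\end{align*}
```

In both forms the lines pass through the point (\bar{x}, \bar{y}), so for r = 0 that point is where they meet at a right angle.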
The graph is given below:
We restrict our discussion to linear relationships only; that is, the equations to be considered are
1- y = a + bx
2- x = a′ + b′y
In the first equation, x is called the independent variable and y the dependent variable. Conditional on the value of x, the equation describes the variation of y. In other words, corresponding to each value of x there is a whole conditional probability distribution of y.
A similar discussion holds for the second equation, where y acts as the independent variable and x as the dependent variable.
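As a rough illustration of the two equations, the following sketch fits both lines by ordinary least squares. The helper name fit_line and the five data points are made up for the example; they are not taken from the text.

```python
from statistics import mean


def fit_line(u, v):
    """Least-squares line v = a + b*u (u independent, v dependent).

    Returns the intercept a and the slope b.
    """
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    b = s_uv / s_uu          # slope (regression coefficient)
    a = v_bar - b * u_bar    # intercept: the line passes through the means
    return a, b


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(x, y)  # regression of y on x: y = a_yx + b_yx * x
a_xy, b_xy = fit_line(y, x)  # regression of x on y: x = a_xy + b_xy * y

print(f"y on x: y = {a_yx:.2f} + {b_yx:.2f} x")  # y = 2.20 + 0.60 x
print(f"x on y: x = {a_xy:.2f} + {b_xy:.2f} y")  # x = -1.00 + 1.00 y
```

The same function produces both lines; only the roles of the independent and dependent variable are exchanged, which is why the two sets of constants differ.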
What purpose does a regression line serve?
1- The first objective is to estimate the dependent variable from known values of the independent variable. This is possible from the regression line.
2- The next objective is to obtain a measure of the error involved in using the regression line for estimation.
3- With the help of the regression coefficients we can calculate the correlation coefficient. The square of the correlation coefficient (r), called the coefficient of determination, measures the degree of association that exists between the two variables.
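The three purposes above can be sketched on a small made-up data set. Two assumptions are not from the text: the standard error of estimate is computed with an n - 2 divisor (some textbooks divide by n instead), and r is recovered as the square root of the product of the two regression coefficients, taking their common sign.

```python
from math import copysign, sqrt
from statistics import mean

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = s_xy / s_xx            # regression coefficient of y on x
b_xy = s_xy / s_yy            # regression coefficient of x on y
a_yx = y_bar - b_yx * x_bar   # intercept of the y-on-x line

# 1- Estimate the dependent variable for a known x (x_new is arbitrary)
x_new = 6
y_hat = a_yx + b_yx * x_new

# 2- Standard error of estimate (assumed n - 2 divisor)
residuals = [yi - (a_yx + b_yx * xi) for xi, yi in zip(x, y)]
std_error = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# 3- Correlation coefficient from the two regression coefficients
r = copysign(sqrt(b_yx * b_xy), b_yx)
r_squared = r ** 2            # coefficient of determination

print(f"estimate at x = {x_new}: {y_hat:.2f}")   # 5.80
print(f"standard error: {std_error:.4f}")        # 0.8944
print(f"r = {r:.4f}, r^2 = {r_squared:.2f}")     # r = 0.7746, r^2 = 0.60
```

Here r^2 = 0.60 can be read as 60% of the variation in y being accounted for by the fitted line on this sample.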
What is the difference between correlation and linear regression?
Correlation and linear regression are not the same. Consider these differences:
• Correlation quantifies the degree to which two variables are related. Correlation does not find a best-fit line (that is regression). You are simply computing a correlation coefficient (r) that tells you how much one variable tends to change when the other one does.
• With correlation you don't have to think about cause and effect. You simply quantify how well two variables relate to each other. With regression, you do have to think about cause and effect as the regression line is determined as the best way to predict Y from X.
• With correlation, it doesn't matter which of the two variables you call "X" and which you call "Y". You'll get the same correlation coefficient if you swap the two. With linear regression, the decision of which variable you call "X" and which you call "Y" matters a lot, as you'll get a different best-fit line if you swap the two. The line that best predicts Y from X is not the same as the line that predicts X from Y.
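The last point can be checked numerically. In this sketch the helper names (centered_sums, pearson_r, slope) and the data are illustrative assumptions; the lines are compared by expressing both as a slope dy/dx in the same (X, Y) plane.

```python
from math import sqrt
from statistics import mean


def centered_sums(u, v):
    """Return (S_uv, S_uu, S_vv): sums of products and squares about the means."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv, s_uu, s_vv


def pearson_r(u, v):
    """Correlation coefficient; symmetric in its two arguments."""
    s_uv, s_uu, s_vv = centered_sums(u, v)
    return s_uv / sqrt(s_uu * s_vv)


def slope(u, v):
    """Slope b of the least-squares line v = a + b*u (predict v from u)."""
    s_uv, s_uu, _ = centered_sums(u, v)
    return s_uv / s_uu


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)
print(f"r(x, y) = {r_xy:.4f}")  # 0.7746
print(f"r(y, x) = {r_yx:.4f}")  # 0.7746, unchanged by the swap

# Both fitted lines as dy/dx (assumes the x-on-y slope is non-zero)
slope_y_on_x = slope(x, y)       # y = a + b*x   -> dy/dx = b
slope_x_on_y = 1 / slope(y, x)   # x = a' + b'*y -> dy/dx = 1/b'
print(f"predict Y from X: dy/dx = {slope_y_on_x:.2f}")  # 0.60
print(f"predict X from Y: dy/dx = {slope_x_on_y:.2f}")  # 1.00
```

The correlation coefficient is identical either way, while the two best-fit lines have visibly different slopes on the same axes.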