Using Microsoft Excel

Correlation Analysis using Excel

The correlation coefficient allows researchers to determine if there is a possible linear relationship between two variables measured on the same subject (or entity). When these two variables are of a continuous nature (they are measurements such as weight, height, length, etc.) the measure of association most often used is Pearson’s correlation coefficient.

This association may be expressed as a number (the correlation coefficient) that ranges from –1 to +1. The population correlation is usually expressed as the Greek letter rho (r) and the sample statistic (correlation coefficient) is r.

The correlation measures how well a straight line fits through a scatter of points when plotted on an x – y axis. If the correlation is positive, it means that when one variable increases, the other tends to increase. If the correlation is negative, it means that when one variable increases, the other tends to decrease. When a correlation coefficient is close to +1 (or –1), it means that there is a strong correlation – the points are scattered along a straight line. For example, a correlation r = 0.7 may be considered strong. However, the closer a correlation coefficient gets to 0, the weaker the relationship, where the cloud (scatter) of points is not close to a straight line. For example, a correlation r = 0.1 might be considered weak. For scientific purposes, a t-test is utilized to determine if the correlation coefficient is “strong” or “significant” or not. This will be discussed later.

Assumptions: Before using the Pearson correlation coefficient as a measure of association, you should be aware of its assumptions and limitations. As mentioned earlier, this correlation coefficient measures a linear relationship. That is, the relationship between the two variables measures how close the two measurements form a straight line when plotted on an x-y chart. Therefore, it is important that data be graphed before the correlation is interpreted. For example, it is possible that data, when plotted, may show a curved relationship instead of a straight line. When this is the case, a Pearson correlation may not be the best measure of association. There are other conditions when a correlation coefficient may appear important, but when considered in light of a graph, is not a good measure of relationship. In the following graphs, all of them have a correlation coefficient of about 0.72, yet most do not fit the assumption of a linear relationship. To avoid misinterpreting a correlation, always accompany the calculation with a graph.

Another assumption of correlation is that the both of the variables (the measurements) be of continuous data measured on an interval/ratio scale. Data that are not continuous, such as categorical (i.e. hair color) or binomial (i.e., gender) data would not be acceptable. Also, each variable should be approximately normally distributed. For this example, we’ll look at the data set called EXAMPLE.XLS. The first few records are shown here: GROUP| AGE| TIME1| TIME2| TIME3| TIME4| STATUS|

A| 12| 22.3| 25.3| 28.2| 30.6| 5|

A| 11| 22.8| 27.5| 33.3| 35.8| 5|

B| 12| 22.8| 30.0| 32.8| 31.0| 4|

A| 12| 18.5| 26.0| 29.0| 27.9| 5|

B| 9| 19.5| 25.0| 25.3| 26.6| 5|

B| 11| 23.5| 28.8| 34.2| 35.6| 5|

C| 8| 22.6| 26.7| 28.0| 33.4| 3|

B| 8| 21.0| 26.7| 27.5| 29.5| 5|

Suppose you want to want to find the Pearson’s correlation between the variables TIME1 and TIME2. Note: Example 1 requires that you’ve installed the Analysis Toolpak (Tools/Addins/Analysis Toolpak) Example 1

Step1: To make this calculation select Tools/Data Analysis/Correlation… The following dialog box is displayed:

Step 2: In the input range textbox enter the range of the data (include the first row containing the variable name) or click on the data selection icon and mark the...