BY

NILOY MAJUMDAR

Table of Contents

1. INTRODUCTION

2. BIVARIATE DATA

3. ASSOCIATION AND CORRELATION

4. DEFINITION AND CALCULATION

5. RELATED QUANTITIES

6. INTERPRETATION

7. EXAMPLE

8. PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT

9. DETERMINING SIGNIFICANCE

10. CORRESPONDENCE ANALYSIS BASED ON SPEARMAN’S rho

11. REFERENCES

1. Introduction

Rank correlation is used quite extensively in school subjects other than mathematics, particularly geography and biology. There are two accepted measures of rank correlation, Spearman’s and Kendall’s; of these, Spearman’s is the more widely used. In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as $r_s$, is a non-parametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other. Spearman's coefficient can be used when both the dependent (outcome; response) variable and the independent (predictor) variable are ordinal numeric, or when one variable is ordinal numeric and the other is continuous. However, it can also be appropriate to use Spearman's correlation when both variables are continuous.
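The link between monotone functions and ranks can be checked directly. The following is a minimal sketch, assuming no tied values; the helper name rank_order and the sample numbers are illustrative and do not come from the paper. It shows that a monotone increasing transformation leaves the rank order unchanged, while a monotone decreasing one reverses it, which is why such relationships give a coefficient of +1 or −1.

```python
import math


def rank_order(values):
    """Rank each value from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Illustrative data (hypothetical, not taken from the paper).
x = [0.5, 2.0, 1.2, 3.1, 0.9]
y_increasing = [math.exp(v) for v in x]  # monotone increasing in x
y_decreasing = [-(v ** 3) for v in x]    # monotone decreasing in x

n = len(x)
ranks_x = rank_order(x)

# An increasing function preserves the ranks exactly.
same_order = rank_order(y_increasing) == ranks_x

# A decreasing function turns rank r into rank n + 1 - r.
reversed_order = rank_order(y_decreasing) == [n + 1 - r for r in ranks_x]

print(ranks_x, same_order, reversed_order)
```

Neither transformation is linear, so the raw values do not lie on a straight line, yet the ranks agree (or disagree) perfectly.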

2. Bivariate Data

The data referred to in this paper are all bivariate, so each data item is reported in terms of the values of two attributes. These could, for example, be the heights and weights of 11-year-old girls. In keeping with common convention, the two variables are referred to separately as X, with sample values $x_1, x_2, \ldots, x_n$, and Y, with sample values $y_1, y_2, \ldots, y_n$, or together as the bivariate distribution (X, Y) with sample values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. A general bivariate item is denoted by $(x_i, y_i)$. Sample size is denoted by n.

3. Association and Correlation

The terms association and correlation are often used interchangeably; this is not correct. There is association between two variables if knowing the value of one provides information about the likely value of the other. There is correlation between the variables if the association is linear; this can be represented by a straight line on a scatter diagram. These situations are illustrated in Figures 1 and 2. People sometimes describe situations like that in Figure 2 as “non-linear correlation” but this is technically incorrect; the correct description would be “non-linear association” or just “association”.

In the linear case, the strength of the association can be measured by the correlation coefficient; the closer the points lie to the straight line, the stronger the correlation. A common mistake is to think that the steeper the line, the better the correlation, but this is not the case: the coefficient measures how tightly the points cluster about the line, not the slope of the line.
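Both points in this section can be illustrated numerically. The sketch below uses a hand-written product-moment correlation function (named pearson here for convenience) and made-up data; neither is taken from the paper. Rescaling y makes the line steeper or shallower without changing the coefficient, and a symmetric curve gives a coefficient of essentially zero even though y is completely determined by x.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data (hypothetical): roughly y = 2x with a little scatter.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

y_steep = [10 * v for v in y]     # same scatter pattern, much steeper line
y_shallow = [0.1 * v for v in y]  # same scatter pattern, much shallower line

r = pearson(x, y)
r_steep = pearson(x, y_steep)
r_shallow = pearson(x, y_shallow)

# Non-linear association: y depends entirely on x, but not linearly.
y_curve = [(v - 3.5) ** 2 for v in x]
r_curve = pearson(x, y_curve)

print(r, r_steep, r_shallow, r_curve)
```

The three values r, r_steep and r_shallow agree up to rounding error, while r_curve is zero up to rounding error despite the strong association.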

4. Definition and Calculation

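The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores $x_i, y_i$ are converted to ranks $R(x_i), R(y_i)$, and ρ is computed from these:

$$\rho = \frac{\sum_{i=1}^{n} \left(R(x_i) - \bar{R}_x\right)\left(R(y_i) - \bar{R}_y\right)}{\sqrt{\sum_{i=1}^{n} \left(R(x_i) - \bar{R}_x\right)^2 \, \sum_{i=1}^{n} \left(R(y_i) - \bar{R}_y\right)^2}}$$

where $\bar{R}_x$ and $\bar{R}_y$ are the mean ranks of the two variables.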
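The definition translates almost word for word into code: rank each variable, then apply the product-moment formula to the ranks. The following is a minimal sketch in plain Python; the function names average_ranks, pearson and spearman and the data are illustrative assumptions, not part of the paper. Ranks are taken in ascending order here, and tied values share the mean of the ranks they would otherwise occupy.

```python
import math


def average_ranks(values):
    """Ascending ranks; tied values share the mean of their ranks.

    For each value: (number of smaller values) + (size of its tie group + 1) / 2.
    """
    return [
        sum(w < v for w in values) + (sum(w == v for w in values) + 1) / 2
        for v in values
    ]


def pearson(xs, ys):
    """Product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: the Pearson coefficient of the ranked variables."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustrative data (hypothetical): y = x**3 is monotone but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

rho = spearman(x, y)  # ranks agree perfectly
r = pearson(x, y)     # raw values do not lie on a straight line

print(rho, r)
```

On this data ρ equals 1 while the product-moment coefficient of the raw values is below 1, which reflects the difference between a monotone and a linear relationship.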

Tied values are assigned a rank equal to the average of their positions in the sorted order of the values. In the table below, where position 1 goes to the largest value, notice how the rank of values that are the same is the mean of what their ranks would otherwise be:

Variable | Position in descending order | Rank |

0.8 | 5 | 5 |

1.2 | 4 | 3.5 |

1.2 | 3 | 3.5 |

2.3 | 2 | 2 |

18 | 1 | 1 |
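The ranking rule in the table can be sketched as follows. The function name descending_average_ranks is a hypothetical helper introduced for illustration; it gives position 1 to the largest value, as the table does, and assigns each group of tied values the mean of the positions the group occupies.

```python
def descending_average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean position."""
    # Indices of the values, ordered from largest to smallest value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions are 1-based, so the group occupies start + 1 .. end + 1.
        mean_position = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks


# The five values from the table.
values = [0.8, 1.2, 1.2, 2.3, 18]
ranks = descending_average_ranks(values)
print(ranks)
```

The two values of 1.2 occupy positions 3 and 4, so each receives the rank (3 + 4) / 2 = 3.5.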

In applications where ties are known to be absent, a simpler procedure can be used to calculate ρ. The difference $d_i$ between the two ranks of each observation is calculated, and ρ is given by:

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
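A short sketch of the simpler procedure, valid only when there are no ties, is given below. The function names and the data are illustrative assumptions rather than material from the paper; the code refuses to run on tied data because the shortcut is only stated for the tie-free case.

```python
def ranks_without_ties(values):
    """Ascending ranks 1..n; raises ValueError if any value is repeated."""
    if len(set(values)) != len(values):
        raise ValueError("the shortcut formula assumes there are no ties")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_shortcut(xs, ys):
    """Spearman's rho from squared rank differences (tie-free data only)."""
    n = len(xs)
    rank_x = ranks_without_ties(xs)
    rank_y = ranks_without_ties(ys)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Illustrative data (hypothetical): rank differences are -2, 1, -1, 2, 0.
x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 2, 5]

rho = spearman_shortcut(x, y)                        # 1 - 6*10 / (5*24)
rho_same = spearman_shortcut(x, x)                   # identical rank order
rho_reversed = spearman_shortcut(x, list(reversed(x)))  # opposite rank order

print(rho, rho_same, rho_reversed)
```

Here the squared differences sum to 10, so ρ = 1 − 60/120 = 0.5; identical and fully reversed rank orders give +1 and −1 respectively.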

5. Related Quantities

There are several other numerical measures that quantify the extent of statistical dependence between pairs of observations. The most common of these is...