Spearman Rank correlation in Excel

Exercise 8.3.2.

Statistics for Ecologists (Edition 2) Exercise 8.3.2

This exercise is concerned with correlation (Chapter 8) and in particular how you can use Excel to calculate Spearman’s Rank correlation coefficient.

Using Excel for Spearman’s Rank correlation

Introduction

Excel has built-in functions that can calculate correlation, but only when data are normally distributed. The CORREL and PEARSON functions both calculate Pearson’s Product Moment, a correlation coefficient. Once you have the correlation coefficient it is fairly easy to calculate the statistical significance. You compute a t-value then use TINV to compute a critical value or TDIST to get an exact p-value (see Section 8.3.1 in the book).

This exercise shows how you can use the RANK.AVG function to rank the data, then use CORREL on the ranks to obtain a Spearman Rank coefficient.

The data for this exercise are freshwater correlation.xlsx, which look like this:

Example data for practice with correlation. Abundance of invertebrates and water speed.

Abund Speed
12 18
18 12
17 12
14 7
9 8
7 9
6 6
7 11
5 6
3 2

 

These data represent the abundance of a freshwater invertebrate and water speed.

Pearson correlation

To get the regular Pearson correlation you can use the CORREL (or PEARSON) function. In the spreadsheet the data are in columns B and C. To get the correlation coefficient the function required is:

=CORREL(B2:B11, C2:C11)

The result (0.614) is the “regular” parametric correlation coefficient, which is based on the normal distribution. The PEARSON function gives an identical result (Pearson’s Product Moment). Regression is another term used for correlation with parametric (normally distributed or Gaussian) data. The correlation coefficient, r, between two normally distributed variables is calculated like so:

Once you have r you can determine a critical value or exact p-value.

Use Pearson’s correlation when you have normally distributed data and when you expect there to be a linear relationship between the two variables.

Spearman’s Rank correlation

Spearman’s Rank correlation is used to determine the strength of the relationship between two numerical variables. The data do not have to be normally distributed and the relationship does not have to be strictly linear either. The relationship should be one where the variables are generally headed in one direction though (so not U-shaped or the inverse).

As the name suggests, the correlation is based on the ranks of the data. The x and y variables are ranked and the ranks of x are compared to the ranks of y like so:

In the formula D is the difference between ranks, and n is the number of pairs of data.

There is no built-in function to work out Spearman Rank correlation but you can work out the ranks using RANK.AVG and calculate the terms in the formula easily enough.

Rank the data

The RANK.AVG function allows you to rank data (note the older RANK function is not the same). In statistical analysis you want the lowest value to have the smallest rank. Tied values should get a tied rank.

To try this out using the example data do the following:

  1. Open the freshwater correlation.xlsx The variables are in columns B & C.
  2. In cell D1 type a label, RA, for the heading of the Ranks of the Abund variable.
  3. In E1 type a label, RS, for the ranks of the Speed variable.
  4. Now in D2 type a formula to get the rank of the first datum (in cell B2): =RANK.AVG(B2,B$2:B$11,1). Note the $ used to “fix” the rows references. The final 1 ensures the ranks are ascending order.
  5. Copy the formula down the rest of the column to fill cells D2:D11.
  6. Copy the cells in D2:D11 across to column E to get ranks in cells E2:E11.

If you wanted to fill out the Spearman Rank formula “the long way” you would now have to subtract each rank in column E from its counterpart in column D. Then square the differences.

However, there is another (easier) way.

Compute Rs

Now you have ranks you can use the CORREL (or PEARSON) function on the ranks to get a close approximation of the Spearman Rank correlation coefficient. It is usually the same to at least 2 decimal places.

  1. In cell A13 type a label, Correl, for the correlation coefficient(s).
  2. In B13 type a formula to work out Pearson’s correlation coefficient for the original data: =CORREL(B2:B11,C2:C11). The result is 0.614 for the example data.
  3. In D13 type a formula to work out the correlation between the ranks (i.e. columns D & E): =CORREL(D2:D11,E2:E11). The result is 0.771.

The correlation between the ranks is a close approximation to the Spearman Rank coefficient (0.773) computed the “long way”.

Get a t-value

You can compare your calculated Spearman Rank coefficient to a table of critical values (e.g. Table 8.5 in the book) or compute a t-value (another approximation).

To get a value for t you need the correlation coefficient and the number of pairs of values. These then go into the following formula:

The df in the formula is degrees of freedom and is the number of pairs of data minus 2. In the example datafile:

  1. In cell A14 type a label, n, for the number of pairs of data.
  2. In B14 type a formula to work out the number of items you have: =COUNT(B2:B11). You get 10.
  3. In A15 type a label, df, for the degrees of freedom.
  4. In B15 type a formula to work out degrees of freedom: =B14-2. You get 8.
  5. In A16 type a label, t, for the t-value.
  6. In B16 type a formula to work out a t-value: =SQRT(D13^2*B15/(1-D13^2)). The result is 3.420.

Once you have a value for t you are on your way to determining the statistical significance.

Determine statistical significance

Now you have a value for t you can:

  • Get a critical value using the TINV function.
  • Get an exact probability (a p-value) using the TDIST function.

The TINV function requires the significance level (usually 0.05) and the degrees of freedom.

The TDIST function requires a t-value, the degrees of freedom and a 2 (for a two-tailed test, which is most often what you need).

  1. In cell A17 type a label, t-crit, for the critical value.
  2. In B17 type a formula to calculate the critical t value: =TINV(0.05,B15). The result is 2.306.
  3. In A18 type a label, p-value, for the exact probability.
  4. In B18 type a formula to calculate the exact p-value: =TDIST(B16,B15,2). The result is 0.009.

You can compare your calculated value of t to the critical value and simply say that your correlation is significant at p < 0.05 or you can use the exact p-value (of 0.009).

This method of calculating Spearman Rank gives a good approximation of the correlation coefficient but the p-values are a little conservative. This is the method that the base distribution of R uses in its cor.test() command when method=”spearman”.

Comments are closed.