Correlation Tests Using R

Correlation measures the strength and direction of the relationship between two numerical variables. In R you compute it with the cor() command, which supports three methods:

  • Pearson correlation – for data that are normally distributed.
  • Spearman’s Rank (or Rho) – a non-parametric method, for data that are not normally distributed.
  • Kendall’s Tau – also a non-parametric method.

To carry out the correlation you need two variables to compare:

cor(x, y, method = "pearson")

The default is Pearson’s product-moment correlation, but you can specify “spearman” or “kendall” to carry out the appropriate calculation. For example, with a data frame fw containing two variables, count and speed:

> fw
         count speed
Taw          9     2
Torridge    25     3
Ouse        15     5
Exe          2     9
Lyn         14    14
Brook       25    24
Ditch       24    29
Fal         47    34

> attach(fw)
> cor(count, speed)
[1] 0.7237206
> cor(count, speed, method = "spearman")
[1] 0.5269556
> detach(fw)

Alternatively, you can refer to the variables explicitly with the $ operator, which avoids the need for attach():

> cor(fw$count, fw$speed, method = "kendall")
[1] 0.4000661
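
The three methods can also be compared side by side in one self-contained script. The following is a minimal sketch that rebuilds the fw data frame from the values printed above; the names method_names and coefs, and the use of sapply(), are illustrative choices rather than part of the original example. It should reproduce the three coefficients shown above.

```r
# Rebuild the fw data frame from the values printed earlier
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  row.names = c("Taw", "Torridge", "Ouse", "Exe",
                "Lyn", "Brook", "Ditch", "Fal")
)

# Compute the correlation coefficient once per method
method_names <- c("pearson", "spearman", "kendall")
coefs <- sapply(method_names,
                function(m) cor(fw$count, fw$speed, method = m))

# Named vector: one coefficient per method
print(round(coefs, 4))
```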

 

If you have multiple variables, you can get a correlation matrix by passing the entire data frame to cor():

> head(mf)
  Length Speed Algae  NO3 BOD
1     20    12    40 2.25 200
2     21    14    45 2.15 180
3     22    12    45 1.75 135
4     23    16    80 1.95 120
5     21    20    75 1.95 110
6     20    21    65 2.75 120

> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000

The matrix shows the correlation of every variable with every other variable.
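
The method argument works for matrices too, and rounding makes the result easier to read. The sketch below is only an illustration: it uses just the six rows printed by head(mf), because the full dataset is not shown here, so its values will not match the matrix above. The object names mf6 and cm are hypothetical.

```r
# Only the first six rows of mf, as printed by head(mf);
# the full dataset is not available, so results differ from cor(mf)
mf6 <- data.frame(
  Length = c(20, 21, 22, 23, 21, 20),
  Speed  = c(12, 14, 12, 16, 20, 21),
  Algae  = c(40, 45, 45, 80, 75, 65),
  NO3    = c(2.25, 2.15, 1.75, 1.95, 1.95, 2.75),
  BOD    = c(200, 180, 135, 120, 110, 120)
)

# Rank-based correlation matrix, rounded to two decimal places
cm <- round(cor(mf6, method = "spearman"), 2)
print(cm)

# Extract a single pair by row and column name
print(cm["Length", "Algae"])
```

Because the matrix is symmetric, each pair appears twice (once above and once below the diagonal), and the diagonal is always 1.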

Correlation and Significance tests

The basic cor() function computes the strength (and direction) of the correlation but does not tell you if the relationship is statistically significant. You need the cor.test() command to carry out a statistical test.

> cor.test(fw$count, fw$speed)

	Pearson's product-moment correlation

data:  fw$count and fw$speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
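
The result of cor.test() is a list-like object, so you can store it and pull out individual values instead of reading them off the printed summary. A minimal sketch, rebuilding fw from the values printed earlier (the name res is an illustrative choice):

```r
# Rebuild the fw data frame from the values printed earlier
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

# Store the test result rather than just printing it
res <- cor.test(fw$count, fw$speed)

res$estimate    # the correlation coefficient (cor)
res$statistic   # the t statistic
res$parameter   # the degrees of freedom (df)
res$p.value     # the p-value
res$conf.int    # the 95 percent confidence interval
```

This is handy when you want to report the values in a table or use the p-value in later code.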

 

You can also specify the variables in a formula like so:

> cor.test(~ Length + Algae, data = mf, method = "spearman")

	Spearman's rank correlation rho

data:  Length and Algae
S = 517.65, p-value = 1.517e-06
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8009031

 

Notice that the formula starts with a tilde ~ and then you provide the two variables, separated with a +. This arrangement reinforces the notion that you are looking for a simple correlation and are not implying cause and effect (there is no response ~ predictor in the formula).
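
The formula interface works with any data frame. As a sketch, the fw data can be tested in the same way. Note that exact = FALSE is an addition made here, not part of the original example: the count variable contains tied values, and without it R warns that an exact p-value cannot be computed with ties.

```r
# Rebuild the fw data frame from the values printed earlier
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

# One-sided formula: both variables go on the right of the tilde
# exact = FALSE requests the approximate p-value (count has ties)
res <- cor.test(~ count + speed, data = fw,
                method = "spearman", exact = FALSE)

res$estimate   # Spearman's rho
res$p.value    # approximate p-value
```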
