Correlation measures the strength (and direction) of the relationship between two numerical variables. R can carry out correlation via the cor() command, and there are three different sorts:
- Pearson correlation – for data that are normally distributed.
- Spearman’s Rank (or Rho) – for data that are non-parametric (not normally distributed); it is based on the ranks of the values.
- Kendall’s Tau – also for non-parametric data; often preferred when samples are small or there are many tied values.
To carry out the correlation you need two variables to compare:
cor(x, y, method = "pearson")
The default is to use Pearson’s product moment but you can specify "spearman" or "kendall" to carry out the appropriate calculation.
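To see how the method argument changes the calculation, here is a minimal sketch. The vectors x and y are invented purely for illustration and are not from any dataset used here. The last lines show that Spearman’s coefficient is simply Pearson’s coefficient computed on the ranks of the data.

```r
# Hypothetical data, made up for this illustration only
x <- c(2, 4, 5, 7, 9, 12, 15, 20)
y <- c(1, 3, 2, 8, 6, 15, 30, 60)

r_pearson  <- cor(x, y)                        # default: method = "pearson"
r_spearman <- cor(x, y, method = "spearman")   # rank-based
r_kendall  <- cor(x, y, method = "kendall")    # rank-based (concordant vs discordant pairs)

# Spearman's rho is Pearson's r applied to the ranks
r_ranks <- cor(rank(x), rank(y))
all.equal(r_spearman, r_ranks)
```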
> fw
         count speed
Taw          9     2
Torridge    25     3
Ouse        15     5
Exe          2     9
Lyn         14    14
Brook       25    24
Ditch       24    29
Fal         47    34
> attach(fw)
> cor(count, speed)
[1] 0.7237206
> cor(count, speed, method = "spearman")
[1] 0.5269556
> detach(fw)
Alternatively you can specify the variables explicitly, using the $ syntax to name the data frame and the column:
> cor(fw$count, fw$speed, method = "kendall")
[1] 0.4000661
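If you want to reproduce these results yourself, the following sketch rebuilds the fw data by hand from the values printed above (the data.frame() call is my reconstruction, not necessarily how the data were originally read in). It uses with() as a third option: the variable names are looked up inside the data frame for one command only, so there is nothing to detach() afterwards and no risk of attached names masking other objects.

```r
# Rebuild the fw data from the printed values (river names as row names)
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  row.names = c("Taw", "Torridge", "Ouse", "Exe",
                "Lyn", "Brook", "Ditch", "Fal")
)

# with() evaluates each expression using the columns of fw
r_p <- with(fw, cor(count, speed))                       # Pearson (default)
r_s <- with(fw, cor(count, speed, method = "spearman"))
r_k <- with(fw, cor(count, speed, method = "kendall"))

# Collect the three coefficients in one named vector
round(c(pearson = r_p, spearman = r_s, kendall = r_k), 4)
```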
If you have multiple variables you can get a correlation matrix by specifying the entire dataset:
> head(mf)
  Length Speed Algae  NO3 BOD
1     20    12    40 2.25 200
2     21    14    45 2.15 180
3     22    12    45 1.75 135
4     23    16    80 1.95 120
5     21    20    75 1.95 110
6     20    21    65 2.75 120
> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000
The matrix shows the correlation of every variable with every other variable.
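A correlation matrix can use any of the three methods, and you can round it or pull out a single pair. The sketch below assumes only the six rows printed by head(mf) above (stored here under the made-up name mf6), so its coefficients will not match the full-data matrix.

```r
# Only the six rows shown by head(mf) are available here;
# mf6 is a hypothetical name for this partial dataset
mf6 <- data.frame(
  Length = c(20, 21, 22, 23, 21, 20),
  Speed  = c(12, 14, 12, 16, 20, 21),
  Algae  = c(40, 45, 45, 80, 75, 65),
  NO3    = c(2.25, 2.15, 1.75, 1.95, 1.95, 2.75),
  BOD    = c(200, 180, 135, 120, 110, 120)
)

m <- cor(mf6, method = "spearman")  # the method applies to every pair of columns
round(m, 2)                         # fewer decimals are easier to read

m["Length", "Algae"]                # extract the coefficient for a single pair
isSymmetric(m)                      # cor(a, b) equals cor(b, a)
```

The diagonal is always 1 because each variable is perfectly correlated with itself.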
Correlation and Significance tests
The basic cor() function computes the strength (and direction) of the correlation but does not tell you if the relationship is statistically significant. You need the cor.test() command to carry out a statistical test.
> cor.test(fw$count, fw$speed)
    Pearson's product-moment correlation
data:  fw$count and fw$speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
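The printed summary is convenient to read, but cor.test() also returns an object whose parts you can extract individually, which helps if you want to report or reuse the values. A minimal sketch, assuming fw is rebuilt by hand from the values printed earlier:

```r
# Rebuild the fw data from the printed values
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

res <- cor.test(fw$count, fw$speed)  # Pearson by default

res$estimate    # the correlation coefficient (cor)
res$statistic   # the t statistic
res$parameter   # the degrees of freedom
res$p.value     # the p-value
res$conf.int    # the 95 percent confidence interval
```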
You can also specify the variables in a formula like so:
> cor.test(~ Length + Algae, data = mf, method = "spearman")
    Spearman's rank correlation rho
data:  Length and Algae
S = 517.65, p-value = 1.517e-06
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8009031
Notice that the formula starts with a tilde ~ and then you provide the two variables, separated with a +. This arrangement reinforces the notion that you are looking for a simple correlation and are not implying cause and effect (there is no response ~ predictor in the formula).
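The formula and the two-variable forms give the same answer, as this sketch with the fw data shows (fw is rebuilt by hand from the values printed earlier). The exact = FALSE argument is my addition: the count column contains tied values, so R cannot compute an exact p-value for Spearman’s test and would otherwise issue a warning before using the approximation anyway.

```r
# Rebuild the fw data from the printed values
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

# Formula form: both variables sit on the right of the tilde
res_formula <- cor.test(~ count + speed, data = fw,
                        method = "spearman", exact = FALSE)

# Two-variable form of the same test
res_vectors <- cor.test(fw$count, fw$speed,
                        method = "spearman", exact = FALSE)

res_formula$estimate   # rho
all.equal(res_formula$p.value, res_vectors$p.value)
```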