Correlation Tests Using R

Correlation measures the strength and direction of the relationship between two numerical variables. R computes correlation coefficients with the cor() command, which offers three methods:

  • Pearson correlation – a parametric method, for data that are normally distributed.
  • Spearman’s Rank (or Rho) – a non-parametric, rank-based method, for data that are not normally distributed.
  • Kendall’s Tau – also a non-parametric, rank-based method.

To carry out the correlation you need two variables to compare:

cor(x, y, method = "pearson")

The default is Pearson’s product moment, but you can specify “spearman” or “kendall” to carry out the appropriate calculation. The method name is case-sensitive and must be given in lower case.

> fw
         count speed
Taw          9     2
Torridge    25     3
Ouse        15     5
Exe          2     9
Lyn         14    14
Brook       25    24
Ditch       24    29
Fal         47    34

> attach(fw)
> cor(count, speed)
[1] 0.7237206

> cor(count, speed, method = "spearman")
[1] 0.5269556

> detach(fw)

Alternatively, you can refer to the variables explicitly with the $ syntax, which avoids the need for attach() and detach():

> cor(fw$count, fw$speed, method = "kendall")
[1] 0.4000661
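
The fw data are small enough to type in, so the results above can be reproduced. The following is a minimal sketch that rebuilds the data frame from the printout and uses with() in place of attach() and detach(). The use of with() and the names r_pearson, r_spearman and r_kendall are choices made for this illustration only.

```r
# Recreate the fw data frame; the values are copied from the printout above
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  row.names = c("Taw", "Torridge", "Ouse", "Exe",
                "Lyn", "Brook", "Ditch", "Fal")
)

# with() evaluates the expression inside the data frame,
# so nothing is added to (or left behind on) the search path
r_pearson  <- with(fw, cor(count, speed))
r_spearman <- with(fw, cor(count, speed, method = "spearman"))
r_kendall  <- with(fw, cor(count, speed, method = "kendall"))

# Collect the three coefficients in one named vector
print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
```

The three values printed should agree with the coefficients shown in the console output above.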

If you have multiple variables you can get a correlation matrix by specifying the entire dataset (all of the columns must be numeric):

> head(mf)
  Length Speed Algae  NO3 BOD
1     20    12    40 2.25 200
2     21    14    45 2.15 180
3     22    12    45 1.75 135
4     23    16    80 1.95 120
5     21    20    75 1.95 110
6     20    21    65 2.75 120
> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000

The matrix shows the correlation of every variable with every other variable.
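
The matrix form accepts the same method argument, and a use argument that controls how missing values are handled. The sketch below illustrates both. It assumes only the six rows printed by head(mf), stored under the hypothetical name mf6, because the full mf dataset is not reproduced here. Its coefficients will therefore differ from the matrix shown above.

```r
# Only the six rows printed by head(mf); NOT the full mf dataset
mf6 <- data.frame(
  Length = c(20, 21, 22, 23, 21, 20),
  Speed  = c(12, 14, 12, 16, 20, 21),
  Algae  = c(40, 45, 45, 80, 75, 65),
  NO3    = c(2.25, 2.15, 1.75, 1.95, 1.95, 2.75),
  BOD    = c(200, 180, 135, 120, 110, 120)
)

# Rank-based correlation matrix, rounded for readability
m_spearman <- round(cor(mf6, method = "spearman"), 2)
print(m_spearman)

# Pull a single pair out of the matrix by row and column name
print(m_spearman["Length", "Algae"])

# Introduce a missing value to show the effect of the 'use' argument
mf6$NO3[2] <- NA

# By default any pair involving NO3 now comes back as NA;
# pairwise.complete.obs drops incomplete rows separately for each pair
m_pairwise <- round(cor(mf6, use = "pairwise.complete.obs"), 2)
print(m_pairwise)
```

With pairwise deletion each coefficient may be based on a different number of rows, which is worth bearing in mind when comparing entries in the matrix.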

Correlation and Significance Tests

The basic cor() function computes the strength (and direction) of the correlation but does not tell you if the relationship is statistically significant. You need the cor.test() command to carry out a statistical test.

> cor.test(fw$count, fw$speed)

    Pearson's product-moment correlation

data:  fw$count and fw$speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
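
The result of cor.test() is a list, so individual values can be extracted rather than read off the printout. The sketch below rebuilds fw from the values given earlier, pulls out the main components, and checks the t statistic by hand using t = r * sqrt(n - 2) / sqrt(1 - r^2). The names res, t_manual and p_manual are illustrative only.

```r
# Recreate the fw data (values copied from the printout earlier in the text)
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

res <- cor.test(fw$count, fw$speed)

# Components of the test result
print(res$estimate)   # correlation coefficient
print(res$statistic)  # t value
print(res$parameter)  # degrees of freedom
print(res$p.value)    # two-sided p-value
print(res$conf.int)   # 95 percent confidence interval

# Check the t statistic and p-value by hand
r <- unname(res$estimate)
n <- nrow(fw)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)
print(c(t = t_manual, p = p_manual))
```

The hand calculation should agree with the t and p-value reported by cor.test(), which shows that the Pearson test depends only on the coefficient and the sample size.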

You can also specify the variables in a formula like so:

> cor.test(~ Length + Algae, data = mf, method = "spearman")

    Spearman's rank correlation rho

data:  Length and Algae
S = 517.65, p-value = 1.517e-06
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8009031

Notice that the formula starts with a tilde ~ and then you provide the two variables, separated with a +. This arrangement reinforces the notion that you are looking for a simple correlation and are not implying cause and effect (there is no response ~ predictor in the formula).
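
As a quick check that the two ways of calling cor.test() are equivalent, the sketch below runs both on the fw data, rebuilt from the values given earlier. It also passes exact = FALSE to the Spearman test: the count column contains a tie (two values of 25), and in that case R cannot compute an exact p-value and would otherwise issue a warning. The names by_formula, by_vectors and sp are illustrative only.

```r
# Recreate the fw data (values copied from the printout earlier in the text)
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

# Formula interface and plain-vector interface give the same test
by_formula <- cor.test(~ count + speed, data = fw)
by_vectors <- cor.test(fw$count, fw$speed)
print(all.equal(unname(by_formula$estimate), unname(by_vectors$estimate)))
print(all.equal(by_formula$p.value, by_vectors$p.value))

# Spearman with tied values: ask for the approximate p-value explicitly
sp <- cor.test(~ count + speed, data = fw,
               method = "spearman", exact = FALSE)
print(sp$estimate)
print(sp$p.value)
```

The order of the two variables after the tilde does not matter, since correlation is symmetric.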
