Correlation tests Using R
Correlation measures the strength and direction of the relationship between two numerical variables. R computes a correlation coefficient via the cor() command, which supports three methods:
- Pearson correlation – for data that are normally distributed.
- Spearman’s Rank (or Rho) – a rank-based, non-parametric measure for data that are not normally distributed.
- Kendall’s Tau – also rank-based and non-parametric.
To carry out the correlation you need two variables to compare:
cor(x, y, method = "pearson")
The default is to use Pearson’s product moment but you can specify “spearman” or “kendall” to carry out the appropriate calculation.
> fw
         count speed
Taw          9     2
Torridge    25     3
Ouse        15     5
Exe          2     9
Lyn         14    14
Brook       25    24
Ditch       24    29
Fal         47    34
> attach(fw)
> cor(count, speed)
[1] 0.7237206
> cor(count, speed, method = "spearman")
[1] 0.5269556
> detach(fw)
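Alternatively you can specify the variables “explicitly” with the $ syntax, which avoids the need for attach(). Note that the method name must be in lower case:
> cor(fw$count, fw$speed, method = "kendall")
[1] 0.4000661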
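The session above can be reproduced from scratch with the short sketch below. It rebuilds fw from the listing and uses with() in place of attach()/detach(); that substitution, and the names r_pearson, r_spearman and r_kendall, are choices made for this illustration rather than part of the original session.

```r
# River data copied from the fw listing above (row names are the sites).
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  row.names = c("Taw", "Torridge", "Ouse", "Exe",
                "Lyn", "Brook", "Ditch", "Fal")
)

# with() evaluates the expression inside the data frame, so there is
# nothing to detach() afterwards.
r_pearson  <- with(fw, cor(count, speed))                       # default method
r_spearman <- with(fw, cor(count, speed, method = "spearman"))
r_kendall  <- with(fw, cor(count, speed, method = "kendall"))

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
```

The three coefficients should agree with the values printed in the console sessions above.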
If you have multiple variables you can get a correlation matrix by specifying the entire dataset:
> head(mf)
  Length Speed Algae  NO3 BOD
1     20    12    40 2.25 200
2     21    14    45 2.15 180
3     22    12    45 1.75 135
4     23    16    80 1.95 120
5     21    20    75 1.95 110
6     20    21    65 2.75 120
> cor(mf)
           Length       Speed      Algae         NO3        BOD
Length  1.0000000 -0.34322968  0.7650757  0.45476093 -0.8055507
Speed  -0.3432297  1.00000000 -0.1134416  0.02257931  0.1983412
Algae   0.7650757 -0.11344163  1.0000000  0.37706463 -0.8365705
NO3     0.4547609  0.02257931  0.3770646  1.00000000 -0.3751308
BOD    -0.8055507  0.19834122 -0.8365705 -0.37513077  1.0000000
The matrix shows the correlation of every variable with every other variable.
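As a rough illustration of working with a correlation matrix, the sketch below uses only the six rows printed by head(mf). The full mf dataset is not reproduced here, so the coefficients will not match the matrix shown above; the names mf_head, cm and cm_rank are made up for the example.

```r
# ASSUMPTION: only the six rows shown by head(mf); the real mf has more rows,
# so these coefficients differ from the matrix printed in the text.
mf_head <- data.frame(
  Length = c(20, 21, 22, 23, 21, 20),
  Speed  = c(12, 14, 12, 16, 20, 21),
  Algae  = c(40, 45, 45, 80, 75, 65),
  NO3    = c(2.25, 2.15, 1.75, 1.95, 1.95, 2.75),
  BOD    = c(200, 180, 135, 120, 110, 120)
)

cm <- cor(mf_head)        # Pearson by default; every column must be numeric
print(round(cm, 2))       # rounding makes the matrix easier to scan

# A single pair can be pulled out by row and column name.
print(cm["Length", "BOD"])

# The method argument works for matrices too.
cm_rank <- cor(mf_head, method = "spearman")
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric, so only one triangle needs to be read.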
Correlation and Significance tests
The basic cor() function computes the strength (and direction) of the correlation but does not tell you if the relationship is statistically significant. You need the cor.test() command to carry out a statistical test.
> cor.test(fw$count, fw$speed)
        Pearson's product-moment correlation
data:  fw$count and fw$speed
t = 2.5689, df = 6, p-value = 0.0424
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.03887166 0.94596455
sample estimates:
      cor
0.7237206
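The printed summary is convenient to read, but the individual numbers are often needed on their own (for a report or a later calculation). A minimal sketch, assuming the fw data from earlier and a made-up result name res, shows how the components of the test result can be extracted:

```r
# River data copied from the fw listing earlier in the text.
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

res <- cor.test(fw$count, fw$speed)   # Pearson by default

# cor.test() returns a list (class "htest"); its parts are reached with $.
print(res$estimate)    # the correlation coefficient
print(res$statistic)   # the t statistic
print(res$parameter)   # degrees of freedom (n - 2)
print(res$p.value)     # significance of the test
print(res$conf.int)    # 95 percent confidence interval
```

These components should match the figures in the printed output above.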
You can also specify the variables in a formula like so:
> cor.test(~ Length + Algae, data = mf, method = "spearman")
        Spearman's rank correlation rho
data:  Length and Algae
S = 517.65, p-value = 1.517e-06
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho
0.8009031
Notice that the formula starts with a tilde ~ and then you provide the two variables, separated with a +. This arrangement reinforces the notion that you are looking for a simple correlation and are not implying cause and effect (there is no response ~ predictor in the formula).
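To see that the formula form and the explicit form are two routes to the same test, the sketch below runs both on the fw data from earlier. The names by_formula and by_vectors are made up for this example, and exact = FALSE is an added choice: the count column contains a tie, so R cannot compute an exact p-value and would otherwise issue a warning.

```r
# River data copied from the fw listing earlier in the text.
fw <- data.frame(
  count = c(9, 25, 15, 2, 14, 25, 24, 47),
  speed = c(2, 3, 5, 9, 14, 24, 29, 34)
)

# Formula interface: nothing on the left of the tilde, variables joined by +.
by_formula <- cor.test(~ count + speed, data = fw,
                       method = "spearman", exact = FALSE)

# Explicit interface: the same two columns passed as vectors.
by_vectors <- cor.test(fw$count, fw$speed,
                       method = "spearman", exact = FALSE)

print(by_formula$estimate)   # rho from the formula call
print(by_vectors$estimate)   # rho from the explicit call
```

Both calls give the same rho and p-value; only the data label in the printed output differs.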