When you have more than two samples to compare your go-to method of analysis would generally be analysis of variance (see 15). However, if your data are not normally distributed you need a non-parametric method of analysis. The Kruskal-Wallis test is the test to use in lieu of one-way anova.

## Kruskal Wallis test

The kruskal.test() command will carry out the Kruskal Wallis test for you.

> flies

C F F.G G S

1 75 58 58 57 62

2 67 61 59 58 66

3 70 56 58 60 65

4 75 58 61 59 63

5 65 57 57 62 64

6 71 56 56 60 62

7 67 61 58 60 65

8 67 60 57 57 65

9 76 57 57 59 62

10 68 58 59 61 67

> kruskal.test(flies)

Kruskal-Wallis rank sum test

data: flies

Kruskal-Wallis chi-squared = 38.437, df = 4, p-value = 9.105e-08

In this case you see that the data contains five columns, each being a separate sample. If your data are in separate named objects you can still run the test but you bundle the data in a list():

> Grass ; Heath ; Arable

[1] 3 4 3 5 6 12 21 4 5 4 7 8

[1] 6 7 8 8 9 11 12 11 NA NA NA NA

[1] 19 3 8 8 9 11 12 11 9 NA NA NA

> kruskal.test(list(Grass, Heath, Arable))

Kruskal-Wallis rank sum test

data: list(Grass, Heath, Arable)

Kruskal-Wallis chi-squared = 6.0824, df = 2, p-value = 0.04778

### Kruskal Wallis and stacked data

The kruskal.test() function is also able to deal with data that are in a different form.

> fly

size diet

1 75 C

2 67 C

3 70 C

4 75 C

5 65 C

6 71 C

In this case one column gives the response variable and the other gives the (grouping) predictor variable.

> attach(fly)

> kruskal.test(size, g = diet)

Kruskal-Wallis rank sum test

data: size and diet

Kruskal-Wallis chi-squared = 38.437, df = 4, p-value = 9.105e-08

> detach(fly)

Note that attach() was used here to allow R to “read” the two columns from the data. More conveniently you can use a formula to describe the situation. This means you do not need to use attach().

> bfs

count site

1 3 Grass

2 4 Grass

3 3 Grass

4 5 Grass

5 6 Grass

6 12 Grass

> kruskal.test(count ~ site, data = bfs)

Kruskal-Wallis rank sum test

data: count by site

Kruskal-Wallis chi-squared = 6.0824, df = 2, p-value = 0.04778

A formula is a powerful tool in R and is used in many functions. You put the name of the response variable first then a ~ followed by the predictor, finally you add the data part that R knows where to find the variables.

### Post Hoc test for Kruskal Wallis

The Kruskal Wallis test will give you an overall result (both the examples show significant differences between groups). However, you will want to know about the details of the differences between groups. To do this you carry out a post hoc test. This means that you take the groups and compare them pair by pair. In this way you explore the data more fully and can describe exactly how each group relates to the others.

You cannot simply conduct several two-sample tests, multiple U-tests for example, as you increase your chances of getting a significant result by pure chance. For parametric analyses there are “standard” methods for taking this multiple testing into account. However, for Kruskal Wallis there is no consensus about the “right” method!

The simplest method is to carry out regular U-tests but correct for the use of multiple analyses. Such corrections are often called “Bonferroni” corrections, although there are other methods of correction. The pairwise.wilcox.test() function can conduct pairwise U-tests and correct the p-values for you.

You need to specify the data and the grouping:

> head(fly, n = 4)

size diet

1 75 C

2 67 C

3 70 C

4 75 C

> pairwise.wilcox.test(fly$size, g = fly$diet)

Pairwise comparisons using Wilcoxon rank sum test

data: fly$size and fly$diet

C F F.G G

F 0.0017 – – –

F.G 0.0017 1.0000 – –

G 0.0017 0.4121 0.2507 –

S 0.0027 0.0017 0.0017 0.0017

P value adjustment method: holm

The command carries out multiple U-tests and adjusts the p-values. In this case the “holm” method of adjustment was used (the default). You can select others:

> head(bfs)

count site

1 3 Grass

2 4 Grass

3 3 Grass

4 5 Grass

5 6 Grass

6 12 Grass

> pairwise.wilcox.test(bfs$count, g = bfs$site, p.adjust.method = “bonferroni”)

Pairwise comparisons using Wilcoxon rank sum test

data: bfs$count and bfs$site

Arable Grass

Grass 0.16 –

Heath 1.00 0.11

P value adjustment method: bonferroni

Try help(p.adjust.methods) from R for more information about the different adjustment methods.

## Friedman test

The Friedman test is essentially a 2-way analysis of variance used on non-parametric data. The test only works when you have completely balanced design. You can also use Friedman for one-way repeated measures types of analysis.

Here is an example of a data file which has a 2-way unreplicated block design.

> survey

count month year

1 2 1 2004

2 48 1 2005

3 40 1 2006

4 3 2 2004

5 120 2 2005

6 81 2 2006

7 2 3 2004

8 16 3 2005

9 36 3 2006

10 7 4 2004

11 21 4 2005

12 17 4 2006

13 2 5 2004

14 14 5 2005

15 17 5 2006

What you have here are data on surveys of amphibians. The first column (count) represents the number of individuals captured. The final column is the year that the survey was conducted. The middle column (month) shows that for each year there were 5 survey events in each year. What you have here is a replicated block design. Each year is a “treatment” (or “group”) whilst the month variable represents a “block”. This is a common sort of experimental design; the blocks are set up to take care of any possible variation and to provide replication for the treatment. In this instance you wish to know if there is any significant difference due to year.

The Friedman test allows you to carry out a test on these data. You need to determine which variable is the group and which the block. The friedman.test() function allows you to perform the test. One way is to specify the groups and the blocks like so:

> attach(survey)

> friedman.test(count, groups = year, blocks = month)

Friedman rank sum test

data: count, year and month

Friedman chi-squared = 7.6, df = 2, p-value = 0.02237

> detach(survey)

The other method is using a formula syntax:

> friedman.test(count ~ year | month, data= survey)

Here the vertical bar “splits” the predictor variables into group | block. It is important to get the group and block variables the correct way around!

### Post Hoc testing for Friedman tests

There is a real drop in statistical power when using a Friedman test compared to anova. Although there are methods that enable post-hoc tests, the power is such that obtaining significance is well-nigh impossible in most cases.

You can try pairwise.wilcox.test() with various groupings, e.g.:

> pairwise.wilcox.test(survey$count, g = survey$year)

Pairwise comparisons using Wilcoxon rank sum test

data: survey$count and survey$year

2004 2005

2005 0.033 –

2006 0.033 0.834

P value adjustment method: holm

In this analysis any other grouping is impossible (as there is no replication).

## My Publications

I have written several books on ecology and data analysis

## Register your interest for our Training Courses

We run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. Courses will be held at The Field Studies Council Field Centre at Slapton Ley in Devon. Alternatively we can come to you and provide the training at your workplace.

## Get In Touch Now

for any information regarding our training courses, publications or help with a data project