Use Excel for Wilcoxon matched pairs
Exercise 7.3.3.
Statistics for Ecologists (Edition 2) Exercise 7.3.3
This exercise is concerned with matched pairs tests (Section 7.3) and in particular how to carry out the non-parametric Wilcoxon Matched Pairs test using Excel (Section 7.3.3).
Use Excel for Wilcoxon matched pairs test
Introduction
There is no in-built function that will carry out the Wilcoxon Matched Pairs test in Excel. However, you can rank the data and compute the rank sums you require using the RANK.AVG function.
You do need to omit zero differences from the calculations and also to separate ranks of positive differences from ranks of negative differences. In this exercise you can see how to use the IF function to help you do this separation.
The example data
You can get the sample data here: Wilcoxon.xlsx. There are two worksheets, one has the data only and the other has a completed set of calculations so you can check your progress.
Here’s what the data look like:
Matched pairs data for use in Wilcoxon matched pairs analysis
Obs | A | B |
1 | 8 | 11 |
2 | 6 | 8 |
3 | 12 | 4 |
4 | 2 | 9 |
5 | 3 | 3 |
6 | 3 | 5 |
7 | 1 | 2 |
8 | 7 | 2 |
You can see that there are 8 pairs of data.
Show the direction of the difference
Start by making a column to show the direction of the difference. Make a heading in cell D1, Dir will do as a heading name.
Now in cell D2 type a formula to show if the difference between B2 and C2 is positive or negative. You’re going to subtract the second column from the first. You also want to take into account possible zero differences:
=IF(B2<C2, "-", IF(B2=C2, "", "+"))
So if cell C2 is larger than cell B2 you’ll get a minus symbol. If the cells are equal, then you get a blank. If B2 is larger than C2 you’ll get a + symbol. You will use these symbols to separate the ranks later.
Copy the formula down the rest of the column to show the direction of each of the differences.
The final column shows the direction of differences between A and B (i.e. A-B).
Obs | A | B | Dir |
1 | 8 | 11 | – |
2 | 6 | 8 | – |
3 | 12 | 4 | + |
4 | 2 | 9 | – |
5 | 3 | 3 | |
6 | 3 | 5 | – |
7 | 1 | 2 | – |
8 | 7 | 2 | + |
Note that observation 5 (row 6 of the spreadsheet) shows a blank because the items are the same (i.e. a zero difference).
Calculate the differences
Make a column for the difference between samples, make the heading in cell E1, D(A-B) or something similar.
You want to subtract the values in the second column of data from the first column of data (i.e. Col B – Col C). So in cell E2 type a formula to do that. You’ll need to omit any zero difference, so use an IF function to place a blank “” if the difference is zero:
=IF(B2-C2=0, "", ABS(B2-C2))
You are not interested in the sign of any difference, just the magnitude, so the ABS function is needed.
Copy the result down the rest of the column.
The final column shows the absolute magnitude of differences between A and B.
Obs | A | B | Dir | D(A-B) |
1 | 8 | 11 | – | 3 |
2 | 6 | 8 | – | 2 |
3 | 12 | 4 | + | 8 |
4 | 2 | 9 | – | 7 |
5 | 3 | 3 | ||
6 | 3 | 5 | – | 2 |
7 | 1 | 2 | – | 1 |
8 | 7 | 2 | + | 5 |
You can see that you have 7 values, with one blank (a zero difference, observation 5).
Calculate ranks of differences
Now you want to work out the ranks of the differences. That is the ranks of the absolute value of the differences that you just worked out (column E) and called D(A-B).
Make a new column label in cell F1, call it Rd or something similar.
Now in cell F2 type a formula to work out the ranks of the items from column E:
=IF(E2="", "", RANK.AVG(E2,$E$2:$E$9,1))
Note that you need to take care of any possible blank cells (corresponding to zero differences). You also need to “fix” the cell range E2:E9 using $ since you will be copying the cell down the rest of the column. The final 1 makes sure that your ranks are sorted ascending order, with the smallest difference getting the smallest rank.
The final column shows the rank of the (absolute) differences between A and B.
Obs | A | B | Dir | D(A-B) | Rd |
1 | 8 | 11 | – | 3 | 4 |
2 | 6 | 8 | – | 2 | 2.5 |
3 | 12 | 4 | + | 8 | 7 |
4 | 2 | 9 | – | 7 | 6 |
5 | 3 | 3 | |||
6 | 3 | 5 | – | 2 | 2.5 |
7 | 1 | 2 | – | 1 | 1 |
8 | 7 | 2 | + | 5 | 5 |
You can see that there are some tied ranks (each given a value of 2.5).
Ranks of + and of – differences
Now you need to separate the ranks due to the positive differences and the ranks due to negative differences. Mae two more column headings in cells G1 and H1, R+ and R- will do fine.
In cell G2 type a formula that shows the rank if it is due to a positive difference but leaves the cell blank is not:
=IF(D2="+", F2,"")
Copy the cell down the rest of the column G.
In cell H2 type a formula that shows the rank if it is due to a negative difference but leaves the cell blank is not:
=IF(D2="-", F2,"")
Copy the cell down the rest of the column H.
The last two columns show the ranks due to positive differences and negative differences.
Obs | A | B | Dir | D(A-B) | Rd | R+ | R- |
1 | 8 | 11 | – | 3 | 4 | 4 | |
2 | 6 | 8 | – | 2 | 2.5 | 2.5 | |
3 | 12 | 4 | + | 8 | 7 | 7 | |
4 | 2 | 9 | – | 7 | 6 | 6 | |
5 | 3 | 3 | |||||
6 | 3 | 5 | – | 2 | 2.5 | 2.5 | |
7 | 1 | 2 | – | 1 | 1 | 1 | |
8 | 7 | 2 | + | 5 | 5 | 5 |
You should now see the ranks split according to the sign of the difference between the samples.
Rank sums
Now you need to add up the ranks for the positive and negative differences (columns G & H). You can use the SUM function for this.
In cells A11 and A12 type labels for the counts and sums, n and Sum, will do nicely.
In cell B11 type a formula to work out the number of observations:
=COUNT(B2:B9)
Copy this into cells C11 and F11. The latter being the number of non-zero differences.
In cell B12 type a formula to sum the data in column 12:
=SUM(B2:B9)
Copy the cell into cells C12, F12, G12 and H12. Cells F12:H12 contain the ranks sums; overall, +ve and -ve. Note that the overall sum of ranks should equal the two other rank sums combined.
The final two rows show some summary (count and sum). The test statistic is the smaller of the two rank sums (12 & 16).
Obs | A | B | Dir | D(A-B) | Rd | R+ | R- |
1 | 8 | 11 | – | 3 | 4 | 4 | |
2 | 6 | 8 | – | 2 | 2.5 | 2.5 | |
3 | 12 | 4 | + | 8 | 7 | 7 | |
4 | 2 | 9 | – | 7 | 6 | 6 | |
5 | 3 | 3 | |||||
6 | 3 | 5 | – | 2 | 2.5 | 2.5 | |
7 | 1 | 2 | – | 1 | 1 | 1 | |
8 | 7 | 2 | + | 5 | 5 | 5 | |
n | 8 | 8 | 7 | ||||
∑ | 42 | 44 | 28 | 12 | 16 |
Final result
The two rank sums are 12 and 16 so the 12 is the test statistic, W. You’ll need the number of non-zero differences to look up the appropriate critical value (see Table 7.13 in the book).
For Nd = 7 the critical value is 2. The calculated value of W is 12, which is larger than the critical value so the result is not significant.
Comments are closed.