Ordering boxes in an R boxplot()
Statistics for Ecologists (Edition 2) Exercise 6.3.2
These notes concern box-whisker plots and in particular how you can rearrange the order of the boxes in such plots.
Introduction
The boxplot() command is one of the most useful graphical commands in R. The box-whisker plot is useful because it shows a lot of information concisely. However, the boxes do not always appear in the order you would prefer. These notes show you how you can take control of the ordering of the boxes in a boxplot().
There are four main methods, which in turn depend on the layout of the data:
- Use order() to select column order when you have separate samples (i.e. vectors, columns in a data.frame or a list).
- Use [row, column] to select an explicit column order when you have separate samples.
- Use reorder() to change the order of a factor variable according to a function (e.g. mean), when you have response and predictor variables.
- Use ordered() to make a custom ordered factor variable when you have response and predictor variables.
There are subtle differences between these methods but essentially you are creating an index, which you can use in the boxplot() command to control the order the boxes appear in the plot.
Data in sample format
If your data are arranged as samples in a data.frame (or matrix) you can use boxplot() to plot the data in “one go”. The order of the boxes will depend on the order of the columns.
hog3
  Upper Mid Lower
1     3   4    11
2     4   3    12
3     5   7     9
4     9   9    10
5     8  11    11
6    10  NA    NA
7     9  NA    NA

boxplot(hog3)
You can specify an explicit order for the columns using column numbers:
boxplot(hog3[, 3:1])
The boxplot on the left uses the default column order. The boxplot on the right uses an explicit order x[, columns].
Note the [row, column] syntax to specify the order for plotting.
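Column names work in the same [row, column] position as column numbers, which can be easier to read. The following is a minimal sketch, assuming the hog3 data are rebuilt as a data.frame from the values printed above; the names col_order and bp are illustrative only.

```r
# Rebuild the example data; NA pads the shorter samples
hog3 <- data.frame(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11, NA, NA),
  Lower = c(11, 12, 9, 10, 11, NA, NA)
)

# Give the columns by name, in the order the boxes should appear
col_order <- c("Lower", "Mid", "Upper")
bp <- boxplot(hog3[, col_order])

# boxplot() invisibly returns its statistics; $names holds the box labels
bp$names
```

Saving the result of boxplot() is optional; it is done here only so the box order can be checked without looking at the plot.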
Order columns by a function
Rather than give an explicit order you may want to have the boxplot appear in order of some function (e.g. mean or median). You can use the order() command to arrange items in ascending (or descending) order. To proceed use these general steps:
- Use a command that calculates the value you want for each column, e.g. colMeans() or apply().
- Pass the result of the first step to order() to get an index of column positions.
- Use the index from the second step to set the order of the columns in the boxplot() command.
The apply() command is the most flexible:
m <- apply(hog3, MARGIN = 2, FUN = median, na.rm = TRUE)
m
Upper   Mid Lower
    8     7    11
Now you can set an order based on the medians you calculated:
o <- order(m, decreasing = FALSE)
o
[1] 2 1 3
Use the x[row, column] syntax like before but use your calculated order:
boxplot(hog3[, o])
If you want decreasing order, set decreasing = TRUE in the order() command.
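The same steps can be sketched with colMeans() and a decreasing order. This is an illustration only, assuming hog3 is rebuilt from the values printed earlier; note that ordering by the mean need not give the same arrangement as ordering by the median.

```r
# Rebuild the example data; NA pads the shorter samples
hog3 <- data.frame(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11, NA, NA),
  Lower = c(11, 12, 9, 10, 11, NA, NA)
)

# Step 1: one value per column (na.rm = TRUE skips the NA padding)
m <- colMeans(hog3, na.rm = TRUE)

# Step 2: an index of column positions, largest mean first
o <- order(m, decreasing = TRUE)

# Step 3: use the index to arrange the columns in the plot
bp <- boxplot(hog3[, o])
bp$names
```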
Data in a list
If your data are in a list you can use the same principles but need a slightly modified procedure:
hogl <- list(U = hog3$Upper, M = hog3$Mid, L = hog3$Lower)
hogl
$U
[1]  3  4  5  9  8 10  9

$M
[1]  4  3  7  9 11 NA NA

$L
[1] 11 12  9 10 11 NA NA
Use the lapply() command to work out the median over the list elements.
m <- lapply(hogl, median, na.rm = TRUE)
If you try to order() the result you get an error, so you must unlist() the result first:
order(unlist(m))
[1] 2 1 3
Now save the new order and use it in the plot.
o <- order(unlist(m))
boxplot(hogl[o])
Note that you don’t use [row, column] for the list, just give [element], as the list is one-dimensional.
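As an alternative to lapply() followed by unlist(), sapply() simplifies its result to a named vector when every element returns a single value, so the result can go straight into order(). This is a sketch of that variation, not the method used in the exercise; the list is rebuilt here from the values shown above.

```r
# Rebuild the example list; NA pads the shorter samples
hogl <- list(
  U = c(3, 4, 5, 9, 8, 10, 9),
  M = c(4, 3, 7, 9, 11, NA, NA),
  L = c(11, 12, 9, 10, 11, NA, NA)
)

# sapply() returns a named numeric vector here, so no unlist() is needed
m <- sapply(hogl, median, na.rm = TRUE)

# Index of list elements, smallest median first
o <- order(m)

# Single brackets keep the result a list, which boxplot() accepts
bp <- boxplot(hogl[o])
bp$names
```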
Data in scientific recording layout
When your data are in scientific recording format you will have a column for each variable and will have response variables and predictor variables e.g.
hog2
   count  site
1      3 Upper
2      4 Upper
3      5 Upper
4      9 Upper
5      8 Upper
6     10 Upper
7      9 Upper
8      4   Mid
9      3   Mid
10     7   Mid
11     9   Mid
12    11   Mid
13    11 Lower
14    12 Lower
15     9 Lower
16    10 Lower
17    11 Lower
These are the same data as before but in a more “sensible” layout. However, when you try a boxplot() you get the boxes plotted in alphabetical order.
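The default behaviour can be checked with a short sketch. It assumes hog2 is rebuilt from the values above with site stored as a factor (the original does not say how the column was created); factor() sorts its levels alphabetically unless told otherwise, and boxplot() draws one box per level in that order.

```r
# Rebuild the example data; storing site as a factor is an assumption
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = factor(rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5)))
)

# Levels are sorted alphabetically, not in order of appearance
levels(hog2$site)

# The formula interface plots one box per factor level, in level order
bp <- boxplot(count ~ site, data = hog2)
bp$names
```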
Order a factor using a function
You can use the reorder() command to reorder a predictor variable by a function applied to the response variable. In other words, you can determine the order of the boxes using a median or other function. Use the following general process:
- Use reorder(predictor, response, FUN) to determine an order for the predictor variable.
- Use the result of reorder() in place of the original predictor variable in the boxplot() command.
bpm <- with(hog2, reorder(site, count, FUN = median))
boxplot(count ~ bpm, data = hog2)
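Here the with() command is used to “see inside” the hog2 data, so that you can refer to the columns by name. Alternatively, you could attach() the data and detach() it again afterwards:
attach(hog2)
bpm <- reorder(site, count, FUN = median)
detach(hog2)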
The result is in ascending order. If you want descending order, simply add a minus sign in front of the response variable:
bpm <- with(hog2, reorder(site, -count, FUN = median))
boxplot(count ~ bpm, data = hog2)
The procedure works with multiple predictors but you can only reorder() one at a time.
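It can help to inspect the reordered factor before plotting. The sketch below assumes hog2 is rebuilt from the values shown earlier, with site as a factor; bpm_desc is an illustrative name. The factor returned by reorder() has its levels sorted by the summary value and carries those values in a "scores" attribute.

```r
# Rebuild the example data; storing site as a factor is an assumption
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = factor(rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5)))
)

# Ascending: levels are sorted by the median count of each site
bpm <- with(hog2, reorder(site, count, FUN = median))
levels(bpm)
attr(bpm, "scores")   # the per-site medians used for the ordering

# Descending: negating the response reverses the ordering
bpm_desc <- with(hog2, reorder(site, -count, FUN = median))
levels(bpm_desc)
```

Only the level order changes; the values in the factor stay in the same row order as the original data, so the result can be used alongside count in the boxplot() formula.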
Make a factor in an explicit order
You can put the levels of a factor variable into an explicit order using the ordered() command. You give the name of the factor you want to order and then the names of the levels in the order you want.
The result of the ordered() command is an ordered factor. The upshot is that the order you set will take precedence over the default alphabetical order.
o <- ordered(hog2$site, levels = c("Upper", "Lower", "Mid"))
o
 [1] Upper Upper Upper Upper Upper Upper Upper Mid   Mid   Mid   Mid   Mid
[13] Lower Lower Lower Lower Lower
Levels: Upper < Lower < Mid

boxplot(count ~ o, data = hog2)
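If the levels have no natural ranking and you only want to control the plotting order, a plain factor() with an explicit levels argument is a possible alternative. This is a sketch of that variation, not the approach used in the exercise; hog2 is rebuilt from the values shown earlier and site_f is an illustrative name.

```r
# Rebuild the example data; storing site as a factor is an assumption
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = factor(rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5)))
)

# Set the level order explicitly without declaring the levels as ranked
site_f <- factor(hog2$site, levels = c("Upper", "Lower", "Mid"))
is.ordered(site_f)

# The boxes follow the level order, as they do for an ordered factor
bp <- boxplot(count ~ site_f, data = hog2)
bp$names
```

The box order is the same either way. The difference matters mainly outside plotting, since some modelling functions treat ordered and unordered factors differently.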