Statistics for Ecologists – Example Data

There are a number of data files associated with the book. I’ve tried to ensure that all the data mentioned and illustrated in the text are available for you to download. Many of the datasets are used in Have a Go exercises; you can get the data and then follow along with the exercises. On this page you can find some details about each dataset and, of course, download the files.

Archive files – there are two files, one for Spreadsheet files and one for R-format data.
Data resources listed by topic – this gives you an idea of what sorts of things you can do with each data resource.
List of data resources – this section lists all the data files (more or less in alphabetical order) and provides some information about each. There are some ideas about what you might “do” with each dataset.

See also the Exercises support page, where there are additional notes and exercises. You’ll find more datasets with these exercises and links to the files as you need.

See our sister site, DataAnalytics: Ecology Matters, for resources for Ecology Students & Teachers, including data examples to use for practice and demonstration, and Custom Functions for R: The Statistical Programming Language.

Note: if you have the original edition of Statistics for Ecologists, the example data are a subset of those for the new edition. However, see the Original Edition section below.

Data resources listed by topic

Click a name to go to a description of the data. The data are contained in two archives, one for the R data, S4E2e.RData and one for spreadsheet data, S4E2e Archive Excel.zip. Note that some items may be listed in more than one section. Graphics are not mentioned as a topic because all the data can be shown graphically one way or another! Similarly, many of the data can be used for practice at manipulating data in Excel and/or R in some way (see Miscellaneous).

Correlations (2 variables)

Butterfly Food – there are three variables but you can practice using two at a time
Bluebell and Light – polynomial regression
Freshwater (correlation)
Growth (plant growth) – logarithmic correlation/regression
Mayfly (correlation)
Mayfly (regression)
Pearson correlation data

Associations

Birds and Habitat
Heather species
Invertebrates and Habitat
Pea genetics – Goodness of Fit test

Regression

Beach hoppers – logistic regression
Butterfly Food – multiple linear regression
Bluebell and Light – polynomial regression (curvilinear regression)
Growth (plant growth) – logarithmic regression (curvilinear regression)
Mayfly (regression) – multiple linear regression
Newt presence-absence – multiple logistic regression

Diversity

Ants and Fire
Butterflies and year
Diversity Simpson D.xls – calculates the Simpson’s D index
Diversity Shannon.xls – calculates the Shannon index
Freshwater invertebrates.xlsx – a simple dataset with taxonomic information and quantity
Hornbill diet
Mosses and trees
Plant species lists
Plant species abundance

Similarity

Ants and Fire
Butterflies and year
Hornbill diet
Mosses and trees
Plant species lists
Plant species abundance

Miscellaneous

Butterflies and Habitat – Pivot Tables and rearranging/managing data
Birds and Habitat – Pivot Tables and rearranging/managing data
Dominance/Abundance scales – Using lookup tables (=VLOOKUP in Excel)
Plant species lists – Pivot Tables and rearranging/managing data
Seashore seaweed – Using lookup tables (=VLOOKUP in Excel)

List of data resources

Arranged more or less alphabetically

Ants and fire

These data are adapted from Hoffmann, B.D. 2003. Austral Ecol. 28, p.182 and show the abundance of 91 species of ant in 10 samples. The samples are from two types of soil (red and black) and from 5 fire regimes. The data are arranged with the samples as columns, the column names indicate the soil and regime as follows:

r = red soil
b = black soil
E2 = burnt every 3yr with grazing early (May)
E3 = burnt, spelled & burnt in 2 successive yr
L2 = burnt every 3yr with late grazing (Oct)
L3 = burnt, spelled, burnt in 2 successive yr
U = unburnt control

The data are used for dissimilarity calculations (including visualisation of dissimilarity with a dendrogram) but you can also use them to explore diversity. The data are in the S4E2e.RData archive and are named ant.
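As a rough sketch of the dissimilarity workflow mentioned above, the following uses a small made-up community table (not the ant data) laid out in the same way, with samples as columns. The object names and values are hypothetical, and the Euclidean metric is just one possible choice.

```r
# Made-up abundances: 4 species (rows) x 3 samples (columns).
# Illustrative values only; these are not taken from the ant data.
comm <- matrix(c(10, 0, 3, 5,
                  8, 1, 4, 4,
                  0, 9, 0, 1),
               nrow = 4,
               dimnames = list(paste0("sp", 1:4), c("rE2", "rU", "bU")))

# dist() compares rows, so transpose to put the samples in rows
d <- dist(t(comm), method = "euclidean")

# Hierarchical clustering; the result can be drawn as a dendrogram
hc <- hclust(d, method = "complete")
# plot(hc)   # uncomment to draw the dendrogram
```

If the real object is arranged the same way, the transpose step is the part most easily forgotten.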

Beetle sizes

These data give the sizes (in mm) of a species of water beetle. The main sample can be used for data summary; a second sample is available (in R) to use for comparisons.

The beetles.xls file in the S4E2e Archive Excel.zip archive contains the main sample as well as examples of histograms.
The RData file contains the main sample as an object called bd; there is a second sample called Mar (there are also copies called sunny and shady).

The data can be used for data summary, such as mean, median, standard deviation and so on. You can also practice drawing histograms. The two samples can also be compared with the t-test or U-test.
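A minimal sketch of these summaries and two-sample comparisons, using two short made-up samples rather than the beetle data (the names s1 and s2 are hypothetical):

```r
# Made-up sizes (mm) for two samples; not the book's beetle data
s1 <- c(17.2, 18.1, 16.9, 17.8, 18.4, 17.5)
s2 <- c(18.9, 19.4, 18.2, 19.8, 19.1, 18.7)

# Basic data summary for one sample
mean(s1); median(s1); sd(s1)
# hist(s1)   # uncomment to draw a histogram

tt <- t.test(s1, s2)        # Welch two-sample t-test (R's default)
ut <- wilcox.test(s1, s2)   # U-test (Mann-Whitney / Wilcoxon rank sum)
```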

Beach hoppers

These data show the allele frequencies at the mannose-6-phosphate isomerase (Mpi) locus in the amphipod crustacean Megalorchestia californiana, Californian beach hopper. Data from McDonald, J.H. 1985 (Heredity 54: 359–366).

Beach hopper allele.csv – this file is in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. You can use this for practice at importing data into R but see below…
The RData file contains the data as an object called cbh.

These data are used to demonstrate logistic regression. Each row of the data gives the latitude and the number of specimens that had each form of the allele. There are two forms, so the data are binary, which is why a logistic regression is the appropriate method of analysis. Logistic regression is a form of generalized linear modelling (GLM).
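One way to fit such a model in R is with glm() and a two-column response of counts. The sketch below uses invented counts and hypothetical column names, so treat it as an outline of the approach rather than the analysis in the book.

```r
# Made-up counts of two allele forms at five latitudes (not McDonald's data).
# Column names are hypothetical.
hop <- data.frame(latitude = c(34, 36, 38, 40, 42),
                  form1    = c(30, 25, 20, 12, 8),
                  form2    = c(10, 15, 20, 28, 32))

# Two-column response: counts of each form in every row
mod <- glm(cbind(form1, form2) ~ latitude, data = hop, family = binomial)
summary(mod)$coefficients
```

The slope for latitude is on the logit scale; a negative value means the first form becomes less frequent as latitude increases.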

Butterflies and Year

These data show the abundance (as a count) of six butterfly species over five years at a site in Scotland. The data are arranged with the columns giving the year of the sample, each row gives the abundance of a species.

Butterfly table – the data are in the S4E2e Archive Excel.zip archive as an XLSX file and a CSV. You can read the CSV file into R, in which case you need to add check.names = FALSE to the read.csv() command (this stops R altering column names that begin with a digit, such as years).
The RData file contains the data as two objects: bf is a matrix and butterfly is a data.frame.
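The effect of column headings that are years can be seen with a tiny made-up CSV. This sketch assumes the aim is to keep the year labels unchanged, which read.csv() does when check.names = FALSE; the species labels, counts and layout here are invented and the real file may differ.

```r
# A tiny made-up CSV standing in for the butterfly file
csv_text <- "Species,1996,1997,1998
Sp.A,12,15,9
Sp.B,3,0,4"

# Default behaviour: R prepends "X" to names that start with a digit
default_names <- names(read.csv(text = csv_text))
default_names   # "Species" "X1996" "X1997" "X1998"

# check.names = FALSE keeps the year labels as they are;
# row.names = 1 uses the first column as row names
bfly <- read.csv(text = csv_text, check.names = FALSE, row.names = 1)
names(bfly)     # "1996" "1997" "1998"
```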

You can use the data for graphical summary, showing line plots of abundance and time for example, as well as bar charts and pie charts. You can also look at the diversity of the samples as a bit of practice with diversity indices, such as Shannon and Simpson’s. You could also explore similarity between years.

Although these are perhaps not the most sensible types of analysis, you might also use the data for comparison of differences or changes with time (as a correlation).

Butterflies & Habitat

These data show the abundance of butterflies in three habitats. Each habitat was sampled several times. The datafile has three columns, for the abundance, habitat and an index variable (the replicate).

xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file.

These data are used as an example of rearranging and managing data using a Pivot Table in Excel. You can also use the data to look at data summary, graphics and differences (there are 3 samples). To do that using R you’ll need to save a copy as a CSV file then import into R.

Butterfly food

These data show the abundance of butterflies and the availability of food plants and nectar resources.

csv – the data are in the S4E2e Archive Excel.zip archive and will open in Excel.
The RData file contains the data as an object (data.frame) called bff.

These data can be used to look at (multiple) regression (or correlation), and associated statistics (such as beta coefficients). You can also use the data for some graphical summary, such as scatter plots.

Birds & Habitat

These data show the abundance of some common UK bird species in various habitats.

xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file. These data are in recording format and you can use these as practice at using a Pivot Table.
csv – these data are also in the archive and contain the data in the form of a contingency table.
xlsx – this datafile is in Excel format and shows the data in a contingency table. There is also a completed association analysis in another worksheet.
The RData file contains the data (contingency table layout) as two objects: birds is a matrix and bird is a data.frame.

The main purpose of these data is to look at tests of association (the Chi squared test). You can also use them for graphical summary, using bar charts and pie charts. The birds.xlsx file can also be used for translating data from recording layout into a contingency table (in Excel using a Pivot table, in R you can use the xtabs() command).
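A short sketch of the recording-layout to contingency-table step with xtabs(), followed by a Chi squared test. The records and column names below are made up for illustration and do not come from the bird files.

```r
# Made-up records in recording layout: one row per species/habitat
# combination, with a count. Not the book's bird data.
rec <- data.frame(
  species = rep(c("Blackbird", "Robin", "Wren"), times = 2),
  habitat = rep(c("Garden", "Woodland"), each = 3),
  qty     = c(40, 15, 10, 12, 20, 30)
)

# Cross-tabulate the counts into a contingency table
ct <- xtabs(qty ~ species + habitat, data = rec)

# Chi squared test of association
res <- chisq.test(ct)
res$expected   # expected frequencies under no association
```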

Bluebell abundance

These data show the abundance of bluebell in a wood in England. Data are presented showing the abundance of the plant and the light intensity at the growing site.

Bluebell polynomial.xlsx – the data are in the S4E2e Archive Excel.zip archive and will open in Excel.
The RData file contains the data as an object called bbel.

These data show an interesting relationship between abundance and light, an inverted U shape. This lends itself to a regression using a polynomial equation. You can also use the data for graphical summary, such as a scatter plot and line of best fit (a trendline).
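A polynomial regression of this kind can be sketched with lm() and a squared term. The values and variable names here are invented to give a hump-shaped pattern; they are not the bluebell data.

```r
# Made-up abundance and light values with a hump-shaped pattern
# (illustrative only; variable names are hypothetical)
light <- c(2, 4, 6, 8, 10, 12, 14)
abund <- c(5, 14, 21, 24, 22, 15, 4)

# Quadratic (second-order polynomial) regression; I() protects the ^
poly_mod <- lm(abund ~ light + I(light^2))
coef(poly_mod)
```

A negative coefficient on the squared term is what produces the inverted U shape.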

Dominance/Abundance scales

This dataset is used in an exercise on using lookup tables. It is useful to be able to convert from an ordinal scale that uses text values to an ordinal scale with “real” numbers.

xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file.

Use this to help practice using the =VLOOKUP function in Excel. This allows you to replace one value with another. In this case an abundance as a text label (D = dominant, A = abundant etc.) can be replaced with a numerical value. This allows you to carry out non-parametric statistics (e.g. the U-test). You are essentially replacing a text-based ordinal scale with a number-based ordinal scale.
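The same replacement can be done in R with a named vector acting as the lookup table. This is only a sketch: the numeric scores assigned to each letter are an assumption, not the values used in the spreadsheet.

```r
# Hypothetical lookup table: text label -> numeric score
scale_lookup <- c(D = 5, A = 4, F = 3, O = 2, R = 1)

# Made-up field records using the text scale
obs <- c("A", "F", "D", "R", "O", "A")

# Index the named vector by label to get the numeric scale
num <- unname(scale_lookup[obs])
num   # 4 3 5 1 2 4
```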

Diversity Calculation

These two files show you how to calculate two indices of diversity; both are in the S4E2e Archive Excel.zip archive.

Diversity Simpson D.xls – as the name suggests, this calculates the Simpson’s D index of diversity.
Diversity Shannon.xls – this spreadsheet computes the Shannon index (also called Shannon-Wiener or Shannon-Weaver).

These files can be used as the basis for a spreadsheet calculator (you can add extra rows as you need), which you can use to help compute the two commonly used indices of diversity.
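The two indices are also easy to compute directly in R. The sketch below assumes the 1 − Σp² form of Simpson's index and natural logarithms for the Shannon index; the spreadsheets may use a different variant, and the counts are made up.

```r
# Made-up counts for five species in one sample
n <- c(10, 10, 10, 10, 10)

p <- n / sum(n)   # proportional abundance of each species
p <- p[p > 0]     # drop absent species so that log(0) is never evaluated

shannon <- -sum(p * log(p))   # Shannon index (natural logarithms)
simpson <- 1 - sum(p^2)       # Simpson's index in the 1 - sum(p^2) form
```

With five equally abundant species the Shannon index equals log(5) and this form of Simpson's index equals 0.8.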

Flour beetles

These data show the abundance of flour beetles in samples taken from two different (fictitious) farms. The Excel version shows the data in sample layout, with one column for each sample. There are several R versions, with the data in different layouts:

flour beetles.xls – the data are in the S4E2e Archive Excel.zip archive. One column shows the counts of beetles from Woad Farm, the other column shows the counts from Glebe Farm.
The RData file contains the data as four objects:
- flour1 – a data.frame with a column qty and a column site, i.e. in recording layout
- flour2 – a data.frame with two columns, Woad.Fm and Glebe.Fm, each containing the counts from a separate farm
- Woad.Fm – a vector of values representing the counts of beetles at Woad farm
- Glebe.Fm – a vector of values representing counts of beetles at Glebe farm

These data can be used for exploring differences between samples. You can also use them for graphical summary, e.g. bar charts, box-whisker plots, and for data summary (e.g. mean, median, standard error). The R-format data are in several forms so that you can practice carrying out commands on the different types of object.
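To show how the layout changes the syntax, here is a sketch that converts a made-up sample-layout data frame to recording layout with stack() and runs the same test both ways. The column names follow those described above; the counts are invented.

```r
# Made-up counts in sample layout (one column per farm)
samp <- data.frame(Woad.Fm  = c(12, 15, 9, 11),
                   Glebe.Fm = c(20, 18, 25, 22))

# stack() gives recording layout: one column of values, one of labels
rec <- stack(samp)
names(rec) <- c("qty", "site")

# The same U-test written for each layout
by_vectors <- wilcox.test(samp$Woad.Fm, samp$Glebe.Fm)
by_formula <- wilcox.test(qty ~ site, data = rec)
```

Both calls give the same p-value; only the way the data are referred to differs.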

Freshwater invertebrates (correlation)

These data show the abundance of a freshwater invertebrate and the water speed at the point of collection. See also the Mayfly (correlation) data, which are very similar.

freshwater correlation.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, Abund and Speed, for Abundance and water Speed respectively.
The RData file contains the data as a data.frame called fw.

These data can be used for exploration of correlation as well as some graphical summary (e.g. scatter plot).

Freshwater invertebrates (diversity)

These data give the abundance of some freshwater invertebrates from Goredale Beck in Yorkshire. There is also some taxonomic information for each invertebrate recorded.

Freshwater invertebrates.xlsx – the data are in the S4E2e Archive Excel.zip archive. There is a column for the count of each taxon. Other columns give the taxonomic information (e.g. phylum, order).

You can use these data for looking at diversity. You can also practice transferring the data from Excel into R.

Growth (plant growth)

These data show the growth of a plant species in response to different levels of a nutrient.

Growth Logarithmic.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, Growth and Nutrient.
The RData file contains the data as a data.frame called pg.

These data show an interesting relationship. If you plot one variable against the other you’ll see that the points “curve”. In fact the relationship is a logarithmic one. You can use these data to look at curvilinear regression, in this case logarithmic regression. This is ordinary linear regression but with a logarithmic equation.
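A logarithmic regression can be sketched by log-transforming the predictor inside lm(). The numbers and variable names below are invented to show the pattern and are not the plant growth data.

```r
# Made-up growth and nutrient values (illustrative only)
nutrient <- c(1, 2, 4, 8, 16, 32)
growth   <- c(2.1, 3.4, 4.9, 6.2, 7.8, 9.1)

# Linear model of growth against the natural log of nutrient
log_mod <- lm(growth ~ log(nutrient))
coef(log_mod)
summary(log_mod)$r.squared
```

The fitted equation has the form growth = a + b × ln(nutrient), so the model is still linear in its coefficients.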

You can also use the data to look at graphical summary, e.g. a scatter plot with line of best fit (trendline).

Heather species

This dataset shows the abundance of two species of heather in Cornwall. The data are in the form of a contingency table. In total 137 quadrats were used and the presence of each species noted. The contingency table shows the frequency of occurrence; thus you have four options:

Both species present together in a quadrat
Calluna vulgaris present only
Erica cinerea present only
Neither species present (i.e. both absent)

The data can be used to explore the association between the two species.

csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet.
The RData file contains the data as a data.frame called heather.

You can use the data for tests of association (e.g. the Chi squared test), and since this is a 2×2 contingency you can apply Yates’ correction. You can also use the data for graphical summary (e.g. bar chart, pie chart). You can also use the CSV file to practice transferring data from spreadsheet to R.
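For a 2×2 table, chisq.test() applies Yates' correction by default, and correct = FALSE switches it off. The sketch below uses made-up frequencies rather than the heather counts.

```r
# Made-up 2x2 contingency table (not the heather data)
ht <- matrix(c(30, 20, 15, 35), nrow = 2,
             dimnames = list(Calluna = c("present", "absent"),
                             Erica   = c("present", "absent")))

with_yates    <- chisq.test(ht)                   # correction applied by default
without_yates <- chisq.test(ht, correct = FALSE)  # uncorrected Chi squared
```

The corrected statistic is the smaller of the two, which makes the test slightly more conservative.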

Hoglouse abundance

These data show the abundance of hoglouse (Asellus spp), a freshwater invertebrate, at three sampling locations.

xlsx – the data are in the S4E2e Archive Excel.zip archive. These data are in sample format; there is a column of abundance data for each of the three sampling locations. The spreadsheet also contains a second worksheet giving the summary statistics and a bar chart with error bars.
The RData file contains the data as two data.frame objects:
- hog2 – gives the data in recording layout, there is a column for count and a column for site.
- hog3 – gives the data in sample layout, there is a column for each sample.

You can use the data for exploring differences between samples. In the book text you use these data for a Kruskal-Wallis test, which is a non-parametric test of differences between more than two samples. You can also use the data for practice at data summary and graphics (these data are used to draw bar charts in the book text). You could also subset the data and look at comparing just two samples.
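A Kruskal-Wallis test on data in recording layout might look like the following sketch. The counts and site labels are made up; only the count/site column arrangement follows the description of hog2 above.

```r
# Made-up counts at three sites, in recording layout
hog <- data.frame(
  count = c(3, 5, 4, 6,  8, 9, 7, 10,  15, 12, 14, 16),
  site  = rep(c("Upper", "Mid", "Lower"), each = 4)
)

# Non-parametric test of differences among the three samples
kw <- kruskal.test(count ~ site, data = hog)
kw

# Subset to compare just two of the samples
two <- subset(hog, site != "Mid")
```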

Hornbill diet

These data show the presence of different fruits in the diet of three species of hornbill from India (data adapted from Datta, A. & Rawat, G.S. 2003. Biotropica 35, p.208). The data are in the form of presence-absence, so if a fruit species was found in the diet a 1 is recorded, if the fruit was absent it is shown as 0.

csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. There is a column for the species names of the fruit (as an abbreviated scientific name). Each of the next columns shows the presence-absence of these fruits in the diet of three species of hornbill:
- GH = great hornbill
- WH = wreathed hornbill
- OPH = oriental pied hornbill
The RData file contains the data as a data.frame, hornbill. The rownames have been set as the fruit species and the main data are the three columns of fruit presence-absence for the hornbill species.

You can use these data to look at similarity (dissimilarity) and to draw a simple dendrogram to show the relationship between the samples in terms of the presence of fruit species. You can also use the data for diversity (species richness).
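For presence-absence data, one option is the "binary" method of dist(), which gives a Jaccard-type dissimilarity. This sketch uses an invented presence-absence table with the same column labels as above; the fruit names and values are hypothetical.

```r
# Made-up presence-absence table: fruits in rows, hornbill species in columns
diet <- data.frame(GH  = c(1, 1, 0, 1, 0),
                   WH  = c(1, 1, 1, 0, 0),
                   OPH = c(0, 0, 1, 0, 1),
                   row.names = paste0("fruit", 1:5))

# Species richness of fruit in each diet
richness <- colSums(diet)

# Transpose so the hornbill species are the rows being compared
d  <- dist(t(diet), method = "binary")
hc <- hclust(d)
# plot(hc)   # uncomment to draw the dendrogram
```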

Invertebrates and Habitat

These data show the frequency of observation of some terrestrial invertebrate taxa on different parts of plants. There are two datasets, which are similar. Both give the frequency of observation in the form of a contingency table.

invert habitat.csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives five invertebrate taxa and there are three sites (Upper, Lower, Stem).
csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives the names of four sites (Upper leaf, Lower leaf, Stem, Bud) and the subsequent columns show frequencies for three invertebrate taxa.
The RData file contains the data from “invert habitat.csv” as a data.frame called inv.hab.

You can use these data for tests of association (e.g. Chi squared) and for data summary (e.g. bar charts, pie charts). Note that the CSV dataset invert.csv is not duplicated in R, which means you can practice importing CSV data into R.

Leaf sizes

These data show the sizes of tree leaves in millimetres. The main (Excel) dataset gives 10 samples, each of 10 measurements.

xlsx – the data are in the S4E2e Archive Excel.zip archive. There are 100 measurements, separated into 10 samples of 10 readings.
The RData file contains a vector object called lf, which gives just one of the samples from the XLSX file (the 2nd).

You can use these data to look at data summary and especially the calculation of running means and standard error. You can use them for graphical summary too, such as line plots and the running mean. If you want to use the entire dataset in R you’ll need to transfer the data from the Excel file, which will give you some practice.
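A running mean and running standard error can be sketched as follows, using a few made-up measurements in place of the leaf data (the names are hypothetical):

```r
# Made-up leaf sizes (mm)
x <- c(52, 48, 55, 50, 45)

# Running mean: mean of the first k values, for k = 1, 2, ...
run_mean <- cumsum(x) / seq_along(x)

# Running standard error: sd of the first k values divided by sqrt(k)
# (the first element is NA because sd() needs at least two values)
run_se <- sapply(seq_along(x), function(k) sd(x[1:k]) / sqrt(k))

# plot(run_mean, type = "b")   # uncomment for a line plot of the running mean
```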

You could also explore differences between samples (pairs or several/all at once).

Mayfly (correlation)

These data show the abundance of a mayfly species and the speed of the water at the sampling location.

csv – the data are in the S4E2e Archive Excel.zip archive. There is a column for Speed and one for the corresponding Abund.
The RData file contains a data.frame object called mayfly, which contains two columns.

Use these data for looking at correlation between the two variables. You can also look at graphical summary (scatter plot), as well as the data summary (averages, distribution).
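Correlation tests in R use cor.test(). The sketch below runs it on invented values, not the mayfly data, with both Pearson's and Spearman's methods.

```r
# Made-up paired observations (illustrative only)
speed <- 1:6
abund <- c(2, 1, 4, 3, 6, 5)

pear  <- cor.test(speed, abund)                       # Pearson (the default)
spear <- cor.test(speed, abund, method = "spearman")  # Spearman rank correlation

# plot(speed, abund)   # uncomment for a scatter plot
```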

Mayfly (regression)

These data give the sizes of a freshwater invertebrate and several environmental variables at the sampling location for each size measurement.

The data contain the following variables:

Length = the length of the invertebrate in mm
Speed = the water speed (time taken for a hydroprop to complete)
Algae = the percentage cover of algae on the substrate
NO3 = the concentration of nitrates
BOD = the biological oxygen demand

The dataset is in two forms:

mayfly regression.csv – the data are in the S4E2e Archive Excel.zip archive. The file contains five data columns plus an extra “index” column giving a simple number.
The RData file contains a data.frame called mf, which has five columns corresponding to the items listed earlier.

You can use these data for regression analysis (multiple regression), and for graphical summary (e.g. scatter plots, perhaps with a trendline). You can also look at the data summary and could conduct simple correlation between pairs of variables. The dataset provides a simple introduction to regression model building.

Mosses and trees

These data show the abundance of some bryophyte species on trees in North Carolina (data adapted from Palmer M.W. 1986. The Bryologist 89, p.59). The data are in a community layout, with the columns giving the sample names (the trees) and the rows being the bryophyte species.

Moss data.xls – the data are in the S4E2e Archive Excel.zip archive. The main worksheet gives the species abundance information and the second worksheet shows a completed dissimilarity matrix (Euclidean metric).
The RData file contains a data.frame called mosess, which has the data with rows as sites and columns as species (which is transposed from the layout in the Excel file).

You can use these data to calculate indices of similarity (dissimilarity) and indeed the Excel file contains a second worksheet with a completed Euclidean dissimilarity matrix. You can also use these data for visualizing dissimilarity (e.g. with a dendrogram).

You could also use the data to look at diversity indices and for graphical summary (e.g. bar charts, pie charts).

Newt presence-absence

These data give the presence-absence of great crested newts at ponds in Buckinghamshire, UK. The presence of a newt is recorded with a 1 and the absence by a 0; hence the data are binary. The other columns give various habitat factors such as the area of the pond, an index based on presence of fish, and other factors.

Newt HSI.csv – the data are in the S4E2e Archive Excel.zip archive. There are several columns in the file:
- presence = the presence or absence of newts (1 or 0).
- area = the area of the pond in square metres.
- dry = an index of how often the pond dries (1 = never, 2 = rarely, 3 = occasional, 4 = annually).
- water = an index of water quality (1 = bad, 2 = poor, 3 = moderate, 4 = good).
- shade = a value for the % shade (from trees and so on).
- bird = an index for the presence of waterfowl (1 = absent, 2 = minor, 3 = major).
- fish = an index for the presence of fish (1 = major, 2 = minor, 3 = possible, 4 = absent).
- other.ponds = the number of other ponds within 1 km.
- land = an index of land use quality (for newts: 1 = bad, 2= poor, 3 = moderate, 4 = good).
- macro = the % cover of macrophytes.
- HSI = the overall Habitat Suitability Index (a measure of “how suitable” a pond might be as a habitat for newts).
The RData file contains a data.frame called gcn, which contains the data (there are 200 rows).

You can use the data for logistic regression, which is a form of generalised linear modelling (GLM). You can carry out logistic regression on single factors or build a regression model with several terms.

Pearson correlation data

These data show the abundance of a freshwater invertebrate with corresponding flow rate.

xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns: abund and flow.
The RData file contains a data.frame called pearson, which contains the data.

Use these data to carry out correlation between the two variables. You can also use the data to look at data summary (including distribution) and for graphical summary (e.g. scatter plot).

Pea genetics

These data show the frequency of pea plants exhibiting combinations of coat colour and type.

csv – the data are in the S4E2e Archive Excel.zip archive. These data are arranged with columns like so:
- Colour = the colour of the pea (green or yellow)
- Coat = the type of coat (wrinkled or smooth)
- Obs = the number of peas for each combination of colour and coat
- Ratio = the expected ratio of observations based on genetic theory
The RData file contains a simple vector called peas. This vector gives the frequency of observation for the various combinations of coat and colour.

You can use these data for tests of goodness of fit (a kind of association test), using Chi squared. You can also summarise the data graphically (e.g. bar charts).

Note that the R-format data contain only the observed frequencies. If you want to conduct a goodness of fit test in R you will have to incorporate the “expected” ratio data in some manner.
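One way to supply the expected ratio is through the p argument of chisq.test(), with rescale.p = TRUE so that the ratio is converted to proportions. The observed counts below are made up, and the 9:3:3:1 ratio is assumed for illustration; check the Ratio column of the CSV file for the values actually used.

```r
# Made-up observed frequencies for four colour/coat combinations
obs <- c(116, 40, 31, 13)

# Assumed expected ratio (illustrative; see the Ratio column in the CSV)
ratio <- c(9, 3, 3, 1)

# rescale.p = TRUE converts the ratio to proportions that sum to 1
gof <- chisq.test(obs, p = ratio, rescale.p = TRUE)
gof$expected
```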

Plant species lists

These data give vascular plant species names for samples from 10 sites from a survey in Shropshire, UK.

Plant species lists.csv – the data are in the S4E2e Archive Excel.zip archive. The first column gives the site name as a simple abbreviation (there are 10). The second column gives the scientific name of the species. There are 187 observations in total.
The RData file contains a data.frame called plrich. This gives the Site and Species names in two columns.

The data provide some practice at cross tabulation, using the Pivot Table in Excel or table() in R. You can re-arrange the data to give a presence/absence table, where 1 = presence of a species at a site and 0 is absence (in fact you will need to do that in order to determine species richness). The S4E2e.RData file contains an object called ps, which has already been tabulated.

Once the data are tabulated you can use them to explore species richness (a measure of diversity). You can also look at similarity (which you can also plot using a dendrogram).
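The cross tabulation and species richness steps can be sketched with table() and colSums(). The records below are invented; only the Site and Species column names follow the description above.

```r
# Made-up site/species records in recording layout
rec <- data.frame(
  Site    = c("A", "A", "A", "B", "B", "C"),
  Species = c("Bellis perennis", "Poa annua", "Urtica dioica",
              "Poa annua", "Urtica dioica", "Poa annua")
)

# Cross-tabulate: species in rows, sites in columns
pa <- table(rec$Species, rec$Site)
pa[pa > 0] <- 1   # force presence/absence in case a record is duplicated

# Species richness is the number of species present at each site
richness <- colSums(pa)
richness
```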

Plant species & watering

These data show the growth (in cm) of two plant species in response to three different watering regimes (low, hi and mid).

Two way online.xlsx – the data are in the S4E2e Archive Excel.zip archive. The data are in a particular layout that allows Excel to calculate ANOVA. The data are arranged in an “on the ground” layout with samples in separate blocks. There is a separate worksheet containing a completed 2-way ANOVA (see the Exercises support page).
The RData file contains a data.frame called pw. This gives the data in recording layout; the response variable is height, and the two predictor variables are plant and water.

Use these data for two-way analysis of variance (2-way ANOVA). The analysis is straightforward in R but less so in Excel (see the Exercises support page for an online exercise on computing 2-way ANOVA using Excel). You can also use the data for graphical representation of results (e.g. boxplot or bar chart) as well as general data summary.
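In R a two-way ANOVA with interaction can be sketched with aov(). The data frame below is made up; it borrows the variable names height, plant and water from the description of pw, but the values and factor labels are hypothetical.

```r
# Made-up heights for 2 plant species x 3 watering regimes, 3 replicates each
pw_demo <- data.frame(
  height = c(10, 12, 11,  15, 17, 16,  20, 19, 22,
              8,  9, 10,  14, 13, 15,  12, 11, 13),
  plant  = rep(c("A", "B"), each = 9),
  water  = rep(rep(c("low", "mid", "hi"), each = 3), times = 2)
)

# plant * water fits both main effects and their interaction
mod <- aov(height ~ plant * water, data = pw_demo)
summary(mod)

# boxplot(height ~ plant * water, data = pw_demo)   # uncomment to plot
```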

Plant species abundance

These data give the abundance of some terrestrial vascular plants at 10 sites in Shropshire, UK. The data are the same as for the Plant species lists dataset but include the abundance information.

Plant species abundance.csv – the data are in the S4E2e Archive Excel.zip archive. These data are in community layout with the first column giving the species name (scientific binomial) and subsequent columns for each of the 10 sample sites. The data are based on average domin scores (an abundance scale similar to Braun Blanquet) from five quadrats.
The RData file contains a data.frame called psa. This gives the data with rows as species and columns as sites.

You can use these data to explore diversity using Simpson’s or the Shannon index. You can also use these data to look at similarity, which you can also plot using a dendrogram.

Ridge & Furrow meadow

These data show the abundance of meadow buttercup plants in one metre square quadrats in an ancient ridge & furrow meadow in Buckinghamshire, UK.

ridge furrow.xlsx – the data are in the S4E2e Archive Excel.zip archive. The file gives the data in sample layout, with a column for ridge and one for furrow. The spreadsheet also contains a completed t-test in a separate worksheet.
The RData file contains the data in four separate objects:
- rf1 – gives the data in recording layout, with a column for count and one for area (Ridge or Furrow).
- rf2 – gives the data in sample layout, with a column for Ridge and one for Furrow.
- furrow – a separate vector object for the Furrow data sample.
- ridge – a separate vector object for the Ridge data sample.

You can use these data to explore differences between two samples (t-test or U-test). You can also display the results graphically (e.g. bar chart or box-whisker plot). You can use the data for summary statistics (mean, median, IQR etc.), and to look at data distribution.

The data are in different forms in R-format so that you can explore how to use different syntax according to the layout of data you have.

Seashore seaweed

These data show the abundance of some seaweed species on a rocky shore in South Devon, UK. The data are presented using a text-based abundance scale (ACFOR) and give the abundance of five species at 13 transect stations across the shore (the stations are different heights above mean tide height).

xlsx – the data are in the S4E2e Archive Excel.zip archive. The file contains two worksheets, with one giving an extra table of data where the ACFOR scale is converted to a numerical ordinal scale.

The data are intended to be used as an example of how to use the =VLOOKUP function in Excel. You use this to replace one item with another. In this case the text-based ordinal scale is replaced by a numerical scale.

Sward height

These data give the height of vegetation (cm) in the sward at three sampling locations in a meadow in Shropshire, UK.

Sward height.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are three columns, one for each of the samples: Upper, Middle, Lower.
The RData file contains the data in two data.frame objects:
- sward2 – the data are in recording layout with a column for Height (the response variable) and a column for Site (the predictor variable).
- sward3 – the data are in sample layout with a column for each of the three sites.

You can use these data for exploring differences between more than two samples, e.g. the Kruskal-Wallis test (non-parametric) or analysis of variance (ANOVA, parametric). You can also use the data to look at data summaries (e.g. median, mean) and graphically (e.g. boxplot of results or histogram of sample distribution).

Tree sizes and month

These data show the size of Sitka spruce trees measured at monthly intervals. The size is determined as the height multiplied by the diameter squared, on a log scale (i.e. log(h x d²)). The data are modified from an original (larger) dataset from within R.

The RData file contains the data in a data.frame object called tree. There are two columns, month and size.

Use these data to draw a line plot of spruce growth.

Whitefly

These data show the counts of whitefly (a greenhouse pest) attracted to different coloured sticky traps. Each trap is bi-coloured so the data are in the form of matched pairs.

xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, one for the count of whitefly on the White side and the corresponding count for the Yellow side.
The RData file contains the data in a data.frame object called whitefly. There are two columns, white and yellow.

You can use these data to carry out a matched pairs test (either a t-test or U-test, the latter also called Wilcoxon matched pairs). You can also summarise the results graphically.
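Matched pairs tests in R use the paired = TRUE argument. The sketch below uses invented counts, not the whitefly data; only the column names white and yellow follow the description above.

```r
# Made-up counts for six bi-coloured traps (one pair of values per trap)
white  <- c(10, 12,  9, 14, 11, 13)
yellow <- c(15, 14, 16, 17, 20, 12)

# Paired t-test on the within-trap differences
pt <- t.test(white, yellow, paired = TRUE)

# Wilcoxon matched pairs (signed rank) test
wt <- wilcox.test(white, yellow, paired = TRUE)
```

The order of the values matters here, because each element of white is matched with the element of yellow in the same position.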

Wilcoxon online exercise

These data are in the form of matched pairs (they are fictitious data).

xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, A and B.

You can use the data for Wilcoxon matched pairs analysis, as well as graphical summary. See the Exercises support page where there is an online exercise in using Excel to carry out a matched pairs test.

Original Edition – Statistics for Ecologists

If you have the original edition of the book then all the data examples from that are incorporated into the new edition. This means you can use the data file from the new edition and have access to all the original data.

The S4E2e Archive Excel.zip file contains the same data as the original edition but in CSV form rather than TXT files.

If you want the original data then click this archive: S4E-Edition-1-Data.
