There are a number of data files associated with the book. I’ve tried to ensure that all the data mentioned and illustrated in the text are available for you to download. Many of the datasets are used in Have a Go exercises; you can get the data and then follow along with the exercises. On this page you can find some details about each dataset and, of course, download the files.
- Archive files – there are two files, one for Spreadsheet files and one for R-format data.
- Data resources listed by topic – this gives you an idea of what sorts of things you can do with each data resource.
- List of data resources – this section lists all the data files (more or less in alphabetical order) and provides some information about each. There are some ideas about what you might “do” with each dataset.
See also the Exercises support page, where there are additional notes and exercises. You’ll find more datasets with these exercises and links to the files as you need.
See our sister site, DataAnalytics: Ecology Matters, for resources for ecology students and teachers, including data examples to use for practice and demonstration, and Custom Functions for R: The Statistical Programming Language.
Note: if you have the original edition of Statistics for Ecologists, the example data are a subset of those for the new edition. However, see the Original Edition section below.
Data resources listed by topic
Click a name to go to a description of the data. The data are contained in two archives, one for the R data, S4E2e.RData and one for spreadsheet data, S4E2e Archive Excel.zip. Note that some items may be listed in more than one section. Graphics are not mentioned as a topic because all the data can be shown graphically one way or another! Similarly, many of the data can be used for practice at manipulating data in Excel and/or R in some way (see Miscellaneous).
Summary Statistics | Data Distribution | Differences (2 samples) | Correlations (2 variables)
Associations | Differences (>2 samples) | Regression | Diversity | Similarity | Miscellaneous
Archives – Instructions (& download)
Summary Statistics
Data Distribution
Differences (2 samples)
- Beetle sizes
- Flour beetles
- Hoglouse abundance
- Leaf sizes
- Ridge & Furrow
- Sward height
- Whitefly – matched pairs
- Wilcoxon online exercise – matched pairs
Correlations (2 variables)
- Butterfly Food – there are three variables but you can practice using two at a time
- Bluebell and Light – polynomial regression
- Freshwater (correlation)
- Growth (plant growth) – logarithmic correlation/regression
- Mayfly (correlation)
- Mayfly (regression)
- Pearson correlation data
Associations
- Birds and Habitat
- Heather species
- Invertebrates and Habitat
- Pea genetics – Goodness of Fit test
Differences for more than two samples
Regression
- Beach hoppers – logistic regression
- Butterfly Food – multiple linear regression
- Bluebell and Light – polynomial regression (curvilinear regression)
- Growth (plant growth) – logarithmic regression (curvilinear regression)
- Mayfly (regression) – multiple linear regression
- Newt presence-absence – multiple logistic regression
Diversity
- Ants and Fire
- Butterflies and year
- Diversity Simpson D.xls – calculates the Simpson’s D index
- Diversity Shannon.xls – calculates the Shannon index
- Freshwater invertebrates.xlsx – a simple dataset with taxonomic information and quantity
- Hornbill diet
- Mosses and trees
- Plant species lists
- Plant species abundance
Similarity
Miscellaneous
- Butterflies and Habitat – Pivot Tables and rearranging/managing data
- Birds and Habitat – Pivot Tables and rearranging/managing data
- Dominance/Abundance scales – Using lookup tables (=VLOOKUP in Excel)
- Plant species lists – Pivot Tables and rearranging/managing data
- Seashore seaweed – Using lookup tables (=VLOOKUP in Excel)
Archives
- RData – this file contains all the data in R-format.
- S4E2e Archive Excel.zip – this file contains all the Excel format files (including CSV).
You can use the RData file in several ways:
- Open R then use load(file.choose()) and select the RData file (in Linux you should use the filename (in quotes) explicitly, including the path).
- Double-click the file. If R is already open it will add the data to your workspace; if R is not open it will open and the workspace will contain only these data (and the working directory will be set to wherever the RData file was stored).
- Drag the file to the R icon. The behaviour is the same as above.
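As a minimal sketch of the first method, the snippet below saves a small made-up object to a temporary .RData file and then loads it back using an explicit file name (the approach suggested for Linux). The object name demo and the temporary file are assumptions for illustration only; they are not part of the S4E2e archive.

```r
# Hypothetical object standing in for the contents of an .RData file
demo <- c(10.2, 11.5, 9.8)

# Save it to a temporary file, then remove it from the workspace
rdata_path <- file.path(tempdir(), "demo.RData")
save(demo, file = rdata_path)
rm(demo)

# load() restores the saved objects and invisibly returns their names
loaded <- load(rdata_path)
print(loaded)   # names of the objects that were restored
print(demo)     # the object is back in the workspace
```

For the real archive you would replace rdata_path with the quoted path to the S4E2e.RData file, or use load(file.choose()) to pick the file interactively.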
List of data resources
Arranged more or less alphabetically
Ants and fire
These data are adapted from Hoffmann, B.D. 2003. Austral Ecol. 28, p.182 and show the abundance of 91 species of ant in 10 samples. The samples are from two types of soil (red and black) and from 5 fire regimes. The data are arranged with the samples as columns, the column names indicate the soil and regime as follows:
- r = red soil
- b = black soil
- E2 = burnt every 3yr with grazing early (May)
- E3 = burnt, spelled & burnt in 2 successive yr
- L2 = burnt every 3yr with late grazing (Oct)
- L3 = burnt, spelled, burnt in 2 successive yr
- U = unburnt control
The data are used for dissimilarity calculations (including visualisation of dissimilarity with a dendrogram) but you can also use them to explore diversity. The data are in the S4E2e.RData archive and are named ant.
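The dissimilarity and dendrogram steps can be sketched in R as follows. This is only an outline: the small matrix is invented (it is not the ant data), the column names merely imitate the soil and regime labels, and the Euclidean metric is an assumption chosen because it is built into base R.

```r
# Invented abundance data: rows are species, columns are samples
# (column names imitate the soil + regime labels; the counts are made up)
comm <- matrix(c(5, 0, 3, 1,
                 2, 4, 0, 6,
                 0, 7, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("sp1", "sp2", "sp3"),
                               c("rE2", "rU", "bE2", "bU")))

# dist() compares rows, so transpose to put the samples in rows
d <- dist(t(comm), method = "euclidean")
print(d)

# Hierarchical clustering, then a dendrogram of the samples
hc <- hclust(d, method = "complete")
plot(hc, main = "Dissimilarity between samples")
```

Because the samples are stored as columns, the transpose with t() is the step that is easiest to forget.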
Beetle sizes
These data give the sizes (in mm) of a species of water beetle. The main sample can be used for data summary; a second sample is available (in R) to use for comparisons.
- The beetles.xls file in the S4E2e Archive Excel.zip archive contains the main sample as well as examples of histograms.
- The RData file contains the main sample as an object called bd; there is a second sample, Mar (there are also copies, sunny and shady).
The data can be used for data summary, such as mean, median, standard deviation and so on. You can also practice drawing histograms. The two samples can also be compared with the t-test or U-test.
Beach hoppers
These data show the allele frequencies at the mannose-6-phosphate isomerase (Mpi) locus in the amphipod crustacean Megalorchestia californiana, Californian beach hopper. Data from McDonald, J.H. 1985 (Heredity 54: 359–366).
- Beach hopper allele.csv – this file is in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. You can use this for practice at importing data into R but see below…
- The RData file contains the data as an object called cbh.
These data are used to demonstrate logistic regression. Each row of the data gives the latitude and the number of specimens that had each form of the allele. There are two forms, so the data are binary, which is why a logistic regression is the appropriate method of analysis. Logistic regression is a form of generalized linear modelling (GLM).
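A logistic regression on counts of two forms can be sketched like this. The data frame and its column names (latitude, form_a, form_b) are invented for illustration and are not the layout or values of the beach hopper data.

```r
# Invented data: latitude and the number of specimens with each allele form
hop <- data.frame(
  latitude = c(34, 36, 38, 40, 42, 44, 46, 48),
  form_a   = c(8, 10, 14, 15, 22, 25, 30, 33),
  form_b   = c(32, 30, 26, 25, 18, 15, 10, 7)
)

# Two-column response: cbind(successes, failures) with a binomial family
mod <- glm(cbind(form_a, form_b) ~ latitude, data = hop, family = binomial)
summary(mod)

# Predicted proportion of form_a at a new latitude
predict(mod, newdata = data.frame(latitude = 41), type = "response")
```

The two-column response lets each row contribute according to the number of specimens counted, rather than treating every row as a single observation.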
Butterflies and Year
These data show the abundance (as a count) of six butterfly species over five years at a site in Scotland. The data are arranged with the columns giving the year of the sample, each row gives the abundance of a species.
- Butterfly table – the data are in the S4E2e Archive Excel.zip archive as an XLSX file and a CSV. You can read the CSV file into R, in which case you need to add check.names = FALSE to the read.csv() command (this stops R from altering the year column headings).
- The RData file contains the data as two objects: bf is a matrix and butterfly is a data.frame.
You can use the data for graphical summary, showing line plots of abundance and time for example, as well as bar charts and pie charts. You can also look at the diversity of the samples as a bit of practice with diversity indices, such as Shannon and Simpson’s. You could also explore similarity between years.
Although perhaps not the most sensible types of analysis you might also use the data for comparison of differences or changes with time (as a correlation).
Butterflies & Habitat
These data show the abundance of butterflies in three habitats. Each habitat was sampled several times. The datafile has three columns, for the abundance, habitat and an index variable (the replicate).
- xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file.
These data are used as an example of rearranging and managing data using a Pivot Table in Excel. You can also use the data to look at data summary, graphics and differences (there are 3 samples). To do that using R you’ll need to save a copy as a CSV file then import into R.
Butterfly food
These data show the abundance of butterflies and the availability of food plants and nectar resources.
- csv – the data are in the S4E2e Archive Excel.zip archive and will open in Excel.
- The RData file contains the data as an object (data.frame) called bff.
These data can be used to look at (multiple) regression (or correlation), and associated statistics (such as beta coefficients). You can also use the data for some graphical summary, such as scatter plots.
Birds & Habitat
These data show the abundance of some common UK bird species in various habitats.
- xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file. These data are in recording format and you can use these as practice at using a Pivot Table.
- csv – these data are also in the archive and contain the data in the form of a contingency table.
- xlsx – this datafile is in Excel format and shows the data in a contingency table. There is also a completed association analysis in another worksheet.
- The RData file contains the data (contingency table layout) as two objects: birds is a matrix and bird is a data.frame.
The main purpose of these data is to look at tests of association (the Chi squared test). You can also use them for graphical summary, using bar charts and pie charts. The birds.xlsx file can also be used for translating data from recording layout into a contingency table (in Excel using a Pivot table, in R you can use the xtabs() command).
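Here is a rough sketch of going from recording layout to a contingency table with xtabs(), followed by a Chi squared test. The species, habitats and counts are invented, and the column names (species, habitat, qty) are assumptions rather than the headings used in the real file.

```r
# Invented observations in recording layout: one row per species/habitat count
rec <- data.frame(
  species = rep(c("Blackbird", "Robin", "Wren"), times = 2),
  habitat = rep(c("Garden", "Woodland"), each = 3),
  qty     = c(40, 25, 10, 20, 30, 35)
)

# Cross-tabulate the counts into a contingency table
tab <- xtabs(qty ~ species + habitat, data = rec)
print(tab)

# Chi squared test of association between species and habitat
res <- chisq.test(tab)
print(res)
res$expected   # expected frequencies under no association
```

The left side of the xtabs() formula is the column of counts; leave it empty if each row of your data is a single observation.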
Bluebell abundance
These data show the abundance of bluebell in a wood in England. Data are presented showing the abundance of the plant and the light intensity at the growing site.
- Bluebell polynomial.xlsx – the data are in the S4E2e Archive Excel.zip archive and will open in Excel.
- The RData file contains the data as an object called bbel.
These data show an interesting relationship between abundance and light, an inverted U shape. This lends itself to a regression using a polynomial equation. You can also use the data for graphical summary, such as a scatter plot and line of best fit (a trendline).
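A polynomial (quadratic) regression of this kind can be sketched as below. The numbers are invented to give an inverted U shape and the variable names light and abund are assumptions; they are not taken from the bluebell data.

```r
# Invented data with an inverted-U relationship
light <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
abund <- c(2, 6, 9, 11, 12, 11, 8, 6, 1)

# Quadratic model: I() protects the ^ operator inside a formula
poly_mod <- lm(abund ~ light + I(light^2))
summary(poly_mod)

# Scatter plot with the fitted curve drawn over a fine grid of x values
plot(light, abund)
new_x <- seq(min(light), max(light), length.out = 100)
lines(new_x, predict(poly_mod, newdata = data.frame(light = new_x)))
```

A negative coefficient for the squared term is what produces the inverted U.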
Dominance/Abundance scales
This dataset is used in an exercise on using lookup tables. It is useful to be able to convert from an ordinal scale that uses text values to an ordinal scale with “real” numbers.
- xlsx – the data are in the S4E2e Archive Excel.zip archive as an Excel format file.
Use this to help practice using the =LOOKUP function in Excel. This allows you to replace one value with another. In this case an abundance as a text label (D = dominant, A = abundant etc.) can be replaced with a numerical value. This allows you to carry out non-parametric statistics (e.g. the U-test). You are essentially replacing a text-based ordinal scale with a number-based ordinal scale.
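The exercise itself uses Excel, but the same kind of replacement can be sketched in R with a named vector acting as the lookup table. The labels beyond D and A, and the numbers assigned to each label, are assumptions made for illustration.

```r
# Invented abundance records on a text-based ordinal scale
abund_text <- c("D", "A", "F", "O", "R", "A", "F")

# Lookup table as a named vector: name = text label, value = number
# (this particular numbering is an illustrative assumption)
lookup <- c(D = 5, A = 4, F = 3, O = 2, R = 1)

# Index the lookup table by the labels to translate each one
scores <- unname(lookup[abund_text])
print(scores)
```

Any label that is missing from the lookup table comes back as NA, which is a handy check for typing errors in the original records.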
Diversity Calculation
These two files show you how to calculate two indices of diversity; both are in the S4E2e Archive Excel.zip archive.
- Diversity Simpson D.xls – as the name suggests, this calculates the Simpson’s D index of diversity.
- Diversity Shannon.xls – this spreadsheet computes the Shannon index (also called Shannon-Wiener or Shannon-Weaver).
These files can be used as the basis for a spreadsheet calculator (you can add extra rows as you need), which you can use to help compute the two commonly used indices of diversity.
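For comparison, here is a minimal sketch of both indices in R. The counts are invented, and the forms used (Shannon with natural logarithms, Simpson's D as one minus the sum of squared proportions) are assumptions; the spreadsheets may use a different variant.

```r
# Invented abundance counts for one sample (four equally common species)
counts <- c(10, 10, 10, 10)

# Proportional abundance of each species
# (drop any zero counts first, because log(0) is not finite)
p <- counts / sum(counts)

shannon <- -sum(p * log(p))   # Shannon index H, natural logarithm
simpson <- 1 - sum(p^2)       # Simpson's D as 1 - sum(p^2)

print(c(shannon = shannon, simpson = simpson))
```

With four equally abundant species the Shannon index equals log(4) and Simpson's D equals 0.75, which makes a quick check that the calculation is set up correctly.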
Flour beetles
These data show the abundance of flour beetles in samples taken from two different (fictitious) farms. The Excel version shows the data in sample layout, with one column for each sample. There are several R versions, with the data in different layouts:
- flour beetles.xls – the data are in the S4E2e Archive Excel.zip archive. One column shows the counts of beetles from Woad Farm, the other column shows the counts from Glebe Farm.
- The RData file contains the data as four objects:
  - flour1 – a data.frame with a column qty and a column site, i.e. in recording layout
  - flour2 – a data.frame with two columns, Woad.Fm and Glebe.Fm, each containing the counts from a separate farm
  - Woad.Fm – a vector of values representing the counts of beetles at Woad farm
  - Glebe.Fm – a vector of values representing counts of beetles at Glebe farm
These data can be used for exploring differences between samples. You can also use them for graphical summary, e.g. bar charts, box-whisker plots, and for data summary (e.g. mean, median, standard error). The R-format data are in several forms so that you can practice carrying out commands on the different type of object.
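The difference between sample layout and recording layout can be sketched as follows. The counts are invented; only the column names (Woad.Fm, Glebe.Fm, qty, site) follow the description above.

```r
# Invented counts in sample layout: one column per farm
samp <- data.frame(Woad.Fm  = c(12, 15, 9, 11),
                   Glebe.Fm = c(20, 18, 25, 22))

# stack() converts to recording layout: a values column and a grouping column
rec <- stack(samp)
names(rec) <- c("qty", "site")
print(rec)

# The same U-test can be run from either layout
wilcox.test(samp$Woad.Fm, samp$Glebe.Fm)   # two separate vectors
wilcox.test(qty ~ site, data = rec)        # formula: response ~ grouping
```

Both calls give the same result; the formula version is usually more convenient once you have more than a couple of samples.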
Freshwater invertebrates (correlation)
These data show the abundance of a freshwater invertebrate and the water speed at the point of collection. See also the Mayfly (correlation) data, which are very similar.
- freshwater correlation.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, Abund and Speed, for Abundance and water Speed respectively.
- The RData file contains the data as a data.frame called fw.
These data can be used for exploration of correlation as well as some graphical summary (e.g. scatter plot).
Freshwater invertebrates (diversity)
These data give the abundance of some freshwater invertebrates from Goredale Beck in Yorkshire. There is also some taxonomic information for each invertebrate recorded.
- Freshwater invertebrates.xlsx – the data are in the S4E2e Archive Excel.zip archive. There is a column for the count of each taxon. Other columns give the taxonomic information (e.g. phylum, order).
You can use these data for looking at diversity. You can also practice transferring the data from Excel into R.
Growth (plant growth)
These data show the growth of a plant species in response to different levels of a nutrient.
- Growth Logarithmic.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, Growth and Nutrient.
- The RData file contains the data as a data.frame called pg.
These data show an interesting relationship. If you plot one variable against the other you’ll see that the points “curve”. In fact the relationship is a logarithmic one. You can use these data to look at curvilinear regression, in this case logarithmic regression. This is ordinary linear regression but with a logarithmic equation.
You can also use the data to look at graphical summary, e.g. a scatter plot with line of best fit (trendline).
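A logarithmic regression can be sketched as an ordinary linear model fitted to the log of the predictor. The values below are invented to show a curve that rises quickly and then levels off; they are not the plant growth data.

```r
# Invented data where growth rises quickly and then levels off
nutrient <- c(1, 2, 4, 8, 16, 32)
growth   <- c(2.1, 3.0, 3.9, 5.1, 6.0, 6.9)

# Linear model on the logged predictor: growth = a + b * log(nutrient)
log_mod <- lm(growth ~ log(nutrient))
coef(log_mod)

# Scatter plot with the fitted logarithmic curve added
plot(nutrient, growth)
curve(coef(log_mod)[1] + coef(log_mod)[2] * log(x), add = TRUE)
```

The model is still linear in its coefficients, which is why lm() can fit it.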
Heather species
This dataset shows the abundance of two species of heather in Cornwall. The data are in the form of a contingency table. In total 137 quadrats were used and the presence of each species noted. The contingency table shows the frequency of occurrence; thus you have four options:
- Both species present together in a quadrat
- Calluna vulgaris present only
- Erica cinerea present only
- Neither species present (i.e. both absent)
The data can be used to explore the association between the two species.
- csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet.
- The RData file contains the data as a data.frame called heather.
You can use the data for tests of association (e.g. the Chi squared test), and since this is a 2×2 contingency you can apply Yates’ correction. You can also use the data for graphical summary (e.g. bar chart, pie chart). You can also use the CSV file to practice transferring data from spreadsheet to R.
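As a sketch of the 2×2 case, the code below runs the Chi squared test with and without Yates’ correction. The table is invented (it is not the heather data and does not total 137 quadrats).

```r
# Invented 2 x 2 table of quadrat counts (not the real heather data)
tab <- matrix(c(30, 20,
                15, 35),
              nrow = 2, byrow = TRUE,
              dimnames = list(Calluna = c("present", "absent"),
                              Erica   = c("present", "absent")))

# correct = TRUE applies Yates' continuity correction (the default for 2 x 2)
with_yates    <- chisq.test(tab, correct = TRUE)
without_yates <- chisq.test(tab, correct = FALSE)

# The corrected statistic is the smaller (more conservative) of the two
c(with = with_yates$statistic, without = without_yates$statistic)
```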
Hoglouse abundance
These data show the abundance of hoglouse (Asellus spp), a freshwater invertebrate, at three sampling locations.
- xlsx – the data are in the S4E2e Archive Excel.zip archive. These data are in sample format; there is a column of abundance data for each of the three sampling locations. The spreadsheet also contains a second worksheet giving the summary statistics and a bar chart with error bars.
- The RData file contains the data as two data.frame objects:
  - hog2 – gives the data in recording layout, there is a column for count and a column for site.
  - hog3 – gives the data in sample layout, there is a column for each sample.
You can use the data for exploring differences between samples. In the book text you use these data for a Kruskal-Wallis test, which is a non-parametric test of differences between more than two samples. You can also use the data for practice at data summary and graphics (these data are used to draw bar charts in the book text). You could also subset the data and look at comparing just two samples.
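A Kruskal-Wallis test on data in recording layout, followed by a two-sample subset, might look like this. The counts and the site names (Upper, Middle, Lower) are invented for illustration.

```r
# Invented counts in recording layout: a count column and a site column
hog <- data.frame(
  count = c(3, 5, 4, 6, 10, 12, 9, 11, 20, 18, 22, 19),
  site  = factor(rep(c("Upper", "Middle", "Lower"), each = 4))
)

# Kruskal-Wallis rank sum test: response ~ grouping factor
kw <- kruskal.test(count ~ site, data = hog)
print(kw)

# Subset to two sites for a two-sample comparison (U-test);
# droplevels() removes the unused factor level after subsetting
two <- droplevels(subset(hog, site != "Middle"))
wilcox.test(count ~ site, data = two)
```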
Hornbill diet
These data show the presence of different fruits in the diet of three species of hornbill from India (data adapted from Datta, A. & Rawat, G.S. 2003. Biotropica 35, p.208). The data are in the form of presence-absence, so if a fruit species was found in the diet a 1 is recorded, if the fruit was absent it is shown as 0.
- csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. There is a column for the species names of the fruit (as an abbreviated scientific name). Each of the next columns shows the presence-absence of these fruits in the diet of three species of hornbill:
  - GH = great hornbill
  - WH = wreathed hornbill
  - OPH = oriental pied hornbill
- The RData file contains the data as a data.frame, hornbill. The rownames have been set as the fruit species and the main data are the three columns of fruit presence-absence for the hornbill species.
You can use these data to look at similarity (dissimilarity) and to draw a simple dendrogram to show the relationship between the samples in terms of the presence of fruit species. You can also use the data for diversity (species richness).
Invertebrates and Habitat
These data show the frequency of observation of some terrestrial invertebrate taxa on different parts of plants. There are two datasets, which are similar. Both give the frequency of observation in the form of a contingency table.
- invert habitat.csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives five invertebrate taxa and there are three sites (Upper, Lower, Stem).
- csv – the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives the names of four sites (Upper leaf, Lower leaf, Stem, Bud) and the subsequent columns show frequencies for three invertebrate taxa.
- The RData file contains the data from “invert habitat.csv” as a data.frame called inv.hab.
You can use these data for tests of association (e.g. Chi squared) and for data summary (e.g. bar charts, pie charts). Note that the CSV dataset invert.csv is not duplicated in R, which means you can practice importing CSV data into R.
Leaf sizes
These data show the sizes of tree leaves in millimetres. The main (Excel) dataset gives 10 samples, each of 10 measurements.
- xlsx – the data are in the S4E2e Archive Excel.zip archive. There are 100 measurements, separated into 10 samples of 10 readings.
- The RData file contains a vector object called lf, which gives just one of the samples from the XLSX file (the 2nd).
You can use these data to look at data summary and especially the calculation of running means and standard error. You can use them for graphical summary too, such as line plots and the running mean. If you want to use the entire dataset in R you’ll need to transfer the data from the Excel file, which will give you some practice.
You could also explore differences between samples (pairs or several/all at once).
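The running mean and standard error mentioned above can be sketched in a few lines. The measurements are invented and the object name leaf is an assumption (it is not the lf vector from the RData file).

```r
# Invented leaf measurements (mm)
leaf <- c(52, 48, 55, 50, 47, 53, 49, 51, 54, 46)

n <- seq_along(leaf)

# Running mean: the mean of the first 1, 2, ..., n measurements
run_mean <- cumsum(leaf) / n

# Standard error of the full sample: standard deviation / sqrt(n)
se <- sd(leaf) / sqrt(length(leaf))

# Line plot showing how the running mean settles down as n increases
plot(n, run_mean, type = "b",
     xlab = "Number of measurements", ylab = "Running mean (mm)")
```

The final value of the running mean is simply the mean of the whole sample.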
Mayfly (correlation)
These data show the abundance of a mayfly species and the speed of the water at the sampling location.
- csv – the data are in the S4E2e Archive Excel.zip archive. There is a column for Speed and one for the corresponding Abund.
- The RData file contains a data.frame object called mayfly, which contains two columns.
Use these data for looking at correlation between the two variables. You can also look at graphical summary (scatter plot), as well as the data summary (averages, distribution).
Mayfly (regression)
These data give the sizes of a freshwater invertebrate and several environmental variables at the sampling location for each size measurement.
The data contain the following variables:
- Length = the length of the invertebrate in mm
- Speed = the water speed (time taken for a hydroprop to complete)
- Algae = the percentage cover of algae on the substrate
- NO3 = the concentration of nitrates
- BOD = the biological oxygen demand
The dataset is in two forms:
- mayfly regression.csv – the data are in the S4E2e Archive Excel.zip archive. The file contains five data columns plus an extra “index” column giving a simple number.
- The RData file contains a data.frame called mf, which has five columns corresponding to the items listed earlier.
You can use these data for regression analysis (multiple regression), and for graphical summary (e.g. scatter plots, perhaps with a trendline). You can also look at the data summary and could conduct simple correlation between pairs of variables. The dataset provides a simple introduction to regression model building.
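One simple way to build a regression model is to add terms one at a time and compare the nested models. The sketch below does this on simulated data; the variable names follow the list above, but the values and the relationship between them are entirely invented and say nothing about the real mayfly data.

```r
set.seed(42)  # make the invented data reproducible
n <- 25
dat <- data.frame(
  Speed = runif(n, 5, 30),
  Algae = runif(n, 10, 90),
  BOD   = runif(n, 20, 200)
)
# Simulated response: depends on Speed and Algae but not on BOD
dat$Length <- 25 - 0.3 * dat$Speed + 0.05 * dat$Algae + rnorm(n, sd = 1)

# Start with a single predictor, then add terms
m1 <- lm(Length ~ Speed, data = dat)
m2 <- lm(Length ~ Speed + Algae, data = dat)
m3 <- lm(Length ~ Speed + Algae + BOD, data = dat)

# Compare the nested models; a small p-value suggests the added term helps
anova(m1, m2, m3)
summary(m2)
```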
Mosses and trees
These data show the abundance of some bryophyte species on trees in North Carolina (data adapted from Palmer M.W. 1986. The Bryologist 89, p.59). The data are in a community layout, with the columns giving the sample names (the trees) and the rows being the bryophyte species.
- Moss data.xls – the data are in the S4E2e Archive Excel.zip archive. The main worksheet gives the species abundance information and the second worksheet shows a completed dissimilarity matrix (Euclidean metric).
- The RData file contains a data.frame called mosess, which has the data with rows as sites and columns as species (which is transposed from the layout in the Excel file).
You can use these data to calculate indices of similarity (dissimilarity) and indeed the Excel file contains a second worksheet with a completed Euclidean dissimilarity matrix. You can also use these data for visualizing dissimilarity (e.g. with a dendrogram).
You could also use the data to look at diversity indices and for graphical summary (e.g. bar charts, pie charts).
Newt presence-absence
These data give the presence-absence of great crested newts at ponds in Buckinghamshire, UK. The presence of a newt is recorded with a 1 and the absence by a 0; hence the data are binary. The other columns give various habitat factors such as the area of the pond, an index based on presence of fish, and other factors.
- Newt HSI.csv – the data are in the S4E2e Archive Excel.zip archive. There are several columns in the file:
  - presence = the presence or absence of newts (1 or 0).
  - area = the area of the pond in square metres.
  - dry = an index of how often the pond dries (1 = never, 2 = rarely, 3 = occasional, 4 = annually).
  - water = an index of water quality (1 = bad, 2 = poor, 3 = moderate, 4 = good).
  - shade = a value for the % shade (from trees and so on).
  - bird = an index for the presence of waterfowl (1 = absent, 2 = minor, 3 = major).
  - fish = an index for the presence of fish (1 = major, 2 = minor, 3 = possible, 4 = absent).
  - other.ponds = the number of other ponds within 1 km.
  - land = an index of land use quality (for newts: 1 = bad, 2 = poor, 3 = moderate, 4 = good).
  - macro = the % cover of macrophytes.
  - HSI = the overall Habitat Suitability Index (a measure of “how suitable” a pond might be as a habitat for newts).
- The RData file contains a data.frame called gcn, which contains the data (there are 200 rows).
You can use the data for logistic regression, which is a form of generalised linear modelling (GLM). You can carry out logistic regression on single factors or build a regression model with several terms.
Pearson correlation data
These data show the abundance of a freshwater invertebrate with corresponding flow rate.
- xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns: abund and flow.
- The RData file contains a data.frame called pearson, which contains the data.
Use these data to carry out correlation between the two variables. You can also use the data to look at data summary (including distribution) and for graphical summary (e.g. scatter plot).
Pea genetics
These data show the frequency of pea plants exhibiting combinations of coat colour and type.
- csv – the data are in the S4E2e Archive Excel.zip archive. These data are arranged with columns like so:
  - Colour = the colour of the pea (green or yellow)
  - Coat = the type of coat (wrinkled or smooth)
  - Obs = the number of peas for each combination of colour and coat
  - Ratio = the expected ratio of observations based on genetic theory
- The RData file contains a simple vector called peas. This vector gives the frequency of observation for the various combinations of coat and colour.
You can use these data for tests of goodness of fit (a kind of association test), using Chi squared. You can also summarise the data graphically (e.g. bar charts).
Note that the R-format data only contains the observed frequencies. If you want to conduct a goodness of fit test in R you will have to incorporate the “expected” ratio data in some manner.
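One way to incorporate an expected ratio is to convert it to proportions and pass it to chisq.test() through the p argument. In this sketch both the observed frequencies and the 9:3:3:1 ratio are assumptions used for illustration; check the Ratio column of the CSV file for the values that go with the real data.

```r
# Invented observed frequencies for four colour/coat combinations
obs <- c(90, 30, 28, 12)

# Expected ratio; 9:3:3:1 is used here purely as an illustration
ratio <- c(9, 3, 3, 1)

# p must sum to 1, so convert the ratio to proportions
gof <- chisq.test(obs, p = ratio / sum(ratio))
print(gof)
gof$expected   # expected frequencies implied by the ratio
```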
Plant species lists
These data give vascular plant species names for samples from 10 sites from a survey in Shropshire, UK.
- Plant species lists.csv – the data are in the S4E2e Archive Excel.zip archive. The first column gives the site name as a simple abbreviation (there are 10). The second column gives the scientific name of the species. There are 187 observations in total.
- The RData file contains a data.frame called plrich. This gives the Site and Species names in two columns.
The data provide some practice at cross tabulation, using the Pivot Table in Excel or table() in R. You can re-arrange the data to give a presence/absence table, where 1 = presence of a species at a site and 0 is absence (in fact you will need to do that in order to determine species richness). The S4E2e.RData file contains an object called ps, which has already been tabulated.
Once the data are tabulated you can use them to explore species richness (a measure of diversity). You can also look at similarity (which you can also plot using a dendrogram).
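The cross tabulation and the species richness calculation can be sketched like this. The records are invented; only the column names Site and Species follow the description above.

```r
# Invented site/species records in recording layout
rec <- data.frame(
  Site    = c("A", "A", "A", "B", "B", "C"),
  Species = c("Bellis perennis", "Poa annua", "Urtica dioica",
              "Poa annua", "Urtica dioica", "Poa annua")
)

# Cross-tabulate: rows are sites, columns are species, cells are 1 or 0
pa <- table(rec$Site, rec$Species)
print(pa)

# Species richness is the number of species present at each site
richness <- rowSums(pa > 0)
print(richness)
```

Using pa > 0 rather than pa itself means the richness is still correct if a species happens to be listed twice for the same site.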
Plant species & watering
These data show the growth (in cm) of two plant species in response to three different watering regimes (low, hi and mid).
- Two way online.xlsx – the data are in the S4E2e Archive Excel.zip archive. The data are in a particular layout that allows Excel to calculate ANOVA. The data are arranged in an “on the ground” layout with samples in separate blocks. There is a separate worksheet containing a completed 2-way ANOVA (see the Exercises support page).
- The RData file contains a data.frame called pw. This gives the data in recording layout; the response variable is height, and the two predictor variables are plant and water.
Use these data for two-way analysis of variance (2-way ANOVA). The analysis is straightforward in R but less so in Excel (see the Exercises support page for an online exercise on computing 2-way ANOVA using Excel). You can also use the data for graphical representation of results (e.g. boxplot or bar chart) as well as general data summary.
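A two-way ANOVA in R can be sketched as follows. The data frame pw_demo is invented (it is not the pw object), although it uses the same column names; the species labels sp1 and sp2 are placeholders.

```r
# Invented data in recording layout:
# 2 plants x 3 watering regimes x 3 replicates
pw_demo <- data.frame(
  height = c(9, 11, 6,  14, 17, 15,  32, 28, 35,
             7, 6, 5,   9, 11, 10,   14, 12, 13),
  plant  = factor(rep(c("sp1", "sp2"), each = 9)),
  water  = factor(rep(rep(c("low", "mid", "hi"), each = 3), times = 2),
                  levels = c("low", "mid", "hi"))
)

# plant * water expands to both main effects plus their interaction
mod <- aov(height ~ plant * water, data = pw_demo)
summary(mod)

# Box plot of each plant/water combination
boxplot(height ~ plant + water, data = pw_demo)
```

Setting the factor levels explicitly keeps the watering regimes in a sensible order in the output and on the plot.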
Plant species abundance
These data give the abundance of some terrestrial vascular plants at 10 sites in Shropshire, UK. The data are the same as for the Plant species lists dataset but include the abundance information.
- Plant species abundance.csv – the data are in the S4E2e Archive Excel.zip archive. These data are in community layout with the first column giving the species name (scientific binomial) and subsequent columns for each of the 10 sample sites. The data are based on average domin scores (an abundance scale similar to Braun Blanquet) from five quadrats.
- The RData file contains a data.frame called psa. This gives the data with rows as species and columns as sites.
You can use these data to explore diversity using Simpson’s or the Shannon index. You can also use these data to look at similarity, which you can also plot using a dendrogram.
Ridge & Furrow meadow
These data show the abundance of meadow buttercup plants in one metre square quadrats in an ancient ridge & furrow meadow in Buckinghamshire, UK.
- ridge furrow.xlsx – the data are in the S4E2e Archive Excel.zip archive. The file gives the data in sample layout, with a column for ridge and one for furrow. The spreadsheet also contains a completed t-test in a separate worksheet.
- The RData file contains the data in four separate objects:
  - rf1 – gives the data in recording layout, with a column for count and one for area (Ridge or Furrow).
  - rf2 – gives the data in sample layout, with a column for Ridge and one for Furrow.
  - furrow – a separate vector object for the Furrow data sample.
  - ridge – a separate vector object for the Ridge data sample.
You can use these data to explore differences between two samples (t-test or U-test). You can also display the results graphically (e.g. bar chart or box-whisker plot). You can use the data for summary statistics (mean, median, IQR etc.), and to look at data distribution.
The data are in different forms in R-format so that you can explore how to use different syntax according to the layout of data you have.
Seashore seaweed
These data show the abundance of some seaweed species on a rocky shore in South Devon, UK. The data are presented using a text-based abundance scale (ACFOR) and give the abundance of five species at 13 transect stations across the shore (the stations are different heights above mean tide height).
- xlsx – the data are in the S4E2e Archive Excel.zip archive. The file contains two worksheets, with one giving an extra table of data where the ACFOR scale is converted to a numerical ordinal scale.
The data are intended to be used as an example of how to use the =VLOOKUP function in Excel. You use this to replace one item with another. In this case the text-based ordinal scale is replaced by a numerical scale.
Sward height
These data give the height of vegetation (cm) in the sward at three sampling locations in a meadow in Shropshire, UK.
- Sward height.xlsx – the data are in the S4E2e Archive Excel.zip archive. There are three columns, one for each of the samples: Upper, Middle, Lower.
- The RData file contains the data in two data.frame objects:
  - sward2 – the data are in recording layout with a column for Height (the response variable) and a column for Site (the predictor variable).
  - sward3 – the data are in sample layout with a column for each of the three sites.
You can use these data for exploring differences between more than two samples, e.g. the Kruskal-Wallis test (non-parametric) or analysis of variance (ANOVA, parametric). You can also use the data to look at data summaries (e.g. median, mean) and graphically (e.g. boxplot of results or histogram of sample distribution).
Tree sizes and month
These data show the size of Sitka spruce trees measured at monthly intervals. The size is determined as the height multiplied by the diameter squared, on a log scale (i.e. log(h × d²)). The data are modified from an original (larger) dataset from within R.
- The RData file contains the data in a data.frame object called tree. There are two columns, month and size.
Use these data to draw a line plot of spruce growth.
Whitefly
These data show the counts of whitefly (a greenhouse pest) attracted to different coloured sticky traps. Each trap is bi-coloured so the data are in the form of matched pairs.
- xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, one for the count of whitefly on the White side and the corresponding count for the Yellow side.
- The RData file contains the data in a data.frame object called whitefly. There are two columns, white and yellow.
You can use these data to carry out a matched pairs test (either a paired t-test or the Wilcoxon matched pairs test, which is the matched-pairs counterpart of the U-test). You can also summarise the results graphically.
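Both matched pairs tests can be sketched in R by setting paired = TRUE. The counts below are invented; the vector names white and yellow simply mirror the column names described above.

```r
# Invented paired counts: each position is one bi-coloured trap
white  <- c(4, 3, 6, 2, 5, 7, 3, 4)
yellow <- c(9, 7, 8, 6, 11, 9, 8, 5)

# Paired t-test on the within-trap differences
t_res <- t.test(white, yellow, paired = TRUE)
print(t_res)

# Wilcoxon matched pairs (signed rank) test; exact = FALSE uses the normal
# approximation because these invented differences contain ties
w_res <- wilcox.test(white, yellow, paired = TRUE, exact = FALSE)
print(w_res)
```

The order of the values matters here: the first white count must belong to the same trap as the first yellow count, and so on.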
Wilcoxon online exercise
These data are in the form of matched pairs (they are fictitious data).
- xlsx – the data are in the S4E2e Archive Excel.zip archive. There are two columns, A and B.
You can use the data for Wilcoxon matched pairs analysis, as well as graphical summary. See the Exercises support page where there is an online exercise in using Excel to carry out a matched pairs test.
Original Edition – Statistics for Ecologists
If you have the original edition of the book then all the data examples from that are incorporated into the new edition. This means you can use the data file from the new edition and have access to all the original data.
The S4E2e Archive Excel.zip file contains the same data as the original edition but in CSV form rather than TXT files.
If you want the original data then click this archive: S4E-Edition-1-Data.