Statistics for Ecologists – Outline and Table of Contents

Welcome to the support pages for our book, Statistics for Ecologists Using R and Excel.

These pages provide information and support material for the book. You will find an outline and table of contents as well as support datafiles and additional material.

What is the subject of this book?

This is a book about the scientific process and how you apply it to data in ecology. You will learn how to plan for data collection, how to assemble data, how to analyze data and finally how to present the results. The book uses Microsoft Excel and the powerful Open Source R program to carry out data handling as well as producing graphs. Statistics help you make sense of data, which is generated in all branches of science. Ecology is a wide-ranging and important science, which helps our understanding of the natural world.

Who this book is for

This book is aimed at students of ecology and environmental science, although many other scientists will find the text useful because the principles of data analysis are the same in many disciplines. No prior knowledge is assumed and the reader can develop their skills up to degree level.

What you will learn in this book

Specific topics include:

  • How to plan ecological projects.
  • How to record and assemble your data.
  • How to use Excel for data analysis and graphs.
  • How to use R for data analysis and graphs.
  • How to carry out a wide range of statistical analyses including analysis of variance and regression.
  • How to create professional-looking graphs.
  • How to present your results.

What's new in edition two

The changes from the first edition can be summarized like so:

  • Completely revised chapter on graphics. The chapter is now a one-stop resource for all graphics related topics.
  • New: graph types and their uses.
  • New: Excel Chart Tools.
  • New: R graphics commands.
  • New: producing different chart types in Excel and in R.
  • More support material online, including example data, exercises, and additional notes and explanations.
  • New: chapter on basic community statistics, biodiversity and similarity.
  • New: chapter summaries.
  • New: end of chapter exercises.

How this book is arranged

The book is broadly laid out in four sections, roughly corresponding to the topics:

  • Planning.
  • Recording.
  • Analysing.
  • Reporting.

The sections are rather unequal in length, with the focus on the analysis chapters and production of graphics. Throughout the book you will see example exercises that are intended for you to try out. In fact, they are expressly aimed at helping you on a practical level; reading how to do something is fine but you need to do it for yourself to learn it properly. The Have a Go exercises are hard to miss.

Table of Contents and Outline

Chapter 1. Planning

This chapter is about the preparation stages required before starting to collect data or carry out any analyses. The chapter includes notes on planning for data collection and getting appropriate software (that is R and Excel).

1.1 The scientific method

This section outlines the scientific method and provides a framework for all projects and data analysis.

1.2 Types of experiment/project

This section deals with the types of project that could be encountered and provides a framework that allows the reader to characterise a project, which leads to the most appropriate method of analysis.

1.3 Getting data - using a spreadsheet

This brief section highlights the importance of the spreadsheet, and especially its usefulness for pilot studies and as a tool for getting an overview of your data.

1.4 Hypothesis testing

This section introduces the idea of the hypothesis, a subject that will be returned to in chapter 5.

1.5 Data types

This section introduces the different types of data that can be encountered (Interval, Ordinal and Categorical) and gives some examples of ordinal scales (Domin and Braun-Blanquet) that can be used in data collection.

1.6 Sampling effort

This section introduces methods of data collection and includes notes on the amount of data to collect as well as, for example, quadrat sizes. The ideas of random and systematic sampling are introduced. The main purpose of this section is to highlight the importance of your samples being representative.

1.7 Tools of the trade

This brief section highlights the importance of the software tools that will be used.

1.8 The R program

The R program is an important and powerful tool for data analysis. This section shows the reader how to obtain the program and install it on their computer.

1.9 Excel

A spreadsheet is a useful tool as it allows your data to be held in a formal manner that can be shared. The spreadsheet also allows us to carry out various analyses and produce graphs. In this section the main focus is on installation of the Analysis ToolPak in Excel. This tool allows a number of analyses to be carried out more efficiently than using the regular spreadsheet formulae.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 2. Data recording

This chapter is brief yet important! The arrangement of data is a fundamentally important aspect of data analysis. Get this part right and your subsequent analyses are greatly facilitated. Get this part wrong and you will have to spend a lot of time rearranging data before analysis can be done. The main thrust of this chapter is to introduce the idea of Biological Records and the Biological Recording format. This standard format is very flexible and allows your data to be used for multiple purposes very easily.

2.1 Collecting data - who, what, where, when

This section deals with the basics of Biological Records and what elements should be recorded.

2.2 How to arrange data

The arrangement of data is of fundamental importance as a poor layout will make it hard to extract the information you require. This section shows how to arrange data in the Biological Recording format, which permits the data to be utilized more easily.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 3. Beginning data exploration – using software tools

This chapter is aimed at getting the reader more familiar with the software that they will use for data analysis, specifically Excel and R.

3.1 Beginning to use R

This section introduces the R program and helps readers get started using this powerful program. The section includes notes on various topics including:

  • Getting Help
  • Basic Maths
  • Inputting Data
  • Summary Statistics
  • Saving Work

By the end of this section the reader should be competent and confident with using R and be prepared for more detailed data analysis using the R interface.

3.2 Manipulating data in a spreadsheet

This section introduces some important aspects of Excel. Topics include:

  • Sorting
  • Data Filtering
  • Paste Special
  • File Formats
  • Lookup Tables
  • Pivot Tables

These skills are really important and when combined with Biological Recording format allow data to be utilized easily and flexibly.

3.3 Getting data from Excel into R

This brief section shows how to transfer data from Excel into R. The spreadsheet is really useful as a data storage program and for initial overviews. Although many statistical and graphical analyses can be carried out in Excel the R program is a dedicated data analysis tool; the more complicated the data the more likely it is that you will be using R rather than Excel.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 4. Exploring data – looking at numbers

This chapter begins the actual process of data analysis. The exploratory methods introduced here are the basics that should be carried out on all data. Methods covered include:

  • Averages
  • Dispersion
  • Confidence Intervals

4.1 Summarising data

This section deals with the idea of the average as a summary of a numerical sample:

  • Mean
  • Median

After some basic introduction the section deals with how to determine averages in Excel and R.

4.2 Distribution

This section deals with the distribution of the data (that is normal or skewed). Specifically the reader is shown how to create Tally plots and Histograms to visualise the data distribution. The reader is shown how to create a histogram using Excel and R. There is also a brief section on producing density plots (using R), which can be used with a histogram to compare two different distributions.

4.3 A numerical value for the distribution

This section looks at measures of dispersion, specifically:

  • Range
  • Quartiles
  • Standard Deviation

There are notes on how to determine these measures using Excel and using R. The idea of the box plot (box-whisker plot) is also introduced here as a useful visual aid. This section ends with some notes on why n-1 is used in calculations of standard deviation.
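As a rough illustration of the measures covered in this section, here is a minimal sketch in Python (the book itself works in Excel and R). The sample values are invented, and the quartiles use the default method of Python's `statistics.quantiles`, which may not match the method used by Excel or R. The last two calculations show the effect of dividing by n-1 rather than n.

```python
import statistics

# Hypothetical sample (invented for illustration, not taken from the book)
sample = [2, 4, 4, 5, 7, 9, 11]

# Range: the spread between the smallest and largest values
data_range = max(sample) - min(sample)

# Quartiles: three cut points that divide the sorted data into four parts
q1, q2, q3 = statistics.quantiles(sample, n=4)

# Sample standard deviation: the sum of squares is divided by n - 1
sd_sample = statistics.stdev(sample)

# Population standard deviation: divided by n, so smaller for any varying sample
sd_population = statistics.pstdev(sample)

print(data_range, q1, q2, q3, round(sd_sample, 3), round(sd_population, 3))
```

The middle quartile is the median, so it can be checked against `statistics.median(sample)`.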

4.4 Statistical tests for normal distribution

This brief section illustrates one method of testing the assumption that a sample is normally distributed. The Shapiro-Wilk test is shown (using the R program).

4.5 Distribution type

This section begins with a note on which summary statistics should be used with normal or skewed distributions. The rest of the section includes notes on some other statistics, namely:

  • Standard Error
  • Confidence Intervals

These statistics are related to the normal distribution and link in with the idea of hypothesis testing.
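To make the two statistics concrete, the following is a minimal Python sketch (the book uses Excel and R). The function names are hypothetical, and the critical value of t is assumed to be looked up by you in a table for n-1 degrees of freedom; the code does not calculate it.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def confidence_interval(sample, t_critical):
    """Return (lower, upper) limits around the mean.

    t_critical is an assumption supplied by the caller, taken from a
    table of critical values for len(sample) - 1 degrees of freedom.
    """
    mean = statistics.mean(sample)
    half_width = t_critical * standard_error(sample)
    return mean - half_width, mean + half_width


# Hypothetical sample of five values; 2.776 is the two-tailed 5% critical
# value of t for 4 degrees of freedom.
sample = [1, 2, 3, 4, 5]
lower, upper = confidence_interval(sample, t_critical=2.776)
print(round(standard_error(sample), 4), round(lower, 3), round(upper, 3))
```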

4.6 Transforming data

Since the normal distribution is so important it is helpful to coerce data into normal form if they are skewed. This section introduces the idea of data transformation and illustrates several common methods including:

  • Logarithmic
  • Square Root
  • Arcsine (also called Angular)
  • Reciprocal

There are some notes on how to do these transformations in Excel and using R.
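The four transformations can be sketched in a few lines of Python (the book demonstrates them in Excel and R). The function names are hypothetical. The choice of base-10 logarithms and of radians for the arcsine result are assumptions made here for illustration.

```python
import math


def log_transform(x):
    """Base-10 logarithm. Only defined for x > 0; data containing zeros
    usually need a constant added first."""
    return math.log10(x)


def sqrt_transform(x):
    """Square root, for x >= 0."""
    return math.sqrt(x)


def arcsine_transform(p):
    """Arcsine (angular) transformation of a proportion p between 0 and 1.
    The result is in radians."""
    return math.asin(math.sqrt(p))


def reciprocal_transform(x):
    """Reciprocal, for non-zero x."""
    return 1 / x


# Hypothetical values, invented for illustration
print(log_transform(100), sqrt_transform(16),
      round(arcsine_transform(0.5), 4), reciprocal_transform(4))
```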

4.7 When to stop collecting data? The running average

This section introduces the idea of the running mean as a way to determine when the sample size is adequate.
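A running mean is simply the mean recalculated each time a new observation is added. A minimal Python sketch follows (the book uses Excel and R; the function name and the data are hypothetical). When the values settle down and stop changing much, that suggests further sampling adds little.

```python
from itertools import accumulate


def running_mean(values):
    """Return the mean of the first 1, 2, 3, ... observations."""
    totals = accumulate(values)  # cumulative sums
    return [total / n for n, total in enumerate(totals, start=1)]


# Hypothetical counts from successive quadrats (invented for illustration)
counts = [2, 4, 6, 4, 4]
print(running_mean(counts))
```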

Statistical symbols

This section illustrates a few of the more commonly encountered statistical symbols, summarized in a handy table.

Exercises

Some self-assessment questions (answers in the appendix).

Chapter 5. Exploring data – which test is right?

This brief chapter shows the reader how to select the most appropriate analytical test for their data. There is also a brief reminder of the idea of the hypothesis.

5.1 Types of project

It can be helpful to think about the type of project you are undertaking, as this can help guide you towards the most appropriate method(s) of analysis. In this section you’ll see the different sorts of potential project and the kinds of analysis that might be suitable.

5.2 Hypothesis testing

This section revisits the idea of the hypothesis as an analytical tool.

5.3 Choosing the correct test

This section guides the reader towards the most appropriate test. The section includes a decision tree that points to the correct section of the book.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 6. Exploring data – using graphs

This chapter has been revised extensively from the original version. It is now a complete overview of graphical presentation and summary of data. You’ll see how to produce the best sort of graph for the job at hand, using examples in both Excel and R. Other chapters incorporate graphics as required but this chapter forms the general foundation, which you can use as a basic reference.

6.1 Introduction to data visualization

This section provides an overview of graph types, showing which sort is best for which task. There are also summaries about how to create graphs in Excel and in R, with the basics to get you started. There are notes about the Chart Tools in Excel and the basic R commands that help you produce and edit graphs.

6.2 Exploratory graphs

This section shows the kind of graphical summary most useful in visualizing data, including:

  • Stem Leaf plot
  • Histogram
  • Density plot
  • Box-Whisker plot

These graphs would generally be used to determine the distribution of the data sample(s).

6.3 Graphs to illustrate differences

This section shows the most useful graphs to illustrate differences:

  • Box-Whisker plot
  • Bar chart

There are also notes about using legends, which are especially useful for multiple category bar charts.

6.4 Graphs to illustrate correlation and regression

This section focusses on the sort of graph used for illustrating correlations, that is, scatter plots. Line plots are also covered; they aren’t exactly used for correlation but show changes over fixed (time) periods, and so fit best in this section.

6.5 Graphs to illustrate association

This section shows the kinds of graph used when looking at associations:

  • Pie charts
  • Multiple category bar charts

Pie charts are commonly used for displaying compositional data. A better alternative is a bar chart. There are notes about how to display the results of chi squared tests of association using R and Excel.

6.6 Graphs to illustrate similarity

This section deals with the sorts of graph used to illustrate similarity between samples. These are dendrograms, and are used in community ecology (see Chapter 12).

6.7 Graphs – a summary

The chapter ends with a brief summary, which includes…

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 7. Tests for differences

This chapter examines the basic tests for differences between two samples, namely:

  • Student’s t-test
  • Mann-Whitney U-test (a.k.a. Wilcoxon rank-sum test)

These underpin many of the more complicated tests and are important building blocks for further analysis.

7.1 Differences: t-test

The t-test is an important analytical tool that uses the properties of the normal distribution to make decisions about differences between two sample means. In this section the t-test is introduced with a little background/theory. A table of critical values for the t-test is provided (with a copy in the appendix). The way to use the t-test in both Excel and R is shown. There is also a section showing how to use the Analysis ToolPak in Excel to carry out the t-test.
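To show what lies behind the test statistic, here is a minimal Python sketch of the classic equal-variance (pooled) form of Student's t (the book uses Excel and R). The function name and data are hypothetical, and the pooled form is an assumption made here: R's `t.test` uses the Welch (unequal variance) version by default. The result is compared with a table of critical values for the stated degrees of freedom.

```python
import math
import statistics


def student_t(a, b):
    """Return (t, degrees_of_freedom) for two independent samples,
    assuming both come from populations with equal variance."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their df
    pooled = ((n_a - 1) * statistics.variance(a)
              + (n_b - 1) * statistics.variance(b)) / df
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df


# Hypothetical samples, invented for illustration
t, df = student_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)
```

For these invented data the absolute value of t exceeds 2.776, the two-tailed 5% critical value for 4 degrees of freedom, so the difference in means would be judged significant at that level.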

7.2 Differences: U-test

The U-test is a non-parametric test; that is, it is used when the sample data are skewed and do not follow a normal distribution. The U-test compares two sample medians. The U-test is introduced with a little background/theory and its use in R is shown. Excel cannot carry out a U-test easily although some tips are shown. A table of critical values for the U-test is provided (with a copy in the appendix).

7.3 Paired tests

When data are in the form of matched pairs it is possible to use a special version of the t-test or U-test (according to the distribution of the data, see Chapter 4). Both tests are illustrated with examples. For normally distributed data the paired t-test is shown. For skewed data the Wilcoxon matched pairs test is shown. A table of critical values for the Wilcoxon test is provided (the t-test table is the same as for the regular t-test). Paired tests can be carried out in R and this is illustrated. Excel can carry out the t-test but not the Wilcoxon test.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 8. Tests for linking data – correlations

This chapter looks at links between data, namely correlation. The basic principles of correlation are illustrated using a non-parametric method (Spearman Rank) and parametric (normal distribution: Pearson product moment). The idea of correlation is extended to include curvilinear correlation, which is simply an extension of regular regression/correlation. Use of Excel and R for carrying out correlation is shown with examples.

8.1 Correlation: Spearman’s rank test

The Spearman’s Rank correlation test examines the link between two variables that are not normally distributed. The test is described with some background theory and an example. A table of critical values for the Spearman rank coefficient is provided (with a copy in the appendix).
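The calculation behind the coefficient can be sketched in Python as follows (the book uses Excel and R). The function names and data are hypothetical. The sketch uses the familiar shortcut formula, 1 - 6Σd² / (n(n² - 1)), which is exact only when there are no tied values; tied values receive average ranks here, so the result is then an approximation.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rs(x, y):
    """Spearman rank coefficient via the shortcut formula (see note on ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    # Sum of squared differences between the paired ranks
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical paired observations, invented for illustration
x = [1, 2, 3, 4, 5]
print(spearman_rs(x, [2, 4, 6, 8, 10]), spearman_rs(x, [10, 8, 6, 4, 2]))
```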

8.2 Pearson’s product moment

Pearson’s product moment is used when the data are normally distributed (see Chapter 4). The test is described and a table of critical values is provided (with a copy in the appendix). This method of correlation is closely related to regression, and the principles apply to more complicated situations where there are more than two variables to compare; this is covered in Chapter 11 (multiple regression).

8.3 Correlation tests using Excel

Excel is able to carry out Pearson correlation and this is described in the text (using basic functions as well as the Analysis ToolPak). The text also describes how to add a line of best fit to your scatter plots. There are no in-built functions to carry out Spearman’s rank test in Excel but the text describes how you can carry out the calculations using simple functions.

8.4 Correlation tests using R

R can carry out a range of correlation tests including Spearman’s rank and Pearson product moment. Both of these are described in the text. The text also describes how to use R to add lines of best fit to your scatter plots.

8.5 Curved linear correlation

Curved linear correlation/regression is simply an extension of the regular correlation. Two examples of curvilinear correlation are described:

  • Logarithmic correlation
  • Polynomial correlation

These situations arise fairly commonly in natural science. The situations are described only briefly in this section but they are covered in greater depth in Chapter 11.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 9. Tests for linking data – associations

This chapter deals with tests of association, specifically variations on the chi squared test. These tests use data that are categorical. The chapter deals with the basic chi squared test as well as goodness of fit testing, where you match one set of categories with another. How to carry out the tests in both Excel and R is covered, with additional material on graphing the results.

9.1 Association: Chi-squared test

When you have two sets of categories you can test for associations using the chi squared test. This section deals with the chi squared test in general with a worked example. When you have a 2 x 2 contingency table the Yates correction can be used and this is also described. A table of critical values for the chi squared statistic is provided (with a copy in the appendix). The text also describes how to determine Pearson residuals, which are useful in presenting and interpreting results of chi squared tests of association.
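The arithmetic of the test can be sketched in Python (the book uses Excel and R). The function name and the contingency table are hypothetical. Expected values come from row total × column total / grand total, and each Pearson residual is (observed - expected) / √expected. The Yates correction is left out of this sketch.

```python
import math


def chi_squared(table):
    """Return (chi_squared, degrees_of_freedom, pearson_residuals)
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell under the hypothesis of no association
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    residuals = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
                 for obs_row, exp_row in zip(table, expected)]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


# Hypothetical 2 x 2 table of counts, invented for illustration
chi2, df, residuals = chi_squared([[10, 20], [20, 10]])
print(round(chi2, 3), df, [[round(r, 3) for r in row] for row in residuals])
```

The sign of each residual shows whether a cell holds more (positive) or fewer (negative) observations than expected, which is why residuals help with interpretation.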

9.2 Goodness of fit test

If you have two sets of categorical data you can match them using a goodness of fit test. The test is illustrated using some genetic data, a classic use of the goodness of fit test, where you compare the offspring of pea plants to the theoretical ratio expected under genetic theory.

9.3 Using R for Chi-squared tests

This section guides you through the processes required to conduct chi squared tests for association and goodness of fit using R. There are also notes and exercises to help you produce graphs that illustrate your results.

9.4 Using Excel for Chi-squared tests

This section guides you through the processes required to conduct chi squared tests for association and goodness of fit using Excel. There are also notes and exercises to help you produce graphs that illustrate your results.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 10. Differences between more than two samples

When you have more than two samples to compare you will need a more complicated analytical approach. This chapter covers the two main methods of analysis:

  • Analysis of Variance (ANOVA)
  • Kruskal-Wallis test

ANOVA is used when you have normally distributed data (see Chapter 4). When the data are not normally distributed the Kruskal-Wallis test is used. Use of both Excel and R is illustrated in the text.

10.1 Analysis of variance

ANOVA allows you to compare more than two samples. When you have a single variable to compare the situation is called one-way ANOVA. However, you may have more than one variable and two-way ANOVA (or more) is possible. This section looks at a range of options when using ANOVA including:

  • One-way ANOVA
  • Post-Hoc testing
  • Two-way ANOVA

ANOVA is described in general and then the calculations are described for both R and Excel. Use of the Analysis ToolPak Excel add-in is also described for one or two-way ANOVA.

There are critical values tables (with copies in the appendix) for the F-distribution and for Q, the Studentized range, which is used in post-hoc testing.

There are also some notes about graphing the results of ANOVA for both R and Excel.

10.2 Kruskal–Wallis test

If your data are not normally distributed (see Chapter 4) then the Kruskal-Wallis test is suitable in lieu of one-way ANOVA. This is described in the text, along with a method of post-hoc testing. Tables of critical values for the Kruskal-Wallis test are quite extensive and there are several versions, for use with different sample sizes. The tables of critical values are presented in the support material rather than in the book itself.

Excel is unable to carry out the test “automatically” but there are several functions that can help carry out the required calculations and these are described in the text (there is also an exercise that walks you through the necessary steps).

R can conduct Kruskal-Wallis easily and the processes are described in the text with some examples.

Conducting a post-hoc test for Kruskal-Wallis tests is slightly cumbersome. You can find custom R functions for K-W post-hoc testing on the support pages.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 11. Tests for linking several factors

When you have several variables to correlate you need a more complicated analytical tool. Multiple regression is the one you require; this uses the properties of the normal distribution. In this chapter multiple regression is described in detail. Curved linear regression is also described (this was introduced in Chapter 8).

A special kind of regression is required when your response variable can only have two forms (e.g. present or absent); this is logistic regression (or binomial regression). Logistic regression is far from trivial to undertake in Excel, so it is described in detail using R.

11.1 Multiple regression

Various aspects of multiple regression are described, including:

  • Multiple regression
  • Beta coefficients
  • Lines of best-fit
  • Model building and stepwise regression

Multiple regression is introduced and illustrated using both Excel and R. The use of R is extended by demonstrating how to carry out stepwise regression – this is a method of building the most appropriate regression for your data.

11.2 Curved-linear regression

Curved linear regression is demonstrated using two examples:

  • Polynomial regression
  • Logarithmic regression

Curvilinear regression is described using both Excel and R. There are also notes about graphing the results and how to add curved lines of best fit to your graphs.

11.3 Logistic regression

Logistic regression is another form of regression and is used when you have binary data (e.g. presence-absence). Excel cannot easily carry out logistic regression but R can do this fairly easily and this is illustrated using two different examples. Logistic regression is a form of Generalized Linear Modelling (GLM).

You’ll also see how to build a regression model and how to plot the results as a scatter plot with a line of “best-fit”. There is an online exercise in logistic regression model-building on the support pages.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 12. Community ecology

This is a new chapter, especially written for the new second edition. In ecology you are often looking at several species at once, that is you are looking at communities. The analytical methods for exploring community data are generally rather more complicated than when dealing with single species. However, there are a couple of useful statistical approaches that can be easily carried out. These are diversity and similarity, which are described in this chapter.

12.1 Diversity

In a general sense the term diversity (or biodiversity) relates to the number of species in a given area (or sample). However, this is only one way to measure biodiversity. In an ecological sense the term diversity covers several methods of analysis. It is therefore important that you state what sort of diversity you are referring to. The measures of diversity you’ll see in this section are:

  • Species Richness – the number of different species in a given area or sample.
  • Indices of diversity – take into account not only the number of species but also their relative abundance.

Two main indices of diversity covered in the text are:

  • Simpson’s index
  • Shannon entropy

You’ll see how to calculate diversity using Excel and R.
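The measures named above can be sketched in Python as follows (the book uses Excel and R). The function names and abundance counts are hypothetical. Simpson's index has several variants; the form 1 - Σp² is assumed here, and Shannon entropy is calculated with natural logarithms, which is also an assumption.

```python
import math


def proportions(counts):
    """Relative abundance of each species that is actually present."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def species_richness(counts):
    """Number of species with at least one individual."""
    return sum(1 for c in counts if c > 0)


def simpson_index(counts):
    """Simpson's index in the form 1 - sum(p^2); higher means more diverse."""
    return 1 - sum(p ** 2 for p in proportions(counts))


def shannon_entropy(counts):
    """Shannon entropy, -sum(p * ln p), using natural logarithms."""
    return -sum(p * math.log(p) for p in proportions(counts))


# Hypothetical abundances of four species in one sample
counts = [10, 10, 10, 10]
print(species_richness(counts), simpson_index(counts),
      round(shannon_entropy(counts), 4))
```

With four equally abundant species the Shannon value equals ln 4, the maximum possible for that richness.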

Comparing diversity indices can be tricky. The support website contains an exercise in comparing (Shannon) diversity using the Hutcheson t-test. You can download the Excel spreadsheet for use with your own data.

12.2 Similarity

The analysis of similarity does essentially what the name suggests; it compares samples and allows you to see which are most similar to one another. This can be useful when comparing many samples.

This section covers several methods of calculating similarity, using both Excel and R.
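This outline does not say which similarity measures the text uses. As one common example, chosen here as an assumption, the Jaccard index compares two samples using presence-absence data: the number of shared species divided by the total number of species across both samples. A minimal Python sketch with a hypothetical function name and invented species lists:

```python
def jaccard_similarity(sample_a, sample_b):
    """Jaccard index: shared species / all species found in either sample."""
    a, b = set(sample_a), set(sample_b)
    all_species = a | b
    if not all_species:
        raise ValueError("at least one sample must contain a species")
    return len(a & b) / len(all_species)


# Hypothetical species lists for two samples, invented for illustration
site_1 = ["sp_A", "sp_B", "sp_C"]
site_2 = ["sp_B", "sp_C", "sp_D"]
print(jaccard_similarity(site_1, site_2))
```

A value of 1 means the two samples contain exactly the same species and 0 means they share none.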

Visualising similarity is done using a special sort of graph called a dendrogram. The text shows you how to draw a dendrogram in various ways:

  • By hand
  • Using Excel
  • Using R

The support material includes a walk-through exercise on building a dendrogram using Excel (there is an associated data file).

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 13. Reporting results

The presentation of your work is an important stage in the scientific process. It helps you to move forwards and to determine “what next?” as well as adding to the body of scientific knowledge and helping other researchers in the future.

This chapter is concerned with the reporting of results. There are sections covering some of the conventions for reporting of statistical tests as well as notes about writing reports in various formats.

13.1 Presenting findings

This brief section gives some ideas about the kinds of presentation namely:

  • A written report
  • A talk
  • A poster

13.2 Publishing

This brief section gives some ideas for places that your work might be published. There are many options ranging from a scientific journal to a press release.

13.3 Reporting results of statistical analyses

It is important to present your findings in a manner that can be understood by your peers. If you are presenting results to the general public then you may have to alter the presentation to suit your audience, but you still keep to the conventions used by scientists in displaying results.

This section shows the main conventions used to display the results of statistical analyses.

13.4 Graphs

This section has been heavily revised since the first edition. The “how to” parts have been removed to Chapter 6 and what remains is more of a guide to “good practice”. The section provides an overview/reminder of graph types and their uses and also of the main elements that you should aim to incorporate for best effect.

13.8 Writing papers

The aim of a paper is to disseminate your results as widely as possible. It is all about communication of your scientific endeavour! This section provides a summary of the main elements of a scientific paper. The various elements form a basic framework that applies to more or less all scientific presentations, regardless of the audience. You may place different emphasis on certain elements (or omit them entirely) but you always keep the basic framework in mind. The structure is important because readers need to know where to find certain pieces of information.

13.9 Plagiarism

Plagiarism is a form of stealing. Essentially it involves you setting forth someone’s work and passing it off as if it were your own. Of course you need to use previous knowledge in your work but you need to acknowledge where your knowledge/information came from. This section gives a few pointers about how to avoid plagiarism. The key to avoiding plagiarism is to know how and when to cite references, the subject of the next section.

13.10 References

References come in two parts. There is a bit in the text that essentially says “look at this for information” and a list at the end that gives the original sources. References are important because they allow readers to see where your information came from and help avoid plagiarism. References are also useful as pointers to information (e.g. to figures and tables in your report).

This section gives the key elements of references and shows some different methods for employing citations in your text.

13.11 Poster presentations

A poster allows you to make a presentation, which is left on display for hours, sometimes days. It can potentially reach hundreds of people because it is hanging around for so long. At meetings there is usually a set session where you stand by your poster and present it to anyone who expresses an interest; otherwise it stands alone.

This section provides some notes about the use of posters as a means to disseminate your results.

13.12 Giving a talk (PowerPoint)

PowerPoint (or equivalent software) is virtually ubiquitous and is familiar to most people. It can be a great tool for presenting information but it can also be used badly! This short section gives a few notes about “best practice” for use of PowerPoint presentations.

Exercises

Some self-assessment questions (answers in the appendix).

Summary

A summary of the main topics covered in the chapter. Provides a quick summary/overview.

Chapter 14. Summary

This is a very brief summary to remind you that there is more to data analysis than just doing statistics.

Glossary

The glossary is a simple list of useful terms alongside a brief explanation.

Appendix

The appendix contains two main sections:

  • Answers to self-assessment questions
  • Tables of critical values

The answers to the questions are set out in chapter order. The critical values tables are copies of those in the main text but presented en masse to form a more useful resource.
