Dr. Mark Gardener

Home
About
Home > Publications > Statistics for Ecologists TOC/Outline

Statistics for Ecologists using R and Excel

Data Collection, Exploration, Analysis and Presentation

Available from Pelagic Publishing. Get a 20% discount using the S4E20 code!

Table of Contents/Outline | Example Data Files | Instructions for using the Data Files


PPLogo

Get a 20% discount on "Statistics for Ecologists" when you buy direct from the publisher! Enter the voucher code S4E20 in the shopping basket at Pelagic Publishing.

To see and download data files that accompany the book click here.

Back to top

Statistics for Ecologists: Outline

This is a book about the scientific process and how we apply it to data in ecology. You will learn how to plan for data collection, how to assemble data, how to analyse data and finally how to present the results. The book uses Microsoft Excel and the powerful Open Source R program to carry out data handling as well as producing graphs.

Who this book is for

Students of ecology and environmental science will find this book aimed at them although many other scientists will find the text useful as the principles and data analysis are the same in many disciplines. No prior knowledge is assumed and the reader can develop their skills up to degree level.

What you will learn from this book

  • How to plan ecological projects.
  • How to record and assemble your data.
  • How to use Excel for data analysis and graphs.
  • How to use R for data analysis and graphs.
  • How to carry out a wide range of statistical analyses.
  • How to create professional looking graphs.
  • How to present your results
The book follows the theme of the scientific process and is split into four broad themes:
  • Planning
  • Data Recording
  • Data Exploration
  • Reporting Results

The sections are rather uneven in size and focus on the analysis side somewhat. The section on reporting also covers presentation of analyses (e.g. graphs). Although the emphasis is on ecological work and many of the data examples are of that sort, I hope that other scientists and students of other disciplines will see relevance to what they do.

The following outline covers each chapter of the book.

Chapter 1 Planning | 2 Data Recording | 3 Beginning Data Exploration
4 Looking at Numbers | 5 Which Test? | 6 Using Graphs
7 Tests for Differences | 8 Tests for Correlations | 9 Tests for Association
10 ANOVA | 11 Regression | 12 Reporting results


To see and download data files that accompany the book click here.

Back to top

Chapter 1. Planning

This chapter is about the preparation stages required before starting to collect data or carry out any analyses. The chapter includes notes on planning for data collection and getting appropriate software (that is R and Excel).

1.1 The scientific method

This section outlines the scientific method and provides a framework for all projects and data analysis.

1.2 Types of experiment/project

This section deals with the types of project that could be encountered and provides a framework that allows the reader to characterise a project, which leads to the most appropriate method of analysis.

1.3 Getting data - using a spreadsheet

This brief section highlights the importance of the spreadsheet and especially points out the usefulness of it in relation to pilot studies and as a tool for overview.

1.4 Hypothesis testing

This section introduces the idea of the hypothesis, a subject that will be returned to in chapter 5.

1.5 Data types

This section introduces the different types of data that can be encountered (Interval, Ordinal and Categorical) and gives some examples of ordinal scales (Domin and Braun Blanquet) that can be used in data collection.

1.6 Sampling effort

This section introduces methods of data collection and includes notes on the amount of data to collect as well as quadrat sizes for example. The ideas of random and systematic sampling are introduced. The main purpose of this section is to highlight the importance of your samples being representative.

1.7 Tools of the trade

This brief section highlights the importance of the software tools that will be used.

1.8 The R program

The R program is an important and powerful tool for data analysis. This section shows the reader how to obtain the program and install it on their computer.

1.9 Excel

A spreadsheet is a useful tool as it allows your data to be held in an formal manner that can be shared. The spreadsheet also allows us to carry out various analyses and produce graphs. In this section the main focus is on installation of the Analysis ToolPak in Excel. This tool allows a number of analyses to be carried out more efficiently that using the regular spreadsheet formulae.


To see and download data files that accompany the book click here.

Back to top

Chapter 2. Data recording

This chapter is brief yet important! The arrangement of data is a fundamentally important aspect of data analysis. Get this part right and your subsequent analyses are greatly facilitated. Get this part wrong and you will have to spend a lot of time rearranging data before analysis can be done. The main thrust of this chapter is to introduce the idea of Biological Records and the Biological Recording format. This standard format is very flexible and allows your data to be used for multiple purposes very easily.

2.1 Collecting data - who, what, where, when

This section deals with the basics of Biological Records and what elements should be recorded.

2.2 How to arrange data

The arrangement of data is of fundamental importance as a poor layout will make it hard to extract the information you require. This section shows how to arrange data in the Biological Recording format, which permits the data to be utilized more easily.


To see and download data files that accompany the book click here.

Back to top

Chapter 3. Beginning data exploration - using software tools

This chapter is aimed at getting the reader more familiar with the software that they will use for data analysis, specifically Excel and R.

3.1 Beginning to use R

This section introduces the R program and helps readers get started using this powerful program. The section includes notes on various topics including:

  • Getting Help
  • Basic Maths
  • Inputting Data
  • Summary Statistics
  • Saving Work

By the end of this section the reader should be competent and confident with using R and be prepared for more detailed data analysis using the R interface.

3.2 Manipulating data in a spreadsheet

This section introduces some important aspects of Excel, topics include:

  • Sorting
  • Data Filtering
  • Paste Special
  • File Formats
  • Lookup Tables
  • Pivot Tables

These skills are really important and when combined with Biological Recording format allow data to be utilized easily and flexibly.

3.3 Getting data from Excel into R

This brief section shows how to transfer data from Excel into R. The spreadsheet is really useful as a data storage program and for initial overviews. Although many statistical and graphical analyses can be carried out in Excel the R program is a dedicated data analysis tool; the more complicated the data the more likely it is that you will be using R rather than Excel.


To see and download data files that accompany the book click here.

Back to top

Chapter 4. Exploring data - looking at numbers

This chapter begins the actual process of data analysis. The exploratory methods introduced here are the basics that should be carried out on all data. Methods covered include:

  • Averages
  • Dispersion
  • Confidence Intervals

4.1 Summarising data

This section deals with the idea of the average as a summary of a numerical sample:

  • Mean
  • Median

After some basic introduction the section deals with how to determine averages in Excel and R.

4.2 Distribution

This section deals with the distribution of the data (that is normal or skewed). Specifically the reader is shown how to create Tally plots and Histograms to visualise the data distribution. The reader is shown how to create a histogram using Excel and R. There is also a brief section on producing density plots (using R), which can be used with a histogram to compare two different distributions.

4.3 A numerical value for the distribution

This section looks at measures of dispersion, specifically:

  • Range
  • Quartiles
  • Standard Deviation

There are notes on how to determine these measures using Excel and using R. The idea of the box plot (box-whisker plot) is also introduced here as a useful visual aid. This section ends with some notes on why n-1 is used in calculations of standard deviation.

4.4 Statistical tests for normal distribution

This brief section illustrates one method of testing the assumption that a sample is normally distributed. The Shapiro-Wilk test is shown (using the R program).

4.5 Distribution type

This section begins with a note of what summary statistics should be used with normal or skewed distribution. The rest of the section includes notes on some other statistics, namely:

  • Standard Error
  • Confidence Intervals

These statistics are related to the normal distribution and link in with the idea of hypothesis testing.

4.6 Transforming data

Since the normal distribution is so important it is helpful to coerce data into normal form if is skewed. This section introduces the idea of data transformation and illustrates several common methods including:

  • Logarithmic
  • Square Root
  • Arcsine (also called Angular)
  • Reciprocal

There are some notes on how to do these transformations in Excel and using R.

4.7 When to stop collecting data? The running average

This section introduces the idea of the running mean as a way to determine when the sample size is adequate.

4.8 Statistical symbols

This section illustrates a few of the more commonly encountered statistical symbols, summarized in a handy table.


To see and download data files that accompany the book click here.

Back to top

Chapter 5. Exploring data - which test is right?

This brief chapter shows the reader how to select the most appropriate analytical test for their data. There is also a brief reminder of the idea of the hypothesis.

5.1 Hypothesis testing

This section revisits the idea of the hypothesis as an analytical tool.

5.2 Choosing the correct test

This section guides the reader towards the most appropriate test. The section includes a decision tree that points to the correct section of the book.


To see and download data files that accompany the book click here.

Back to top

Chapter 6. Exploring data - using graphs

Creating a graphical summary of data is vitally important. This chapter introduces the types of graph most suitable. Chapter 12 deals with the details of producing graphs (using Excel and R).

6.1 Exploratory graphs

This section shows the kind of graphical summary most useful in visualizing data, including:

  • Stem Leaf plot
  • Histogram
  • Density plot
  • Box-Whisker plot

These graphs would generally be used to determine the distribution of the data sample(s).

6.2 Graphs to illustrate differences

This section shows the most useful graphs to illustrate differences:

  • Box-Whisker plot
  • Bar chart

6.3 Graphs to illustrate links

This section shows the kinds of graph used when looking at links:

  • Scatter plots
  • Pie charts
  • Multiple category bar charts

Scatter plots are used for correlations and pie charts for associations (the results of chi squared tests). The bar chart is shown as an alternative to the pie chart for association analysis.

6.4 Graphs - a summary

This section summarizes the use of graphs and includes a handy table for reference.


To see and download data files that accompany the book click here.

Back to top

Chapter 7. Tests for differences

This chapter examines the basic tests for differences between two samples, namely:

  • Student's t-test
  • Mann-Whitney U-test (a.k.a. Wilcoxon Signed Rank test)

These underpin many of the more complicated tests and are important building blocks for further analysis.

7.1 Differences: t-test

The t-test is an important analytical tool that uses the properties of the normal distribution to make decisions about differences between two sample means. In this section the t-test is introduced with a little background/theory. A table of critical values for the t-test is provided.

The way to use the t-test in both Excel and R is shown. There is also a section showing how to use the Analysis ToolPak in Excel to carry out the t-test.

7.2 Differences: U-test

The U-test is a non-parametric test, that is it is used when the sample data are skewed and do not form the normal distribution. The U-test compares two sample medians.

The U-test is introduced with a little background/theory and its use in R is shown. Excel cannot carry out a U-test easily although some tips are shown. A table of critical values for the U-test is provided.

7.3 Paired tests

When data are in the form of matched pairs it is possible to use a special version of the t-test or U-test. Both tests are illustrated with an example.

For normally distributed data the paired t-test is shown. For skewed data the Wilcoxon matched pairs test is shown. A table of critical values for the Wilcoxon test is provided (the t-test table is the same as for the regular t-test). Paired tests can be carried out in R and this is illustrated. Excel can carry out the t-test but not the Wilcoxon test.


To see and download data files that accompany the book click here.

Back to top

Chapter 8. Tests for linking data - correlations

This chapter looks at links between data, namely correlation. The basic principles of correlation are illustrated using a non-parametric method (Spearman Rank) and parametric (normal distribution: Pearson product moment). The idea of correlation is extended to include curvilinear correlation, which is simply an extension of regular regression/correlation. Use of Excel and R for carrying out correlation is shown.

8.1 Correlation: Spearman's rank test

The Spearman's Rank correlation test examines the link between two variables that are not normally distributed. The test is described with some background theory and an example. A table of critical values for the Spearman rank coefficient is provided.

8.2 Pearson's product moment

Pearson's product moment is used when the data are normally distributed. The test is described and a table of critical values is provided. This method of correlation is also known as regression and the principles apply to more complicated situations where there are more than two variables to compare; this is covered in chapter 11 (multiple regression).

8.3 Correlation tests using Excel

Excel is able to carry out Pearson correlation and this is described in the text (using basic functions as well as the Analysis ToolPak). There are no in-built functions to carry out Spearman's rank test in Excel.

8.4 Correlation tests using R

R can carry out a range of correlation tests including Spearman's rank and Pearson product moment. Both of these are described in the text.

8.5 Curved linear correlation

Curved linear correlation/regression is simply an extension of the regular correlation. Two examples of curvilinear correlation are described:

  • Logarithmic correlation
  • Polynomial correlation

These situations arise fairly commonly in natural science. The situations are described only briefly in this section but they are covered in greater depth in chapter 11.


To see and download data files that accompany the book click here.

Back to top

Chapter 9. Tests for linking data - associations

This chapter deals with tests of association, specifically variations on the chi squared test. These tests use data that is categorical. The chapter deals with the basic chi squared test as well as goodness of fit testing, where you match one set of categories with another. How to carry out the tests in both Excel and R is covered.

9.1 Association: Chi-squared test

When you have two sets of categories you can examine for associations using the chi squared test. This section deals with the chi squared test in general with a worked example. When you have a 2 x 2 contingency table the Yates correction can be used and this is also described. A table of critical values for the chi squared statistic is provided.

The text also describes how to determine Pearson residuals, which are useful in presenting and interpreting results of chi squared tests of association.

9.2 Goodness of fit test

If you have two sets of categorical data you can match them using a goodness of fit test. The test is illustrated using some genetic data, a classic use of the goodness of fit test, where you compare the offspring of pea plants to the theoretical ratio expected under genetic theory.

9.3 Using R for Chi-squared tests

The use of R in carrying out all the chi squared and goodness of fit tests is illustrated.

9.4 Using Excel for Chi-squared tests

The use of Excel in carrying out chi squared and goodness of fit tests is illustrated.


To see and download data files that accompany the book click here.

Back to top

Chapter 10. Differences between more than two samples

When you have more than two samples to compare you will need a more complicated analytical approach. This chapter covers the two main methods of analysis:

  • Analysis of Variance (ANOVA)
  • Kruskal-Wallis test

ANOVA is used when you have normally distributed data. When the data are not normally distributed the Kruskal-Wallis test is used. Use of both Excel and R is illustrated in the text.

10.1 Using R for more complex statistical analyses

R is able to carry out very complex analyses quite easily but Excel is limited. This section provides the reader with more skills in using R to carry out these more complicated analyses.

10.2 Analysis of variance

ANOVA allows you to compare more than two samples. When you have a single variable to compare the situation is called one-way ANOVA. However, you may have more than one variable and two-way ANOVA (or more) is possible. This section looks at a range of options when using ANOVA including:

  • One-way ANOVA
  • Post-Hoc testing
  • Two-way ANOVA

ANOVA is described in general and then the calculations are described for both R and Excel. Use of the Analysis ToolPak is also described for one or two-way ANOVA.

10.3 Kruskal-Wallis test

If your data are not normally distributed then the Kruskal-Wallis test is suitable in lieu of 1-way ANOVA. This is described in the text as well as a method of post-hoc testing. The use of R is described for the Kruskal-Wallis test. Excel is unable to carry out the test.


To see and download data files that accompany the book click here.

Back to top

Chapter 11. Tests for linking several factors

When you have several variables to correlate you need a more complicated analytical tool. Multiple regression is the one you require; this uses the properties of the normal distribution. In this chapter multiple regression is described in detail. Curved linear regression is also described (this was first mentioned in chapter 8).

11.1 Multiple regression

Various aspects of multiple regression are described, including:

  • Multiple regression
  • Lines of best-fit
  • Model building and stepwise regression

Multiple regression is introduced and illustrated using both Excel and R. The use of R is extended by demonstrating how to carry out stepwise regression - this is a method of building the most appropriate regression for your data.

11.2 Curved linear regression

Curved linear regression is demonstrated using two examples:

Polynomial regression

Logarithmic regression

Curvilinear regression is described using both Excel and R.

Logistic regression

Logistic regression is another form of regression and is used when you have binary data (e.g. presence-absence). Excel cannot easily carry out logistic regression but R can do this and this is illustrated using two different examples. Logistic regression is also known as Generalized Linear Modelling (GLM).


To see and download data files that accompany the book click here.

Back to top

Chapter 12. Reporting results

This chapter is concerned with the reporting of results. There are sections covering some of the conventions for reporting of statistical tests as well as notes about writing reports. Much of the chapter is devoted to the presentation of graphical results and there are sections detailing how to produce graphs using R and Excel.

12.1 Presenting findings

This brief section gives some ideas about the kinds of presentation namely:

  • A written report
  • A talk
  • A poster

12.2 Publishing

This brief section gives some ideas for places that your work might be published.

12.3 Reporting results of statistical analyses

This section illustrates how to present the results of your statistical analyses.

12.4 Graphs

This section shows a range of graph types and illustrates their uses. There are also sections covering use of graphics using R and Excel.

Graphs in R

This section shows the reader how to unlock the potential of R to produce a range of basic graphs:

  • Scatter plots
  • Bar charts
  • Box-Whisker plots
  • Pie charts

Graphs in Excel

This section covers the use of Excel for producing graphs. There are step-by-step guides to the following graphs:

  • Bar charts (and error bars)
  • Box-Whisker plots
  • Scatter plots
  • Pie charts

12.5 More about graphs in R

This section takes the production of graphs in R to a new level. There are instructions on a range of topics including:

  • Bar charts
  • Multiple category bar charts (stacked or unstacked)
  • Horizontal bar charts
  • Histograms
  • Box-Whisker plots (regular or horizontal)
  • Scatter plots and lines of best-fit
  • Pie charts
  • Line plots

12.6 Worked example graph data in R

This section covers a range of graphs using R and illustrated with worked examples. There are notes on customizing graphs including:

  • Add error bars
  • Alter plotting symbols
  • Adding axis labels
  • Altering graph colours

12.7 Graphs: a summary

This brief summary acts as a reminder of the principal graph types and their uses.

12.8 Writing papers

This brief section introduces the elements of a scientific report.

12.9 Plagiarism

This section provides some guidance about avoiding plagiarism.

12.10 References

This section provides some notes about the use and formatting of references.

12.11 Poster presentations

This section provides some notes about the use of posters as a means to disseminate your results.

12.12 Giving a talk (PowerPoint)

This section provides a few notes about giving a talk about your results.


Click to view or right click to download data examples:

Data files to accompany the book
(view instructions)

Pivot.txt | Beetle size.txt | Beetle comparison.txt | Leaf sizes.txt | Ridge Furrow.txt | Paired data.txt
Correlation.txt | Pearson.txtPolynomial.txt | Logarithmic.txt | Chi Sq.txt | Goodness of Fit.txt
Three sites.txt | Twoway anova xl.txt | Twoway anova br.txt | Hoglouse Three sites.txt | Regression.txt
Post pivot.txt | Pie chart.txt | Mayfly regression.txt | Beach hoppers.txt | Fly data.txt
All files as a ZIP archive: S4EData.zip


Publications
Top Home
Publications
Contact GardenersOwn Homepage