Using R for the t-test

Exercise 7.1.2.

Statistics for Ecologists (Edition 2) Exercise 7.1.2

This exercise shows how to carry out the t-test (Chapter 7) using R (Section 7.1.2).

Introduction

The t-test compares the means of two samples and assumes that each sample has a normal (parametric or Gaussian) distribution. The t.test() command carries out the t-test in R. By default it computes the Welch two-sample test, which does not assume that the two variances are equal.

You can have your data in several forms:

  • Two separate samples as two data vectors.
  • Two separate samples but in a single data.frame object (i.e. sample format).
  • A response variable and a predictor variable (i.e. scientific recording format).

Whichever form your data take, you can use the t.test() command to carry out the t-test. The example data for this exercise are the same as in the book and you can get the data in the three forms as an RData file: ridge furrow.RData.

Once you have the data you can type the commands shown here for yourself and so follow along.
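If the RData file is not to hand, the three layouts can be rebuilt from the values printed in this exercise. This is only a sketch: the object names ridge, furrow, rf2 and rf1 match those used in the exercise, but the objects in the real file may differ in detail (for example, integer rather than numeric values).

```r
# Sample values copied from the output shown in this exercise
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

# Sample format: one column per sample.
# Indexing past the end of a vector returns NA, which pads the shorter sample.
n <- max(length(ridge), length(furrow))
rf2 <- data.frame(Ridge  = ridge[seq_len(n)],
                  Furrow = furrow[seq_len(n)])

# Recording format: one response column (count) and one predictor column (area)
rf1 <- data.frame(count = c(ridge, furrow),
                  area  = rep(c("Ridge", "Furrow"),
                              times = c(length(ridge), length(furrow))))
```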

Separate data objects

When you have two separate samples (probably as vector objects), you can just name them in the t.test() command.

ridge ; furrow
[1] 4 3 5 6 8 6 5 7
[1]  9  8 10  6  7

t.test(ridge, furrow)

Welch Two Sample t-test

data:  ridge and furrow
t = -2.7584, df = 8.733, p-value = 0.02279
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.5598309 -0.4401691
sample estimates:
mean of x mean of y
      5.5       8.0

The default carries out the Welch two-sample test (with modified degrees of freedom).
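To see where the modified degrees of freedom come from, the Welch statistic can be worked out by hand. The sketch below uses the Welch-Satterthwaite approximation; the variable names (se2.r, se2.f, t.welch, df.welch) are made up for illustration.

```r
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

# Squared standard error contributed by each sample: variance / sample size
se2.r <- var(ridge)  / length(ridge)
se2.f <- var(furrow) / length(furrow)

# Welch t statistic: difference in means over the combined standard error
t.welch <- (mean(ridge) - mean(furrow)) / sqrt(se2.r + se2.f)

# Welch-Satterthwaite approximation for the degrees of freedom
df.welch <- (se2.r + se2.f)^2 /
  (se2.r^2 / (length(ridge) - 1) + se2.f^2 / (length(furrow) - 1))

round(c(t = t.welch, df = df.welch), 3)
```

The two values should agree with the t and df reported by t.test(ridge, furrow) above.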

To carry out a t-test with the assumption that the variances are equal, you need to add var.equal = TRUE like so:

t.test(ridge, furrow, var.equal = TRUE)

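The result gives slightly different values for t, df and p-value. The test is now the classic Student's two-sample t-test, which pools the two variances and uses 8 + 5 - 2 = 11 degrees of freedom.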
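One way to compare the two versions side by side is to save each result and pull out the pieces you need. The result of t.test() is a list-like object whose statistic, parameter (the degrees of freedom) and p.value components can be extracted with $. The names welch, pooled and comparison are used here only for illustration.

```r
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

welch  <- t.test(ridge, furrow)                    # default: unequal variances
pooled <- t.test(ridge, furrow, var.equal = TRUE)  # assumes equal variances

# Collect the key values from each result object into one small table
comparison <- rbind(
  welch  = c(t  = unname(welch$statistic),
             df = unname(welch$parameter),
             p  = welch$p.value),
  pooled = c(t  = unname(pooled$statistic),
             df = unname(pooled$parameter),
             p  = pooled$p.value)
)
round(comparison, 4)
```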

Sample format

If your data are separate samples but contained within a data.frame, you’ll need to alter your approach very slightly so that you can “get at” the variables in the data.frame.

There are three main ways:

  • Use $ syntax to specify the frame and sample name explicitly.
  • Use attach() to place the variables in the search path.
  • Use with() to open the frame temporarily.

Here are the example data:

rf2
  Ridge Furrow
1     4      9
2     3      8
3     5     10
4     6      6
5     8      7
6     6     NA
7     5     NA
8     7     NA

Note that the shorter sample is padded with NA items.
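The NA padding matters for some commands but not for t.test(). The sketch below, which rebuilds rf2 from the values shown above, illustrates that mean() needs na.rm = TRUE while t.test() drops the missing items itself. The names padded and trimmed are illustrative only.

```r
rf2 <- data.frame(Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
                  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA))

mean(rf2$Furrow)                 # NA, because missing values propagate
mean(rf2$Furrow, na.rm = TRUE)   # mean of the five real observations

# t.test() removes NA items from each sample before computing anything,
# so the padded and the trimmed versions give the same result
padded  <- t.test(rf2$Ridge, rf2$Furrow)
trimmed <- t.test(rf2$Ridge, na.omit(rf2$Furrow))
all.equal(padded$statistic, trimmed$statistic)
```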

Use $ syntax

You can specify a variable by using the name of the enclosing object, a $ and the variable name:

t.test(rf2$Ridge, rf2$Furrow)

Welch Two Sample t-test

data: rf2$Ridge and rf2$Furrow
t = -2.7584, df = 8.733, p-value = 0.02279

Note that R presents the name exactly as you typed it in the command.

Use attach()

If you try to use a variable that is “inside” a data.frame by typing its name alone, you get an error:

Ridge
Error: object ‘Ridge’ not found

One way around this is to use attach() to “open” the data.frame and allow the separate variables to be found in the search path. Once you have attached an object, its name appears in the listing given by the search() command and its variables can be used without needing the $ syntax.

Type search() to see the current search path (this is an example):

search()
[1] ".GlobalEnv"        "tools:RGUI"        "package:stats"
[4] "package:graphics"  "package:grDevices" "package:utils"
[7] "package:datasets"  "package:methods"   "Autoloads"
[10] "package:base"

Use attach() to open the data object you want:

attach(rf2)

The rf2 object now appears in the search path:

search()
[1] ".GlobalEnv"        "rf2"               "tools:RGUI"
[4] "package:stats"     "package:graphics"  "package:grDevices"
[7] "package:utils"     "package:datasets"  "package:methods"
[10] "Autoloads"        "package:base"

Now you can use the variables within rf2 in your t-test:

t.test(Ridge, Furrow)

Welch Two Sample t-test

data: Ridge and Furrow
t = -2.7584, df = 8.733, p-value = 0.02279

Note that R presents the names exactly as you typed them.

You should use detach() after you are done. This removes the item from the search() path. If you leave a data.frame attached, it is easy to become confused when other data objects have the same names as the variables it contains.

detach(rf2)

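The attach() command will not overwrite any data objects. However, the attached data.frame is placed after your workspace in the search path, so if it contains items with the same names as objects in your workspace, the workspace objects are found first and mask the attached ones (R prints a message to tell you so). The attached items can in turn mask objects of the same name in packages further along the search path, until you use detach().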
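A quick way to see which object wins a name clash is to try it. In this sketch the second Ridge object is a made-up workspace vector whose only purpose is to share a name with a column of rf2.

```r
rf2 <- data.frame(Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
                  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA))

Ridge <- c(100, 200)   # hypothetical workspace object sharing a column name

attach(rf2)            # R reports which names are masked
seen <- Ridge          # name lookup reaches the workspace before rf2
detach(rf2)            # always tidy the search path afterwards

identical(seen, c(100, 200))   # TRUE: the workspace copy was used
rm(Ridge)                      # remove the hypothetical object
```

Because the workspace copy is the one that is found, a t-test run while rf2 is attached could silently use the wrong data, which is a good reason to prefer the $ syntax or with().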

Use with()

The attach() command is useful but you do need to be careful to use detach() after you are done. An alternative approach is to use the with() command, which acts like attach() but only for the duration of one command line.

with(data.name, ...)

So, you give the command the name of the data object you want to “open”, followed by the command you want to execute. In that command you can give the variable names as they are and don’t need the $ syntax.

with(rf2, t.test(Ridge, Furrow, var.equal = TRUE))

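In this example the variances are assumed to be equal.

Because with() only makes the variables available while the command runs, nothing is added to the search path and you don’t have to use detach() afterwards.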
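The following sketch checks that with() leaves the search path untouched and shows that its result can be saved like any other. The names before and res are illustrative only.

```r
rf2 <- data.frame(Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
                  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA))

before <- search()   # record the search path before the test

# with() evaluates the t.test() call using the columns of rf2
res <- with(rf2, t.test(Ridge, Furrow, var.equal = TRUE))

identical(before, search())   # TRUE: nothing was attached
res$p.value                   # the saved result can be inspected later
```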

Recording format

If your data are in scientific recording format, then you’ll have the data in a different form from that shown previously (sample format). You will have response variables and predictor variables. For a t-test you will have one response and one predictor, for example:

rf1
   count   area
1      4  Ridge
2      3  Ridge
3      5  Ridge
4      6  Ridge
5      8  Ridge
6      6  Ridge
7      5  Ridge
8      7  Ridge
9      9 Furrow
10     8 Furrow
11    10 Furrow
12     6 Furrow
13     7 Furrow

This layout is more flexible than sample format but you need a slightly different way to specify the variables in your t-test. Essentially you give a formula of form: y ~ x where y is the response (count) and x is the predictor (area). You can also give the name of the enclosing data object.

t.test(count ~ area, data = rf1)

Welch Two Sample t-test

data:  count by area
t = 2.7584, df = 8.733, p-value = 0.02279

Note that R presents the names of the variables as they appear in the enclosing data object.

The formula syntax is very powerful and is used in many statistical and graphical commands. You can extend the formula for more complicated scenarios, such as analysis of variance and multiple regression.
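The formula result above shows t = 2.7584 where the earlier results showed -2.7584. With a formula, the groups are taken in the order of the factor levels, which is alphabetical by default (Furrow, then Ridge), so the difference is Furrow minus Ridge. As a sketch, you can set the level order yourself if you want the sign to match; rf1 is rebuilt here from the values shown above, and the names default and flipped are illustrative only.

```r
rf1 <- data.frame(
  count = c(4, 3, 5, 6, 8, 6, 5, 7, 9, 8, 10, 6, 7),
  area  = rep(c("Ridge", "Furrow"), times = c(8, 5))
)

# Default: groups are ordered alphabetically (Furrow, then Ridge)
default <- t.test(count ~ area, data = rf1)

# Put Ridge first so the difference is Ridge minus Furrow
rf1$area <- factor(rf1$area, levels = c("Ridge", "Furrow"))
flipped <- t.test(count ~ area, data = rf1)

c(default = unname(default$statistic), flipped = unname(flipped$statistic))
```

Only the sign of t (and of the confidence interval) changes; the df and p-value are the same either way.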
