Custom R Commands

On this page find a summary of the custom R commands developed during production of the book. These are mentioned throughout the text and are available as part of the associated data file (see: Support Files). See also the outline and Table of Contents.

Each entry is laid out in a similar manner to standard R-help entries. I’ve included links to other R commands as appropriate. Most of the examples link to data used in the book (see: Support Files).

See the News page for details about updated files, new custom commands and so on.

See our sister site, DataAnalytics: Ecology Matters, for resources for Ecology Students & Teachers. Including: data examples to use for practise and demonstration, and Custom Functions for R: The Statistical Programming Language.

Buy this book online

Custom Commands Index

A | C | D | E | F | H | I | P | Q | R | S

Table: Custom R functions in Community Ecology support file.

aic Model-building: add single terms to a model.
comm_grp Group community data using grouping variable.
Dapp Bootstrap confidence intervals for Simpson’s diversity index for multiple samples.
dist_chi Calculates pairwise chi squared statistics for a community dataset.
diversity_comp Significance of diversity between two samples using permutation.
eacc Estimated species accumulation for multiple sample groups.
entropy_plot Visualize the results from the renyi() or tsallis() commands (in vegan).
fisher_alpha Calculates Fisher’s alpha, a measure of ecological diversity.
fisher_fit Fits Fisher’s logseries to abundance data.
H_boot Bootstrap confidence intervals for ecological diversity indices.
H_bss Significance between diversity index of two samples using bootstrapping.
H_ci bootstrap resampling to estimate confidence intervals for a single sample. It is intended to be used with the apply() command to provide bootstrap confidence intervals for multiple samples.
H_CI computes the confidence interval for the t-test used with Shannon or Simpson’s indices.
H_sig carries out a version of the t-test for either Shannon or Simpson’s indices.
Happ Bootstrap confidence intervals to be computed for Shannon entropy for multiple samples.
Iapp Bootstrap confidence intervals to be computed for Simpson’s Inverse index for multiple samples.
pacc Extrapolated species richness in a species pool.
plot_H Point plot of diversity index and bootstrap confidence intervals.
polar.ord Polar Ordination (aka Bray-Curtis ordination).
qstar Plot Tsallis entropy profile using q*.
rad_aic Compute AIC values from multiple RAD models.
rad_test A significance test of differences in deviance of RAD models.
spec.rich Species richness for a community dataset.
species_assoc Pairwise chi squared species association.
species_ind Indicator species analysis by chi-squared.
sr_est Species richness estimation with accumulating sites.

A

Model-building: add single terms to a model.

aic

This command is a wrapper for add1(). It computes all the single terms that can be added to the model and returns AIC values.

Usage

aic(model, scope, decreasing = FALSE, n = "all", ...)

Arguments

model a fitted model object.
scope a formula giving the terms to be considered for adding.
decreasing logical. The order to display the results decreasing = FALSE (default) shows results in order of increasing AIC.
n how many variables to display (default: n = "all"). Use integer value to return n items.
... additional parameters to pass to add1().

Details

This command is a wrapper for add1(). It computes all the single terms in the scope argument that can be added to the model, fit those models and compute a table of the changes in fit. The add1() command produces results with variables listed in alphabetical order, whilst aic() allows results to be ordered by AIC value. Additionally, you can specify the number of variables to display.

Value

An object of class "anova" summarizing the differences in fit between the models.

See Also

The add1() command in the stats package.

Examples

> library(vegan) # Load vegan package

> data(varespec) # Biological dataset
> data(varechem) # Environmental variables
> H <- diversity(varespec, index = "shannon") # Shannon index for samples
> vlm <- lm(H ~ 1, data = varechem) # A blank model
> aic(vlm, scope = varechem, n = 3, test = "F") # display first 3 lowest AIC values

Single term additions
Model:
H ~ 1
Df Sum of Sq    RSS     AIC F value    Pr(>F)
Baresoil  1   1.95786 1.0884 -70.240 39.5746 2.486e-06 ***
Humdepth  1   0.75534 2.2909 -52.378  7.2536   0.01328 *
Fe        1   0.48645 2.5598 -49.715  4.1808   0.05303 .
Signif. codes:  0 ‘’ 0.001 ‘’ 0.01 ‘’ 0.05 ‘.’ 0.1 ‘ ’ 1

Compare to add1() result (not shown)

> add1(vlm, scope = varechem, n = 3, test = "F")

C

Group community data using grouping variable.

comm_grp

This command can be used instead of rowsum() and essentially does the same thing.

Usage

comm_grp(comm, groups, data)

Arguments

comm a community dataset as a data.frame, matrix or table.
groups a grouping variable.
data an optional data.frame where the grouping variable can be found.

Details

The command uses a grouping variable to combine samples in a community dataset.

Value

Returns a matrix with rows as samples grouped by the variable in group. The columns are the species as per the original community dataset.

See Also

The rowsum() command in base, which does exactly the same thing.

Examples

Samples in a community dataset

> rownames(gb.biol)
[1] "E1" "E2" "E3" "E4" "E5" "E6" "G1" "G2" "G3" "G4" "G5" "G6"
[13] "W1" "W2" "W3" "W4" "W5" "W6"

Some of the original community data

> gb.biol[1:5, 1:5]
   Aba.par Acu.dub Ago.afr Ago.ful Ago.mue
E1     388       0       0       0       0
E2     325       0       0       0       0
E3     295       0       0       0       0
E4     350       0       0       0       2
E5     407       0       0       0       1

A grouping variable

> gb.site$Habitat
[1] Edge  Edge  Edge  Edge  Edge  Edge  Grass Grass Grass Grass
[11] Grass Grass Wood  Wood  Wood  Wood  Wood  Wood
Levels: Edge Grass Wood

Group data according to grouping variable

> comm_grp(gb.biol, groups = Habitat, data = gb.site)
      Aba.par Acu.dub Ago.afr Ago.ful Ago.mue Ago.vid Ama.aen
Edge     2146       0       0       0       3       0       0
Grass     709       3      20       3       0       2       2
Wood     2013       0       0       0       0       0       0

D

Bootstrap confidence intervals for Simpson’s index.

Dapp

This is intended to be used with apply(); allowing bootstrap confidence intervals to be computed for multiple samples. The default is 1000 resamples and 95% CI.

Usage

Dapp(x)

Arguments

x a vector of values; a single sample from a community dataset. The intention is to use this as the FUN argument of apply (or similar) without a separate argument.

Details

Use this command as a FUN argument in apply() to return bootstrap confidence intervals for Simpson’s diversity index from multiple samples.

Value

97.5% quantiles as bootstrap confidence intervals (as a simple numeric vector). When used with apply() the result is a matrix with columns corresponding to samples and two rows, one for the 2.5% and one for the 97.5% bootstrap quantile.

See Also

See apply() for applying a function to rows (or columns) of a data.frame or matrix. See H_boot() and H_ci() for computing bootstrap confidence intervals for a single sample using Simpson’s, Shannon or Inverse Simpson’s indices. See Happ() for confidence intervals using Shannon entropy and Iapp() for Simpson’s Inverse index. Bootstrapping is carried out using the boot() command.

Examples

Use on single sample

> DeVries[1,1:6]
Hisache Panprol Neshewi Morachi Taysp.1 Coldirc
   1882    1028      19       5       8     273

> DV.ci <- Dapp(DeVries[1,])
> DV.ci
     2.5%     97.5%
0.8418572 0.8553752

Use on rows of a dataset with multiple samples

> apply(DeVries, MARGIN = 1, FUN = Dapp)
          canopy     under
2.5%  0.8417613 0.9169560
97.5% 0.8557949 0.9224317

Chi-squared association for community analysis

dist_chi

Calculates pairwise chi squared statistics for a community dataset. The result has a class "distchi", which has plot and summary methods. The results include Pearson residuals as a dissimilarity matrix. The plot method produces an hierarchical cluster object, and plots the dendrogram.

Usage

dist_chi(data)

summary(x)

plot(x, method = "complete", ...)

Arguments

data a community dataset (usually a matrix or data.frame) with columns as samples.
x the result of a distchi() command.
method the cluster joining algorithm to be used by hclust(), use one of "ward", "single", "complete" (the default), "average", "mcquitty", "median", or "centroid".
... other parameters to pass to the plotting command.

Details

Chi squared association can be used to analyse species co-occurrence across many samples. Species that tend to associate together will be more likeley to be from the same community. Species that tend to be negatively associated will be form different communities. So, this approach can be used to identify groups of species and communities using data from many samples. This command carries out the association tests and used the Pearson residuals as the basis for a dissimilarity matrix (all residuals are rescaled to make the smallest zero). This is used by the plot method to produce a dendrogram showing the species groupings.

Value

An object of class "distchi", which has various components:

chi The individual chi square values for the pairwise species co-occurrences.
expected The expected values for the pairwise species co-occurrences.
residuals The Pearson residuals for the pairwise species co-occurrences.
distance The dissimilarity matrix based on the Pearson residuals.
p.val The p-value for the overall chi squared association.
data The name of the data used in the analysis.
spp The number of species in the dataset.

See Also

The hclust() command in stats for hierarchical clustering algorithms. See designdist() in vegan for calculating dissimilarities (used for calculating co-occurrence). Use species_assoc() for association analysis of a pair of species from a dataset. See species_ind() for indicator species analysis using chi squared.

Examples

Use a dataset with rows as species and columns as samples

> hsere.cs <- dist_chi(hsereT)
> summary(hsere.cs)

Available components:
[1] "chi"       "expected"  "residuals" "distance"  "p.val"
[6] "data"      "spp"

Total.Chi df         p.val
hsereT  632.9612 22 1.027601e-119

A dendrogram of the result
> plot(hsere.cs, method = "complete")

Diversity index comparison by permutation.

diversity_comp

Computes significance of diversity index between two samples by permutation method (oecosimu). You can use any of the diversity indices used by the diversity() command in vegan.

Usage

diversity_comp(comm, index = "shannon", perm = 1999)

Arguments

comm a community dataset of exactly two samples.
index a diversity index; options are "shannon" (default), "simpson" and "invsimpson".
perm the number of permutations, default perm = 1999.

Details

This command uses the vegan::oecosimu() command and is essentially a wrapper for it. It uses a permutation process to repeatedly sample two communities. The difference in the diversity index is then assessed from the permutation results.

Value

The result is an object that holds two classes: "oecosimu" and "list". There are two components: statistic, the calculated difference between the community indices and oecosimu, the results from the simulation. The latter holds the main results, which can be summarised using the print routine (i.e. simply typing the name of the result). Results include z-statistic, 95% CI and p-value (from simulation).

See Also

The oecosiumu() command in vegan carries out the analysis. See also the diversity() command for calculation of diversity indices. See H_boot() for general confidence intervals of diversity indices by bootstrapping.

You must use a community dataset with exactly two samples. You can easily subset a larger dataset. If you have separate data then you’ll have to combine them first.

Examples

A data.frame with only two rows

> diversity_comp(DeVries, index = "simpson", perm = 2000)

oecosimu with 2000 simulations
simulation method r2dtable
alternative hypothesis: true mean is not equal to the statistic
           statistic          z       2.5%        50%  97.5%
statistic 7.0746e-02 5.5458e+01 6.6880e-05 1.3523e-03 0.0045
           Pr(sim.)
statistic 0.0004998 ***

Use a data.frame with many rows and select two for comparison

> diversity_comp(gb.biol[c(1,7), ])

oecosimu with 1999 simulations
simulation method r2dtable
alternative hypothesis: true mean is not equal to the statistic
          statistic        z      2.5%       50% 97.5% Pr(sim.)
statistic  0.668067 10.135002  0.003174  0.063465 0.218    5e-04

E

Estimated species accumulation.

eacc

This command is intended to be used with lapply() so that you can compute estimated species accumulation for multiple sample groups.

Usage

eacc(x, data)

Arguments

x An index value from a community dataset.
data A community dataset, usually a data.frame or matrix.

Details

This function is intended to be used in lapply() rather than as a standalone command. It allows you to take a community dataset and split it by a grouping variable (an index you define). Then the function carries out estimated species accumulation groupwise, using the index you specify and the vegan::estaccumR() command.

Value

The result is a "list" with a component for each grouping variable specified. Each component contains the result of the estaccumR() command, that is estimated species accumulation using various methods.

See Also

The estaccumR() command carries out the estimated species accumulation using various estimators. The pacc() command is a custom function that carries out a similar job for estimated species richness using the vegan::poolaccum() command. See lapply() for details of applying a function to list elements.

Examples

## Make an index from rows of community dataset
> index <- list(Edge = 1:6, Grass = 7:12, Wood = 13:18)

> index
$Edge
[1] 1 2 3 4 5 6
$Grass
[1]  7  8  9 10 11 12
$Wood
[1] 13 14 15 16 17 18

Use index value to extract species accumulation by group

> lapply(index, FUN = eacc, data = gb.biol)

$Edge
N     S     Chao      ACE
1 18.19 20.49214 21.44664
2 21.47 23.23650 23.67794
3 23.80 25.20936 26.06676
4 25.30 26.48750 27.45683
5 26.21 27.51250 28.03773
6 27.00 28.00000 28.75118

$Grass
N     S     Chao      ACE
1 24.04 31.43000 32.11511
2 29.51 33.98068 36.43205
3 32.47 34.64050 37.42315
4 34.38 35.53885 37.24407
5 35.71 36.55452 37.74246
6 37.00 38.00000 38.72006

$Wood
N     S   Chao      ACE
1 11.45 12.320 14.03674
2 12.18 12.700 13.25899
3 12.59 12.915 13.62097
4 12.89 13.170 13.79830
5 13.00 13.000 13.39019
6 13.00 13.000 13.00000

Plot results from Rényi or Tsallis calculations.

entropy_plot

This command allows you to visualize the results from the renyi() or tsallis() commands (in vegan). Usually you’ll use it as an alternative to the usual lattice plotting command to overlay multiple results in a single plot window.

Usage

entropy_plot(H, type = "l",
             ylab = "Diversity",
             xlab = "Scale",
             legend = TRUE,
             col = palette(),
             lty = 1:6, cex = 1,
             pch = NULL,
             cex.legend = 1, ...)

Arguments

H the result from a renyi() or tsallis() command from the vegan package.
type the type of plot, default "l" produces lines. Alternatives are "p", points, "b", both, "o" overplotted, "n" none.
ylab

xlab

axis labels for y and x-axes
legend Logical. The default, TRUE, displays a legend.
col the default colours to use on the plot, the default uses the current colour palette. Colours are recycled as required.
lty the line style(s) used, the default uses types 1:6 and these are recycled as needed.
cex character expansion for plotted points, default = 1.
pch plotting symbols, default is NULL.
cex.legend character expansion for legend labels, default = 1.
... other graphical parameters to pass to matplot().

Details

This command provides an alternative way to visualise results from Rényi or Tsallis entropy calculations. The command takes the result of a tsallis() or renyi() command (from vegan) and produces a line plot for each sample – showing the entropy at the various scales.

The tsallis() and renyi() commands usually show multiple samples using the lattice package, with separate panel for each sample.

The entropy_plot() command uses matplot() to overlay the various samples in one plot. You can choose to add a legend (defaults to the "topright" position). If the input data is not an entropy result the command stops, with an error. You could mimic the result of a tsallis() or renyi() command, the result must hold a class "renyi".

Value

Creates a new graphic from the result of a tsallis() or renyi() command (from vegan). Each sample is shown as a separate line (without markers by default) showing the entropy at the various scales.

See Also

Entropy calculations in the vegan package by tsallis() and renyi() commands. The matplot() command is used to draw the multiple sample lines.

Examples

## Community dataset with two samples
> rownames(DeVries)
[1] "canopy" "under"

## Make a Tsallis entropy result
> DV.te <- tsallis(DeVries)

## Plot result using lines with points
> entropy_plot(DV.te, pch = 1:2, type = "b")

F

 Ecological diversity: Fisher’s alpha

fisher_alpha

Calculates Fisher’s alpha, a measure of ecological diversity.

Usage

fisher_alpha(x, MARGIN = 1, se = FALSE, ...)

Arguments

x Community data, a matrix-like object or a vector.
MARGIN Margin for which the index is computed. Default 1 = rows.
se With option se = TRUE, function fisher_alpha() returns a data.frame with items for α (alpha), its approximate standard errors (se), residual degrees of freedom (df.residual), and the code returned by nlm on the success of estimation. See Details.
... Other parameters to pass to nlm.

Details

The fisher_alpha() command estimates the α parameter of Fisher’s logarithmic series (see fisher_fit() command). The estimation is possible only for genuine counts of individuals.

This command restores the behaviour of the old command fisher.alpha() in vegan. This is what the vegan team had to say:

“fisherfit: completely rewritten and estimates of standard error removed: I could not find no justification for these. Actually, it seems that the value of Fisher alpha as estimated in the function was independent of the abundance distribution of species, but will be defined by the number of species (S) and number of individuals (N).”

“Now the Fisher alpha is estimated from the relationship S = alpha*(1 + log(N/alpha)) using function uniroot(). Because of this, standard errors cannot be estimated and they were removed. In addition, functions confint.fisherfit, profile.fisherfit and plot.profile.fisherfit were removed. The estimation of standard errors was also removed in function fisher.alpha (that only calls fisherfit).”

So, use with some caution – I replaced the function solely to allow you to get the same results as the exercise in the book.

Value

With option se = TRUE, function fisher_alpha() returns a data.frame with items for α (alpha), its approximate standard errors (se), residual degrees of freedom (df.residual), and the code returned by nlm on the success of estimation.

See Also

Function fisher.alpha() in later versions of package vegan does not compute se. Function fisher_fit() computes the logseries and returns the alpha values. The newer versions of package vegan use fisherfit() which does not permit profiling or calculation of se.

Other commands restored from earlier version of vegan:

confint.fisherfit()

plot.profile.fisherfit()

profile.fisherfit()

Examples

This version computes se

> fisher_alpha(DeVries, se = TRUE)
           alpha       se df.residual code
canopy  8.601215 1.249295          55    1
under  10.212885 1.379707          64    1

Newer version in vegan does not compute se even though help suggests it will

> fisher.alpha(DeVries, se = TRUE)
 canopy    under
8.60122 10.21290

Fit Fisher’s logseries to abundance data

fisher_fit

confint.fisherfit

plot.profile.fisherfit

profile.fisherfit

Function fisher_fit() fits Fisher’s logseries to abundance data.

Usage

fisher_fit(x, ...)

Arguments

x Community data, a matrix-like object or a vector.
... Other parameters to pass to nlm.

Details

Fits Fisher’s logseries to ecological abundance data using non-linear modelling. There is a plot method. In addition you can use confint() (from MASS) to determine approximate confidence intervals. There are also profile and plot.profile methods.

Newer versions of vegan have removed the calculations for se (see Details in fisher_alpha()) so treat these with caution. I only restored the commands to allow you to get the same results as the exercises in the book.

Value

Returns an object of class "fisherfit", which is the result of nlm().

See Also

Function fisher.alpha() in later versions of package vegan does not compute se. Function fisher_fit() computes the logseries and returns the alpha values. The newer versions of package vegan use fisherfit() which does not permit profiling or calculation of se.

Examples

Fit Fisher logseries to sample of data

> ff <- fisher_fit(DeVries[1,])
> ff
Fisher log series model
No. of species: 56
Fisher alpha:   8.601215

Plot the result (not shown)

> plot(ff)

Get approx CI (uses MASS)

> confint(ff)
   2.5 %    97.5 %
6.396521 11.313053

Make a profile result

> profile(ff)
$alpha
          tau  par.vals
1  -3.3649073  5.094402
2  -2.5788594  5.795765
3  -1.8602976  6.497127
4  -1.1967304  7.198490
5  -0.5789605  7.899852
6   0.0000000  8.601215
7   0.5455796  9.302577
8   1.0621142 10.003940
9   1.5531015 10.705303
10  2.0214191 11.406665
11  2.4694633 12.108028
12  2.8992526 12.809390

attr(,"original.fit")
attr(,"original.fit")$coefficients
   alpha
8.601215

attr(,"original.fit")$std.err
[1] 1.249295

attr(,"class")
[1] "profile.fisherfit" "profile.glm"       "profile"

Plot the profile (not shown)

> plot(profile(ff))

H

Bootstrapping diversity indices.

H_boot

Carries out a bootstrap resampling using the boot package. The results compute the confidence intervals. You can specify any diversity index used by the diversity() command in vegan for a single sample but the vegan package is not required.

Usage

H_boot(x, index = "shannon", R = 1000, ci = 95)

Arguments

x a vector of community data.
index a diversity index; use one of "shannon" (default), "simpson" or "invsimpson".
R the number of bootstrap resamples to use, defaults to 1000.
ci the confidence interval to return, the default 95, returns 2.5% and 97.5% quantiles.

Details

A sample of community data can be randomised by repeatedly resampling. The H_boot() command uses the boot() command in the boot package to conduct the resampling. The command allows you to specify various diversity index measures and computes the confidence intervals based on quantiles from the resampling.

Value

Produces a "list" with several components: boot.stats gives the results from the boot() command, CI shows the confidence intervals (based on quantiles from the bootstrap resampling), and index gives the calculated diversity index. The command displays the index and CI directly.

See Also

The boot() command in the boot package carries out the resampling. Diversity indices are computed de-novo but see diversity() in the vegan package for alternative computation of Shannon, Simpson’s and Inverse Simpson’s indices. See H_bss() for bootstrap comparison between two samples.

The H_ci() command can be used with apply() to get bootstrap confidence intervals for multiple samples.

Examples

Use defaults for one sample from a dataset

> H_boot(DeVries[1,])
H =  2.640651
   2.5%    97.5%
2.596435 2.675158

Use different index, CI and number of resamples

> H_boot(DeVries[2,], index = "simpson", ci = 99, R = 2000)
H =  0.9198993
     0.5%     99.5%
0.9158286 0.9233032

Bootstrap significance between two diversity indices.

H_bss

This command carries out a bootstrap analysis of the diversity indices of two samples. The result gives a p-value for the permutation as well as a calculated z-score. You can use any of the diversity indices used by the diversity() command in vegan. The result holds a class, "Hbss", which has print and summary methods.

Usage

H_bss(comm, R = 2000, index = "shannon")

print(x, ...)

summary(x)

Arguments

comm a community dataset containing exactly two samples.
x the result of an H_bss() command.
R the number of bootstrap resamples to use, defaults to 2000.
index a diversity index; use one of "shannon" (default), "simpson" or "invsimpson".
... other parameters to pass to print, such as digits.

Details

The statistical significance of the difference in diversity index between two samples is estimated using a randomization method. The two samples are repeatedly randomized and the difference in the diversity index recalculated. The significance is assessed by comparing the difference between the resamples and the original difference. Confidence intervals are estimated using the quantiles of the randomization.

A z-score is also computed from the randomizations and the p-value estimated from that.

The command produces a result holding a class "Hbss", which has print and summary routines.

Value

A list object with custom class "Hbss", which has print and summary methods. The result object contains several components:

H.orig The calculated diversity index for the two original samples.
statistic The difference in the diversity index between the two original samples.
z.score The z-score calculated using the original difference and the difference between the bootstrap resamples as well as the standard deviation of the bootstrap samples.
CI The 95% confidence interval for the difference between the indices of the two samples.
p.sim The p-value resulting from the bootstrap resampling.
p.z The p-value estimated using the z-score.
df The degrees of freedom (the number of species -1 for each sample).
Index The name of the diversity index used in the calculations.

See Also

See the diversity() command in vegan for computations of diversity indices. See the r2dtable() command for randomization of data tables with fixed marginal totals. See H_boot() for bootstrap confidence intervals of a single sample. See also H_ci() for a function to use with apply() to give boostrap confidence intervals for multiple samples.

Examples

Use 2 samples from a larger dataset

> gb.bs <- H_bss(gb.biol[c(7,13), ])

Print result

> print(gb.bs, digits = 3)
        H.orig.1 H.orig.2 Statistic Z.score    2.5% 97.5% P(sim)
shannon     1.94     1.40     0.538    9.57 0.00335 0.184      0
P(z) df
shannon 1.13e-11 38

Summary method

> summary(gb.bs)
Original shannon Index:  1.935163 1.397251
Difference in index:     0.5379125
z-score:                 9.573475
Confidence interval:     0.003348132 0.1837998
Degrees of Freedom:      38
Simulated P-value:       0
P-value (z):             1.130406e-11

Bootstrap confidence intervals for a single sample.

H_ci

This command carries out a bootstrap resampling to estimate confidence intervals for a single sample. It is intended to be used with the apply() command to provide bootstrap confidence intervals for multiple samples.

Usage

H_ci(x, index = "shannon", R = 1000, ci = 95)

Arguments

x a vector of community data.
index a diversity index; use one of "shannon" (default), "simpson" or "invsimpson".
R the number of bootstrap resamples to use, defaults to 1000.
ci the confidence interval to use, defaults to 95 (i.e. 2.5% and 97.5% quantiles).

Details

A sample of community data can be randomised by repeatedly resampling. The H_ci() command uses the boot() command in the boot package to conduct the resampling. The command allows you to specify various diversity index measures and computes the confidence intervals based on quantiles from the resampling.

This command is intended to be used as the function (FUN argument) in apply(), allowing you to calculate confidence intervals for multiple samples. The results can be used for example, to make error bars in plots of diversity indices for multiple samples.

Value

The command returns (invisibly) the confidence intervals as a simple vector with two values, one for the lower CI and one for the upper CI.

See Also

The H_boot() command for calculating the diversity index and confidence intervals for a single sample. The boot() command (in package boot) carries out the bootstrap resampling. See the H_bss() command for a significance test for two samples based on randomization. See the apply() command for conducting computations repeatedly over the rows (or columns) of a dataset.

Examples

Use a dataset with two rows (samples)

> apply(DeVries, MARGIN = 1, FUN = H_ci, index = "simpson")
         canopy     under
2.5%  0.8418946 0.9170899
97.5% 0.8554198 0.9224856

Confidence interval for Shannon or Simpson’s t-test.

H_CI

This command computes the confidence interval for the t-test used with Shannon or Simpson’s indices.

Usage

H_CI(x, index = "shannon", ci = 95)

Arguments

x a vector of community data.
index a diversity index; use one of "shannon" (default), or "simpson".
ci the confidence interval to use the default is 95 and works out 95% CI.

Details

A variant of the t-test can be used to determine the statistical significance of differences in Shannon entropy or Simpson’s index between two samples. The H_CI() command computes the confidence interval for either index using the t-test formula as a starting point. The t-test is sensitive to degrees of freedom (sample size) and bootstrapping methods are becoming more popular (see H_bss() command).

The command can be used with apply() to get the confidence intervals for multiple samples.

Value

Returns a simple numeric value representing the confidence interval at the chosen level (default 95%).

See Also

The H_sig() command for calculating the t-test to compare Shannon or Simpson’s index between two samples. See the apply() command for using H_CI() over rows of a dataset to calculate CI for multiple samples. See H_ci() for bootstrap confidence intervals for multiple samples and H_bss() for a significance test based on randomization using bootstrapping.

Examples

Get CI for Simpson’s index for single sample

> H_CI(DeVries[1,], index = "simpson")
[1] 0.007075684

Get CI for Shannon for rows of dataset

> apply(DeVries, MARGIN = 1, FUN = H_CI)
    canopy      under
0.07670451 0.09524768

Special t-test for Shannon or Simpson’s index.

H_sig

This command carries out a version of the t-test for either Shannon or Simpson’s indices. The result object has a class "Hsig", which has plot and summary routines.

Usage

H_sig(x, y, index = "shannon", alpha = 0.05)

summary(x)

plot(x, ylab = "Index Value", xlab = "Samples", ...)

Arguments

x, y Vectors of community data for comparison; these can be of different length. For plot and summary methods, x is the result of an H_sig() command.
index a diversity index; use one of "shannon" (default), or "simpson".
alpha the significance level, the default 0.05 computes critical values for p = 0.05.
xlab, ylab labels for the x and y-axes.
... additional graphical parameters to pass to plot().

Details

A variant of the t-test can be used to determine the statistical significance of differences in Shannon entropy or Simpson’s index between two samples. The H_sig() command computes the t-test for two samples. The t-test is sensitive to degrees of freedom (sample size) and bootstrapping methods are becoming more popular (see H_bss() command).

The summary routine provides an overview of the test statistics and the plot routine creates a point plot with error bars based on confidence intervals.

Value

The result is a list with a custom class "Hsig", which has summary and plot methods. There are various components:

p.val The p-value for the calculated t-test.
df The degrees of freedom.
t.statistic The calculated value of t.
var The variance for each sample.
critical The critical value for the degrees of freedom and the chosen alpha level.
H The value of the diversity index for each sample.
index The name of the diversity index used.
alpha The chosen level of significance.
sites The site names.
richness The number of species in each sample.
individuals The number of individuals in each sample.

 

The plot method produces a point plot with error bars showing confidence intervals (using the alpha level from the original t-test).

See Also

The H_CI() command for computing confidence intervals using the t-test formula as a starting point. See the H_bss() command for an alternative significance test using randomization and bootstrapping.

Examples

Determine significance of t-test between Shannon index of 2 samples

DVh <- H_sig(DeVries[1,], DeVries[2,])
[1] 0.0002608309

Get a summary of the results

> summary(DVh)
Site: DeVries[1, ]
Index: shannon = 2.640651
Variance = 0.0003865389
df = 2.231875
No. Species = 56
Individuals = 5774
Site: DeVries[2, ]
Index: shannon = 2.982633
Variance = 0.0002548043
df =  1.508951
No. Species = 65
Individuals = 5922
t-statistic = 13.50384
df = 3.740253
critical value at 0.05 = 2.854166
Significance = 0.0002608309

Plot the result

> plot(DVh, pch = 16, cex = 2, col = "darkgreen", lty = 1)

Bootstrap confidence intervals for the Shannon index.

Happ

This is intended to be used with apply(); allowing bootstrap confidence intervals to be computed for multiple samples. The default is 1000 resamples and 95% CI.

Usage

Happ(x)

Arguments

x a vector of values; a single sample from a community dataset. The intention is to use this as the FUN argument of apply (or similar) without a separate argument.

Details

Use this command as a FUN argument in apply() to return bootstrap confidence intervals for Simpson’s diversity index from multiple samples.

Value

When called on a single sample (a vector of values) the command returns invisibly 2.5% and 97.5% quantiles as bootstrap confidence intervals (as a simple numeric vector). When used with apply() the result is a matrix with columns corresponding to samples and two rows, one for the 2.5% and one for the 97.5% bootstrap quantile.

See Also

See apply() for applying a function to rows (or columns) of a data.frame or matrix. See H_boot() and H_ci() for computing bootstrap confidence intervals for a single sample using Simpson’s, Shannon or Inverse Simpson’s indices. See Dapp() for confidence intervals using Simpson’s index and Iapp() for Simpson’s Inverse Index. Bootstrapping is carried out using the boot() command.

Examples

Use as FUN in apply command

> apply(DeVries, MARGIN = 1, FUN = Happ)
        canopy    under
2.5%  2.595170 2.943928
97.5% 2.674168 3.009593

 Bootstrap confidence intervals for the Inverse Simpson index.

Iapp

This is intended to be used with apply(); allowing bootstrap confidence intervals to be computed for multiple samples. The default is 1000 resamples and 95% CI.

Usage

Iapp(x)

Arguments

x a vector of values; a single sample from a community dataset. The intention is to use this as the FUN argument of apply (or similar) without a separate argument.

Details

Use this command as a FUN argument in apply() to return bootstrap confidence intervals for Inverse Simpson’s diversity index from multiple samples.

Value

When called on a single sample (a vector of values) the command returns invisibly 2.5% and 97.5% quantiles as bootstrap confidence intervals (as a simple numeric vector). When used with apply() the result is a matrix with columns corresponding to samples and two rows, one for the 2.5% and one for the 97.5% bootstrap quantile.

See Also

See apply() for applying a function to rows (or columns) of a data.frame or matrix. See H_boot() and H_ci() for computing bootstrap confidence intervals for a single sample using Simpson’s, Shannon or Inverse Simpson’s indices. See Dapp() for confidence intervals using Simpson’s index and Happ() for Shannon. Bootstrapping is carried out using the boot() command.

Examples

Use as FUN in apply command

> apply(DeVries, MARGIN = 1, FUN = Iapp)
        canopy    under
2.5%  6.333271 12.05552
97.5% 6.950071 12.90245

P

Extrapolated species richness in a species pool.

pacc

This command is intended to be used with lapply() for use with a community dataset and a grouping variable. It calls the poolaccum() command from vegan and returns various estimators of species richness.

Usage

pacc(x, data)

Arguments

x a vector of index values.
data the data.frame holding the community data.

Details

This function is intended to be used in lapply() rather than as a standalone command. It allows you to take a community dataset and split it by a grouping variable (an index you define). Then the function carries out estimated species richness groupwise, using the index you specify and the vegan::poolaccum() command.

Value

The result is a "list" with a component for each grouping variable specified. Each component contains the result of the poolaccum() command, that is estimated species accumulation using various methods.

See Also

The poolaccum() command carries out the estimated species accumulation using various estimators. The eacc() command is a custom function that carries out a similar job for groupwise estimated species richness using the vegan::estaccumR() command. See lapply() for details of applying a function to list elements.

Examples

Make an index from rows of community dataset

index <- list(Edge = 1:6, Grass = 7:12, Wood = 13:18)
> index
$Edge
[1] 1 2 3 4 5 6
$Grass
[1]  7  8  9 10 11 12
$Wood
[1] 13 14 15 16 17 18

Use index to get extrapolated species richness for groups

> lapply(index, FUN = pacc, data = gb.biol)
$Edge
N     S     Chao Jackknife 1 Jackknife 2 Bootstrap
3 23.29 34.94850    27.38333    28.90000  25.22704
4 24.94 33.60742    29.44750    31.33917  27.05879
5 26.13 34.32750    30.99400    33.28750  28.38051
6 27.00 33.00000    32.00000    34.40000  29.30678
$Grass
N     S     Chao Jackknife 1 Jackknife 2 Bootstrap
3 32.21 38.03772    37.81000    39.45167  34.95630
4 34.10 38.48675    39.38750    40.78250  36.74922
5 35.79 43.29300    41.72600    44.46350  38.58681
6 37.00 53.00000    43.66667    47.93333  39.94114
$Wood
N     S   Chao Jackknife 1 Jackknife 2 Bootstrap
3 12.62 13.325    13.47333    13.66500  13.05148
4 12.91 13.160    13.76500    14.07833  13.32430
5 13.00 13.325    13.52000    13.30250  13.31851
6 13.00 13.000    13.00000    11.93333  13.17563

 Point plot of diversity index and bootstrap confidence intervals.

plot_H

This command carries out a bootstrap resampling of a community dataset and plots the results as a point plot that includes error bars showing the 95% confidence intervals. You can use any diversity index used by the diversity() command in vegan.

Usage

plot_H(comm,
       index = "shannon",
       xlab = "Site Names",
       ylab = "Diversity Index",
       pch = 18,
       cex = 1.2,
       cex.axis = 0.75,
       boot = TRUE, ...)

Arguments

comm community dataset, usually a matrix or data.frame.
index a diversity index; use one of "shannon" (default), "simpson" or "invsimpson".
xlab, ylab labels for x and y axes.
pch plotting symbol, default = 18 (a filled diamond).
cex character expansion, default = 1.2.
cex.axis character expansion for axes, default = 0.75.
boot = TRUE logical, use bootstrap CI (default = TRUE) or variance estimators (FALSE).
... other graphical parameters to pass to plot().

Details

Bootstrapping is a way to repeatedly resample a community dataset. The confidence interval of the diversity index is assessed from the quantiles of the resamples. The plot_H() command uses the diversity() command from vegan to calculate the diversity index for multiple samples in a community dataset. The H_ci() command is called internally, which computes the confidence intervals by calling boot() from the boot package. The diversity indices are then plotted (as a point plot), using the confidence intervals as error bars. This provides a visual representation of differences in diversity index between samples.

Alternatively you can specify boot = FALSE to use variance estimation for CI. This calls the H_CI() command internally.

Note: if you specify boot = FALSE and index = "invsimpson" an error is returned.

Value

The command produces a list object invisibly; this contains the diversity index, $H and confidence intervals, $CI. These are used to create a point plot with 95% error bars.

See Also

See the diversity() command in vegan for details of the diversity indices. The boot() command from package boot carries out the resampling; it is called from H_ci(), which calculates the confidence intervals. See the apply() command, which is used internally to obtain a statistics over the rows of a dataset. If boot = FALSE the CI are determined using H_CI() via a variance estimation process.

Examples

Make a plot and save results

> gbh = plot_H(gb.biol, ylab = "Shannon Entropy")

View the results

> gbh
$H
      E1       E2       E3       E4       E5       E6       G1
1.267096 1.424505 1.412179 1.737146 1.636923 1.557477 1.935163
      G2       G3       G4       G5       G6       W1       W2
2.000272 1.929426 2.360250 2.139567 2.205175 1.397251 1.582102
      W3       W4       W5       W6
1.539511 1.601073 1.616333 1.552470

$CI
            E1       E2       E3       E4       E5       E6
2.5%  1.167435 1.339836 1.320449 1.639057 1.552808 1.477362
97.5% 1.341344 1.487790 1.484144 1.802874 1.698681 1.614796
            G1       G2       G3       G4       G5       G6
2.5%  1.741574 1.805502 1.704016 2.192101 1.975294 2.028207
97.5% 2.047899 2.109588 2.046776 2.449227 2.230595 2.288512
            W1       W2       W3       W4       W5       W6
2.5%  1.337487 1.519955 1.470808 1.534223 1.560459 1.488700
97.5% 1.443202 1.635484 1.597767 1.654974 1.658593 1.601543

 Polar Ordination (aka Bray-Curtis ordination)

polar.ord

This command carries out Polar Ordination, a method of ordination (also known as Bray-Curtis ordination). The command works out polar ordination site scores and works out axes as follows: Axis 1 is the combination of sites that gives the maximum separation (not necessarily the maximum variance explained). Axis 2 is the combination of sites that maximises the orthogonality to the 1st axis.

There are print, plot and summary methods and an identify method for use with plots.

Usage

polar.ord(diss)

print.polar.ord(ord, digits = getOption("digits"))

summary.polar.ord(ord, k = 2, cor = TRUE, var.exp = TRUE, digits = getOption("digits"))

plot.polar.ord(ord, x.axis = 1, y.axis = 2, type = c("t", "p", "n"), ...)

identify.plot.polar.ord(pord, labels = pord$labels, ...)

Arguments

diss A dissimilarity matrix e.g. as produced by dist() or vegdist().
digits The number of digits to display, defaults to current settings.
ord The result of a polar.ord command.
k The number of axes to display.
cor If TRUE, shows the correlation between ordination axes.
var.exp If TRUE, shows the variance explained.
x.axis Which ordination axis to plot on the x-axis.
y.axis Which ordination axis to plot on the y-axis.
type The type of plot, "t" text, "p" points or "n" none.
pord The result of a plot.polar.ord command i.e. a polar ordination plot result.
labels A vector of labels to use for the sites.
... Other plotting parameters to pass to plot or identify.

Details

Polar ordination is a method of indirect gradient analysis. The starting point is a dissimilarity matrix, any method of producing a dissimilarity can be used as the starting point for the polar.ord() command. Typical commands would be dist() or vegdist() (from package: vegan). The analysis seeks to maximise the separation between sites along an axis. Subsequent axes are calculated to give the best separation but also to maximise the orthogonality, that is to minimise correlation between the axes.

The method is “simple” enough to be able to carry out using a spreadsheet but identification of subsequent axes can be a labour-intensive and iterative process. Note that in this application of Polar Ordination I have maximised separation, which was the original intention of Bray-Curtis. However, you can also go for a maximise variance explained approach as the summary method displays additional variance explained by each axis.

The print method allows you to view all the generated axes, their separation (the dissimilarity) and variance explained.

The summary method provides an overview that includes site socres, correlation and variance explained.

The plot method allows you to visualise the results from any combination of axes.

The identify method allows you to interactively label points on the plot.

Value

The polar.ord() command produces an object (essentially a list) holding the class "polar.ord", which has print, summary, plot and identify methods. The components are:

  • sites – the site scores for all the computed axes.
  • dist – the distance (separation) for each axis, in the same scale as the original dissimilarity.
  • cor – the correlation (Pearson) between the principal axis (the one with maximum separation) and each of the other axes.
  • exp – the variance explained by each axis.
  • add – the additional variance explained by each axis in addition to the original principal axis.
  • samples – a vector of site names.
  • axes – a vector of axis names.
  • diss – the original dissimilarity result (a "dist" object).
  • method – a character vector giving the dissimilarity metric employed.

These components are used by the print, summary, plot and identify methods.

See Also

Various methods of ordination from the vegan package e.g. metaMDS() for non-metric multidimensional scaling. Also see cmdscale(), which is part of the basic stats package.

Examples

The data (columns are sites) from CERE.RData

moss

Transpose sites/species and make a dissimilarity matrix

# Make dissimilarity matrix
d <- dist(t(moss), method = "euclidean")

Run the PO

> po <- polar.ord(d)

> po # Standard print
> print(po, digits = 4) # Use print method
> summary(po, k = 4) # Show summary of first 4 axes

Plot

> plot(po) # Basic plot
> plot(po, x.axis = 1, y.axis = 3) # Plot different axes

Species scores

require(vegan)
# Weighted average species scores
> wa <- wascores(scores(po), t(moss))

Quick plot

> plot(po)
> text(wa, labels = rownames(wa), col = "red", cex = 0.8)

ID plot

> op <- plot(po, type = "p") # Plot points only
> identify(op) # Click points to identify sites (Esc to cancel)
> points(wa, col = "red", pch = "+") # Add species points
> identify(wa, labels = rownames(wa)) # Add labels interactively

Q

Plot Tsallis entropy profile using q*.

qstar

This command uses the eventstar() command from vegan to calculate q*. The profile is then plotted as three plots showing, evenness, diversity and effective species for the Tsallis scales selected.

Usage

qstar(x,
      q = seq(0, 2, 0.05),
      qmax = 5,
      type = "l",
      lcol = "red",
      llty = 1,
      llwd = 1, ...)

Arguments

x a vector of community data.
q the scales to use for the Tsallis entropy (passed to tsallis).
qmax the maximum q value to use (passed to eventstar), default = 5.
type the type of plot, default = "l" for lines.
lcol
llty
llty
colour, type and width for the lines showing q* value and diversity values at q*.
... other graphical parameters to pass to plot().

Details

The function eventstar() finds a characteristic value of the scale parameter q of the Tsallis entropy corresponding to minimum of the evenness (equitability) profile based on Tsallis entropy. This is called q*. Use of q* and corresponding diversity, evenness, and Hill numbers (effective species), is recommended because it is a unique value representing the diversity profile. It is also positively associated with rare species in the community, thus it is a potentially useful indicator of certain relative abundance distributions of the communities.

The qstar() command uses` eventstar()` from vegan to calculate q* and then plots three graphs to show evenness, diversity and effective species at the chosed scales, and their relationship to q.

Value

The command returns the result from the eventstar() command, i.e. a data.frame with several columns showing q (qstar), the value of evenness at q* (Estar), the value of Tsallis entropy at q* (Hstar), and the effective species at q* (Dstar). See eventstar() for details.

See Also

See tsallis() in vegan for details of Tsallis entropy and the inspiration for this plotting routine. The eventstar() command from vegan is called to calculate q*.

Examples

Select a single sample from a community dataset

> qstar(DeVries[1,])
      qstar     Estar    Hstar    Dstar
1 0.6847494 0.1113006 5.062579 20.61567

R

 Compute AIC values from multiple RAD models.

rad_aic

This command takes the result of multiple RAD model-fitting (from radfit() in vegan) and computes the “best” AIC value for each sample. The result shows the best AIC value and the corresponding model name for each sample.

Usage

rad_aic(x)

Arguments

x the result of radfit() from vegan. The result must hold the class "radfit.frame", i.e. have been run on a community dataset with multiple samples.

Details

Rank – Abundance Dominance (RAD) or Dominance/Diversity plots display logarithmic species abundances against species rank order. These plots are supposed to be effective in analysing types of abundance distributions in communities. The radfit() command in vegan can compute various models for community datasets. The rad_aic() command takes the result of a radfit() command and, for each sample, shows the name of the model with the lowest AIC value as well as displaying the AIC.

Value

Returns a data frame with two columns giving the “best” RAD model for each sample in the original community dataset. The columns are: AIC: the AIC value, and Model: the name of the model. Each row corresponds to a sample in the original community dataset.

See Also

See radfit() in vegan for details of the model fitting. The rad_test() command carries out a significance test of differences between RAD models for a community sample.

Examples

Compute RAD models for a community dataset

> gb.rad <- radfit(gb.biol)

Explore the radfit() result

> rad_aic(gb.rad)
         AIC      Model
E1 106.29094 Mandelbrot
E2 134.58026 Preemption
E3 108.41663 Preemption
E4 160.24125 Mandelbrot
E5 183.46446 Preemption
E6 146.97826 Preemption
G1 106.00117       Zipf
G2  83.88072 Mandelbrot
G3  68.55087 Mandelbrot
G4 104.48570 Mandelbrot
G5 100.04912 Mandelbrot
G6  98.59426 Mandelbrot
W1 154.01455 Preemption
W2  81.28201 Preemption
W3  74.92607 Preemption
W4  98.54388 Preemption
W5 104.03335 Preemption
W6  82.63346 Preemption

A significance test of differences in deviance of RAD models.

rad_test

This command takes the result of a radfit() command (from vegan) and carries out a test of significance (ANOVA) of the various RAD models, examining their deviance. The result holds a class, "rad.htest" that has print, summary and plot methods.

Usage

rad_test(x, conf.level = 0.95, log = TRUE)

print(x, ...)

summary(x)

plot(x, which = "original", las = 1, ...)

Arguments

x the result of a radfit() command from vegan. For summary and plot methods x should be the result of the rad_test() command.
conf.level the confidence level, the default is 0.95.
log Logical. If TRUE (the default) the log of the model deviance is used as the response variable in ANOVA.
which the type of plot, "original" (default) shows boxplot of model deviance, "post.hoc" displays Tukey post-hoc result.
las the orientation of the axis annotations, the default produces labels all horizontal.
... other parameters to pass to plot() or print().

Details

Rank – Abundance Dominance (RAD) or Dominance/Diversity plots display logarithmic species abundances against species rank order. These plots are supposed to be effective in analysing types of abundance distributions in communities. The radfit() command in vegan can compute various models for community datasets. The rad_test() command takes the result of a radfit() command and carries out a test of model deviance using ANOVA. By default the log of model deviance is used, which improves the normality for the ANOVA.

For a meaningful result you should use replicated samples (see Examples).

Value

The rad_test() command returns invisibly a list with a custom class "rad.htest", which has print, summary and plot methods.

The print method returns a matrix showing Mean deviance, Standard Error of deviance and Mean AIC for the RAD models.

The summary method shows the ANOVA table ($ANOVA) and a summary of the Tukey Post-Hoc test ($Tukey).

The plot method produces a boxplot that shows model deviance for each RAD model. Models are re-ordered by decreasing mean deviance. Optionally the plot routine can display the post-hoc results.

See Also

See radfit() in vegan for details of the model fitting. The ANOVA is carried out using aov() and the post-hoc via the TukeyHSD() command. The boxplot() command is used to plot the results (see also reorder() for re-ordering of samples for plotting).

See rad_aic() for an alternative command that shows the “best” RAD model for each sample using results from radfit().

Examples

Make RAD models for six samples of dataset these are replicates (from same habitat type).

> gb.rad <- radfit(gb.biol[1:6,])

Carry out ANOVA

> gb.rt <- rad_test(gb.rad)

Print result

> print(gb.rt, digits = 3)
               Null Preemption Lognormal  Zipf Mandelbrot
Mean deviance   730       90.4    129.29 163.9      68.15
SE of deviance   66       14.7      8.43  14.0       9.48
Mean AIC        799      161.7    202.61 237.2     143.47

Summary of result shows ANOVA and Post-Hoc

> summary(gb.rt)
$ANOVA
            Df  Sum Sq Mean Sq F value    Pr(>F)
model        4 20.9193  5.2298  65.545 6.937e-13 ***
Residuals   25  1.9948  0.0798

Signif. codes:  0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1
$Tukey
             diff         lwr       upr        p adj
Pre-Mb  0.2700865 -0.20887308 0.7490461 4.776096e-01
Log-Mb  0.6773801  0.19842055 1.1563397 2.816974e-03
Zip-Mb  0.9053575  0.42639796 1.3843171 8.202652e-05
BS-Mb   2.3975406  1.91858106 2.8765002 8.252288e-13
Log-Pre 0.4072936 -0.07166594 0.8862532 1.233203e-01
Zip-Pre 0.6352710  0.15631147 1.1142306 5.339678e-03
BS-Pre  2.1274541  1.64849457 2.6064137 1.145184e-11
Zip-Log 0.2279774 -0.25098216 0.7069370 6.345799e-01
BS-Log  1.7201605  1.24120094 2.1991201 1.043389e-09
BS-Zip  1.4921831  1.01322353 1.9711427 1.790295e-08

$Logarithmic
[1] TRUE

$Confidence.Level
[1] 0.95

Print the results

# 2 columns in plot window
> opar <- par(mfrow = c(1,2))
# Box-plot
> plot(gb.rt, ylab = "Log of deviance", main = "RAD deviance")
# Tukey result
> plot(gb.rt, which = "post.hoc")
> abline(v = 0, lty = 3)
# reset plot window
> par(opar)

S

 Species richness for a community dataset.

spec.rich

This command computes species richness for a community dataset.

Usage

spec.rich(data)

Arguments

data a community dataset as a data.frame or matrix. Rows as samples.

Details

The number of species in a community sample (the species richness) is a basic measure of diversity.

Value

A numeric vector containing the species richness for each sample of the original dataset. The vector has a names attribute to match the sample names of the original dataset.

See Also

The specnumber() command in the vegan package can also compute richness (including groupwise since version 2.0-0). See also the which() command that is used to select elements > 0 in the calculations, and length().

The sr_est() and specaccum() commands for modelling species richness with accumulating sites.

Examples

Species richness using a 2-row matrix

> spec.rich(DeVries)
canopy  under
    56     65

Richness from a 16-row data frame

> spec.rich(gb.biol)
E1 E2 E3 E4 E5 E6 G1 G2 G3 G4 G5 G6 W1 W2 W3 W4 W5 W6
17 14 15 25 21 17 28 22 18 28 26 24 12 11 11 12 12 10

Pairwise chi squared species association.

species_assoc

This command takes two species that you specify from a community dataset and carries out a pairwise chi squared association test.

Usage

species_assoc(spA, spB, data)

Arguments

spA
spB
Index values for the two species to be analysed, corresponding to their position in the rownames attributes, i.e. a numerical value corresponding to the row number.
data the data.frame (or matrix) containing the species for analysis; rows as species.

Details

A chi squared approach can be used to explore species co-occurrence. Species that are positively associated will tend to be from the same community, whereas species that are negatively associated will be from different communities. The species_assoc() command takes a dataset and computes the chi squared statistics of species co-occurrence for two species that you specify.

Value

A list object (returned invisibly) with four components:

spA The name of the first species (the name will be taken from the rownames attribute of the data).
spB The name of the second species.
chi.sq The chi squared test result – includes the observed, expected and Pearson residuals for the pair of selected species. The “standard” chi squared result is returned to screen.
co.occur The co-occurrence matrix.

See Also

See the chisq.test() for general details of chi squared testing. The designdist() command from vegan is used to compute the co-occurrence matrix. See also the species_ind() command for using chi squared in indicator species analysis.

Examples

Look at the species available

> rownames(hsereT)
[1] "Carex rostrata"           "Fillipendula ulmaria"
[3] "Caltha palustris"         "Bryophyte"
[5] "Galium palustre"          "Valeriana officianalis"
[7] "Ranunculus repens"        "Cochlearia spp"
[9] "Epilobium palustre"       "Myosotis scorpioides"
[11] "Mentha rostrat"          "Rumex acetosa"
[13] "Succisa pratensis"        "Molinea caerulea"
[15] "Juncus rostrat"           "Eriophorum vaginatum"
[17] "Eriophorum angustifloium" "Sphagnum spp."
[19] "Calluna vulgaris"         "Vaccinium myrtillus"
[21] "Polytrichum spp."         "Vaccinium vitis-idea"

Carry out pairwise analysis for species in rows 1,2

> hsere.co <- species_assoc(1, 2, data = hsereT)

Pearson's Chi-squared test with Yates' continuity
correction
data:  mat
X-squared = 15.1532, df = 1, p-value = 9.913e-05
Warning message:
In chisq.test(mat) : Chi-squared approximation may be incorrect

View the co-occurrence matrix

hsere.co$co.occur
              Fillipendula ulmaria
Carex rostrata  +  -
             + 11 11
             -  0 28

Look at the Pearson residuals

> hsere.co$chi.sq$residuals
              Fillipendula ulmaria
Carex rostrata         +         -
             +  2.800000 -1.487038
             - -2.481935  1.318118

Indicator species analysis by chi squared.

species_ind

This command takes a community dataset and carries out a chi squared test for indicator species. The species can be arranged as the rows or columns. The samples are used as the groupings for the analysis, so you may need to group the dataset using rowsum() before analysis. The result shows only those species that have a significant association (positive or negative) for all samples/groups.

Usage

species_ind(x, rows = c("species", "samples"))

Arguments

x A community dataset, usually a data.frame or matrix.
rows Should the rows be treated as “species” (the default) or “samples”?

Details

An indicator species is one that is positively associated with a particular site or habitat and negatively associated with other sites/habitats. Ideally an indicator species will also be found only rarely in other sites/habitats. The chi squared approach is one way to explore species associations with sites/habitats.

Value

A list with two components (the indicators component is returned directly and the entire list returned invisibly):

chi.test The results of the chi squared test, which includes the observed and expected values as well as the Pearson residuals.
indicators A matrix with the names of the indicator species and the grouping; the data are the Pearson residuals. Only species with all Pearson residuals >2 are shown (i.e. only significant ones).

&nbsp;

See Also

See chisq.test() for details of the chi squared process. See also species_assoc() for pairwise species association. The indval() command from labdsv is probably a more robust alternative, which uses a quite different approach.

Examples

Make a dataset by grouping community samples (by habitat)

> gb <- rowsum(gb.biol, group = gb.site$Habitat)

Indicator species based on habitat

> gb.ind <- species_ind(gb, rows = "samples")
        Aba.par   Ago.afr   Bem.big   Bem.man   Cal.fus   Cli.fos
Edge   4.092740 -2.841032 -3.192145 -4.170376 -2.841032 -2.979700
Grass -2.034203  9.506136 11.182116 18.561500  9.506136  9.970119
Wood  -2.706151 -2.963855 -3.629967 -7.107075 -2.963855 -3.108518
        Lei.ruf   Neb.bre   Poe.cup
Edge  -7.245067 -3.209243 -2.682207
Grass -5.450846 -6.097483 17.636905
Wood  10.205860  6.724140 -7.980425

Plot Pearson residuals for Edge habitat

> barplot(gb.ind$ind[1,])
> title(main = "Habitat: Edge", ylab = "Pearson Residual")
> abline(h = 0)
> abline(h = c(2, -2), lty = 3)

Species richness estimation with accumulating sites.

sr_est

This command uses accumulating species richness and a log model to determine the estimated species richness for a complete dataset. You can use a community dataset or the result of the specaccum() command from vegan. The result contains the estimated species richness and the log model. The result holds a class "srest", which has print, plot and summary methods.

Usage

sr_est(accum)

print(srest, ...)

summary(srest, digits = getOption("digits"))

plot(srest, col.points = "black", col.line = "black",

lwd.line = 1, lwd.points = 1, ...)

Arguments

accum A vector of values corresponding to accumulating species richness or, a community dataset (rows as samples) or, the result of a specaccum() command from vegan.
srest The result of an sr_est() command.
digits The number of significant figures to display.
col.points
lwd.points
The colour and width of the points used in the base plot.
col.line
lwd.line
The colour and width used for the best-fit line (produced via abline).
... Additional parameters to pass to plot() or print().

Details

Species accumulation curves (SAC) are used to compare diversity properties of community data sets using different accumulator functions. The sr_est() command uses accumulating species richness as a starting point – the data can be a vector of accumulating values or the result of a vegan::specaccum() command. Alternatively you can use a community dataset as a data.frame or matrix. A log-linear model is then computed to show the estimated species richness from the accumulated sites.

Value

A list with custom class "srest", which has print, summary and plot methods. The components are:

data.model The log model data; a data frame with cumulative species, number of new species and log of cumulative species corresponding to the samples in the original dataset.
coefficients The intercept and slope from the log model.
species.estimate The estimated species richness based on the log model.

See Also

See the specaccum() command in vegan for details of methods of estimating species richness using various collector methods, and specpool() for estimated richness in a species pool. See spec.rich() and specnumber() for simple calculation of species richness. The log model in sr_est() is computed using the lm() command.

Examples

Load vegan and dune data

> require(vegan)
> data(dune)

Estimated richness

> specpool(dune)
    Species  chao  chao.se jack1 jack1.se    jack2     boot
All      30 32.25 3.395769 32.85 1.645448 33.84474 31.54040
     boot.se  n
All 1.153121 20

Use log model for estimate

> dune.sr <- sr_est(dune)
> dune.sr
Estimated Species Richness
                  29.42548

Visualise the model

> plot(dune.sr, lty = 2)

Use species accumulation curve for first 6 rows (all same habitat).

> gb.sa <- specaccum(gb.biol[1:6,])

Summary of log model

> summary(sr_est(gb.sa))
Model Data:
   cum.spp   new.spp   log.cs
1 18.16667 18.166667 1.259275
2 21.26667  3.100000 1.327699
3 23.30000  2.033333 1.367356
4 24.80000  1.500000 1.394452
5 26.00000  1.200000 1.414973
6 27.00000  1.000000 1.431364
Model coefficients:
Intercept     Slope
 131.0205  -92.6311
Estimated Species Richness
                  25.96767

My Publications

I have written several books on ecology and data analysis

An Introduction to R
Data Analysis and Visualisation
£35.00
Beginning R: The Statistical
Programming Language
£26.99
Statistics for Ecologists
Using R and Excel
£34.99
The Essential R
Reference
£44.99
Community
Ecology
£39.99
Managing Data
Using Excel
£24.99

Register your interest for our Training Courses

We run training courses in data management, visualisation and analysis using Excel and R: The Statistical Programming Environment. Courses will be held at one of our training centres in London. Alternatively we can come to you and provide the training at your workplace. Training Courses are also available via an online platform.




    Get In Touch Now

    for any information regarding our training courses, publications or help with a data project