compare.datasets {rioja}    R Documentation

Compare datasets for matching variables (species)

Description

Compare two datasets and summarise the occurrence and abundance in dataset two of the species recorded in dataset one. Useful for examining the conformity between sediment core and training set species data.

Usage

compare.datasets(y1, y2, n.cut=c(5, 10, 20, 50), 
      max.cut=c(2, 5, 10, 20, 50))

## S3 method for class 'compare.datasets':
plot(x, y, subset=1:nrow(x$obs), ...) 

Arguments

y1, y2 two data frames or matrices, usually of biological species abundance data, to compare.
x an object of class compare.datasets produced by function compare.datasets.
y original dataset (i.e. y1 above) used in comparison.
n.cut vector of abundances to be used for species occurrence calculations (see details).
max.cut vector of occurrences to be used for species maximum abundance calculations (see details).
subset a vector giving row indices to plot. Used to limit the number of plots with large datasets.
... additional arguments to xyplot.

Details

Function compare.datasets compares two datasets. It summarises the species profile (number of occurrences etc.) and sample profile (number of species in each sample etc.) of dataset 1. For those species recorded in dataset 1 it also provides summaries of their occurrence and abundance in dataset 2. It is a useful diagnostic for checking the conformity between core and training set data, specifically for identifying core taxa absent from the training set, and core samples with portions of their assemblage missing from the training set. Function write.list.Excel saves the output of compare.datasets in Excel format for more convenient browsing.
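
The species profile described above can be illustrated with a minimal base R sketch. This is not the code used by compare.datasets; the toy data and the helper names hill_n2 and profile are hypothetical and serve only to show how occurrences, Hill's N2 (1 / sum(p^2), where p is the proportional abundance) and maximum abundance might be tabulated for the taxa of dataset 1 in either dataset.

```r
# Toy data (hypothetical): rows are samples, columns are taxa
core  <- data.frame(A = c(10, 0, 5), B = c(0, 0, 20), C = c(90, 100, 75))
train <- data.frame(A = c(50, 50, 0, 0), C = c(50, 50, 100, 100))

# Hill's N2 for one abundance vector: 1 / sum(p^2), with p = y / sum(y)
hill_n2 <- function(y) {
  total <- sum(y)
  if (total == 0) return(NA_real_)
  p <- y / total
  1 / sum(p^2)
}

# Summarise every taxon of y1 using its values in y;
# taxa not found in y get 0 occurrences and NA for N2 and maximum
profile <- function(y1, y) {
  out <- t(sapply(colnames(y1), function(taxon) {
    if (!taxon %in% colnames(y)) {
      return(c(n_occur = 0, n2 = NA_real_, max = NA_real_))
    }
    v <- y[[taxon]]
    c(n_occur = sum(v > 0), n2 = hill_n2(v), max = max(v))
  }))
  as.data.frame(out)
}

in_core  <- profile(core, core)    # profile of core taxa in the core
in_train <- profile(core, train)   # profile of the same taxa in the training set

# Core taxa that never occur in the training set
missing_taxa <- rownames(in_train)[in_train$n_occur == 0]
```

In this toy example taxon B is found in the core but not in the training set, which is the kind of mismatch the comparison is meant to reveal.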

plot.compare.datasets provides a simple visualisation of the comparisons. It produces a matrix of plots, one for each sample in dataset 1, showing the abundance of each taxon in dataset 1 (x-axis) against the N2 value of that taxon in dataset 2 (y-axis), with symbols scaled according to abundance in dataset 2. The plots should aid identification of samples with high abundance of taxa that are rare (low N2) or have low abundance in the training set. Taxa that are absent from the training set are indicated with a red "+".
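
A rough idea of this kind of display can be had from the lattice sketch below. It is an illustration under stated assumptions, not the plot method itself: the toy abundances, the N2 values in train_n2 and all object names are hypothetical, absent taxa are placed at N2 = 0 purely for display, and the scaling of symbols by abundance in dataset 2 is omitted.

```r
library(lattice)

# Toy core data (hypothetical): rows are samples, columns are taxa
core <- data.frame(A = c(10, 0), B = c(0, 20), C = c(90, 80))

# Hypothetical N2 of each taxon in the training set; NA marks an absent taxon
train_n2 <- c(A = 2, B = NA, C = 3.6)

# Long format: one row per sample/taxon combination
long <- data.frame(
  sample    = factor(rep(seq_len(nrow(core)), times = ncol(core))),
  taxon     = rep(colnames(core), each = nrow(core)),
  abundance = unlist(core, use.names = FALSE)
)
long$n2 <- unname(train_n2[as.character(long$taxon)])
long$absent <- factor(is.na(long$n2), levels = c(FALSE, TRUE))
long$n2[is.na(long$n2)] <- 0   # display position only, an assumption of this sketch

# One panel per sample; taxa absent from the training set drawn as a red "+"
p <- xyplot(n2 ~ abundance | sample, data = long, groups = absent,
            pch = c(1, 3), col = c("blue", "red"),
            xlab = "Abundance in dataset 1", ylab = "N2 in dataset 2")
```

Printing p draws the panels; like the object returned by plot.compare.datasets, it is of class trellis.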

Value

Function compare.datasets returns a list with two named elements:

vars data frame listing for each variable in the first dataset: N.occur = number of occurrences in dataset 1, N2 = Hill's N2 for species in dataset 1, Max = maximum value in dataset 1, N.2 = number of occurrences in dataset 2, N2.2 = Hill's N2 for species in dataset 2, Max.2 = maximum value in dataset 2, N.005 = number of occurrences where the species is greater than 5, etc.
obs data frame listing for each observation in the first dataset: N.taxa = number of species with abundance greater than zero, N2 = Hill's N2 for samples, Max = maximum value, total = sample total, M.002 = number of taxa with a maximum abundance greater than 2, etc., N2.005 = number of taxa in dataset 1 with more than 5 occurrences in dataset 2, etc., Sum.N2.005 = sample total including only those taxa with at least 5 occurrences in dataset 2, etc., M2.005 = number of taxa in dataset 1 with maximum abundance greater than 2 in dataset 2, etc., and Sum.M2.005 = sample total including only those taxa with a maximum abundance greater than 2 in dataset 2, etc.
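
The cut-off based sample summaries can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: the toy data and the names n_train and sample_summary are hypothetical, and the use of ">=" for the occurrence cut-off is an assumption.

```r
# Toy data (hypothetical): rows are samples, columns are taxa
core  <- data.frame(A = c(10, 0, 5), B = c(0, 0, 20), C = c(90, 100, 75))
train <- data.frame(A = c(50, 50, 0, 0), C = c(50, 50, 100, 100))

# Number of training set occurrences for each core taxon (0 if absent)
n_train <- sapply(colnames(core), function(taxon) {
  if (taxon %in% colnames(train)) sum(train[[taxon]] > 0) else 0
})

# For an occurrence cut-off k, count the taxa in each core sample that
# reach the cut-off in the training set and sum their core abundances
sample_summary <- function(k) {
  keep <- n_train >= k               # assumption: "at least k occurrences"
  m <- as.matrix(core)[, keep, drop = FALSE]
  data.frame(n_taxa = rowSums(m > 0), sum_ab = rowSums(m))
}

s1 <- sample_summary(1)   # taxa present at all in the training set
s3 <- sample_summary(3)   # taxa with at least 3 training set occurrences
```

A sample whose summed abundance falls well below its total as the cut-off rises has a large part of its assemblage poorly represented in the training set.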


Function plot.compare.datasets returns an object of class trellis which may be plotted.

Author(s)

Steve Juggins

See Also

write.list.Excel to save the output of compare.datasets in Excel format.

Examples

# compare diatom data from core from Round Loch of Glenhead
# with SWAP surface sample dataset
data(RLGH)
data(SWAP)
result <- compare.datasets(RLGH$spec, SWAP$spec)
result

## Not run: 
#save comparison to Excel for more convenient browsing
write.list.Excel(result, "Comparison.xls")

#visualise the comparison
# the plot method also takes the original dataset (y1) as its second argument
plot(result, RLGH$spec)
## End(Not run)

[Package rioja version 0.5-6 Index]