MAT {rioja} | R Documentation |
Functions for reconstructing (predicting) environmental values from biological assemblages using the Modern Analogue Technique (MAT), also known as k nearest neighbours (k-NN).
MAT(y, x, dist.method="sq.chord", k=5, lean=TRUE)
## S3 method for class 'MAT':
predict(object, newdata=NULL, k=5, sse=FALSE, nboot=100, match.data=TRUE, verbose=TRUE, lean=TRUE, ...)
## S3 method for class 'MAT':
performance(object, ...)
## S3 method for class 'MAT':
crossval(object, cv.method="lgo", verbose=TRUE, ngroups=10, nboot=100, ...)
## S3 method for class 'MAT':
print(x, ...)
## S3 method for class 'MAT':
summary(object, full=FALSE, ...)
## S3 method for class 'MAT':
plot(x, resid=FALSE, xval=FALSE, k=5, wMean=FALSE, xlab="", ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE, add.smooth=FALSE, ...)
## S3 method for class 'MAT':
residuals(object, ...)
## S3 method for class 'MAT':
fitted(object, ...)
## S3 method for class 'MAT':
screeplot(x, ...)
paldist(y, dist.method="sq.chord")
paldist2(y1, y2, dist.method="sq.chord")
y, y1, y2 |
data frame containing biological data. |
newdata |
data frame containing biological data to predict from. |
x |
a vector of environmental values to be modelled, matched to y. |
dist.method |
dissimilarity coefficient. See details for options. |
match.data |
logical indicating whether the function should match two species datasets by their column names. Set this to FALSE only if you are sure the column names already match exactly. |
k |
number of analogues to use. |
lean |
logical; if TRUE, remove some items, such as the dissimilarity matrix, from the output. |
object |
an object of class MAT. |
resid |
logical to plot residuals instead of fitted values. |
xval |
logical to plot cross-validation estimates. |
wMean |
logical to plot weighted-mean estimates. |
xlab, ylab, xlim, ylim |
additional graphical arguments to plot.wa. |
add.ref |
logical to add a 1:1 line to the plot. |
add.smooth |
logical to add a loess smooth to the plot. |
cv.method |
cross-validation method, either "lgo" or "bootstrap". |
verbose |
logical or integer to show feedback during cross-validation. If TRUE, print feedback every 50 cycles; if an integer, print feedback at that interval. |
nboot |
number of bootstrap samples. |
ngroups |
number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership. |
sse |
logical indicating that sample specific errors should be calculated. |
full |
logical to indicate a full or abbreviated summary. |
... |
additional arguments. |
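The match.data option aligns the training and new species data by column name before any dissimilarities are computed. The base-R sketch below illustrates that idea. The helper align_species is hypothetical (it is not a rioja function), and filling taxa that are missing from one dataset with zeros is an assumption about how such matching could work, not a description of rioja's internals.

```r
# Hypothetical helper, not part of rioja: align two species tables by column name.
# Assumption: a taxon missing from one table is treated as absent (abundance 0).
align_species <- function(train, new) {
  train <- as.data.frame(train)
  new <- as.data.frame(new)
  taxa <- union(colnames(train), colnames(new))  # every taxon seen in either table
  pad <- function(d) {
    for (m in setdiff(taxa, colnames(d))) d[[m]] <- 0  # add absent taxa as zero columns
    d[, taxa, drop = FALSE]                             # put columns in a common order
  }
  list(train = pad(train), new = pad(new))
}

train <- data.frame(A = c(10, 20), B = c(90, 80))
new <- data.frame(B = 50, C = 50)
aligned <- align_species(train, new)
aligned$new
```

Keeping taxa that occur only in the new data, as this sketch does, lets them contribute to the dissimilarity between new and training samples; dropping them instead is an equally possible design choice.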
MAT performs an environmental reconstruction using the modern analogue technique. Function MAT takes a training dataset of biological data (species abundances) y and a single associated environmental variable x, and generates a model of closest analogues, or matches, for the modern data using one of a number of dissimilarity coefficients. Options for the latter are: "euclidean", "sq.euclidean", "chord", "sq.chord", "chord.t", "sq.chord.t", "chi.squared", "sq.chi.squared" and "bray". "chord.t" and "sq.chord.t" are based on the true chord distance, whereas "chord" refers to the variant of chord distance used in palaeoecology (e.g. Overpeck et al. 1985), which is actually Hellinger's distance (Legendre & Gallagher 2001). There are various helper functions to plot and extract information from the results of a MAT transfer function.
The function predict takes a MAT object and uses it to predict environmental values for a new set of species data, or returns the fitted (predicted) values from the original modern dataset if newdata is NULL. Variables are matched between the training data and newdata by column name (if match.data is TRUE). Use compare.datasets to assess the conformity of two species datasets and to identify possible no-analogue samples.
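To make the distinction between the chord variants concrete, the base-R sketch below computes the squared chord distance as used in palaeoecology and checks that its square root equals the Euclidean distance between square-root-transformed proportions (Hellinger's distance). It also computes the true chord distance, the Euclidean distance between rows scaled to unit length. The function names sq_chord and chord_true are hypothetical, the inputs are assumed to be proportions, and this is not rioja's code.

```r
# Hypothetical helpers illustrating two of the coefficients (not rioja's code).
# p1 and p2 are assumed to be proportions (each sums to 1).
sq_chord <- function(p1, p2) sum((sqrt(p1) - sqrt(p2))^2)

# "True" chord distance: Euclidean distance after scaling each row to unit length
chord_true <- function(y1, y2) {
  u1 <- y1 / sqrt(sum(y1^2))
  u2 <- y2 / sqrt(sum(y2^2))
  sqrt(sum((u1 - u2)^2))
}

p <- c(0.5, 0.3, 0.2)
q <- c(0.1, 0.6, 0.3)

d_palaeo <- sqrt(sq_chord(p, q))           # "chord" as used in palaeoecology
d_hell <- dist(rbind(sqrt(p), sqrt(q)))    # Euclidean distance on sqrt proportions
all.equal(d_palaeo, as.numeric(d_hell))    # the palaeo "chord" is Hellinger's distance
chord_true(p, q)                           # true chord distance, at most sqrt(2) here
```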
MAT has methods fitted and residuals that return the fitted values (estimates) and residuals for the training set, performance, which returns summary performance statistics (see below), and print and summary to summarise the output. MAT also has a plot method that produces scatter plots of predicted vs observed measurements for the training set.
Function screeplot displays the RMSE of prediction for the training set as a function of the number of analogues (k) and is useful for estimating the optimal value of k for use in prediction.
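The idea behind a screeplot of RMSE against k can be sketched in base R: for each training sample, find its k most similar other samples, estimate its environmental value as their mean, and record the root mean squared error as k grows. This is a minimal leave-one-out sketch on simulated data using the squared chord distance; the function name rmse_by_k is hypothetical, and rioja's screeplot may compute its curve differently (for example from weighted-mean or cross-validated estimates).

```r
# Hypothetical sketch: leave-one-out RMSE of a MAT-style (k-NN) estimate for k = 1..kmax
rmse_by_k <- function(Y, x, kmax = 10) {
  P <- sqrt(Y / rowSums(Y))              # square-root proportions
  D <- as.matrix(dist(P))^2              # squared chord distances between samples
  diag(D) <- Inf                         # a sample may not be its own analogue
  nn <- t(apply(D, 1, order))            # row i: sample indices ordered by dissimilarity to i
  sapply(seq_len(kmax), function(k) {
    xn <- matrix(x[as.vector(nn[, 1:k])], ncol = k)  # env. values of the k analogues
    est <- rowMeans(xn)                               # unweighted mean estimate
    sqrt(mean((est - x)^2))
  })
}

set.seed(1)
n <- 60
x <- runif(n, 5, 8)                      # simulated "pH" gradient
# two simulated taxa whose abundance tracks x, plus a noisy third taxon
Y <- cbind(exp(-(x - 5.5)^2), exp(-(x - 7.5)^2), runif(n, 0, 0.2))
r <- rmse_by_k(Y, x, kmax = 10)
plot(1:10, r, type = "b", xlab = "k (number of analogues)", ylab = "RMSE")
```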
paldist and paldist2 are helper functions, though they may be called directly. paldist takes a single data frame or matrix and returns a distance matrix of the row-wise dissimilarities. paldist2 takes two data frames or matrices and returns a matrix of all row-wise dissimilarities between the two datasets.
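For the squared chord distance, the full matrix of row-wise dissimilarities between two datasets can be computed without loops by expanding the square. The function sq_chord_cross below is a hypothetical sketch of a paldist2-style calculation, not rioja's implementation; it assumes both inputs have identical, identically ordered columns and converts each row to proportions first.

```r
# Hypothetical sketch of a paldist2-style calculation for the squared chord distance.
# Assumes y1 and y2 have identical, identically ordered columns.
sq_chord_cross <- function(y1, y2) {
  a <- sqrt(as.matrix(y1) / rowSums(y1))   # sqrt proportions, dataset 1
  b <- sqrt(as.matrix(y2) / rowSums(y2))   # sqrt proportions, dataset 2
  # sum_j (a_ij - b_kj)^2 = sum_j a_ij^2 + sum_j b_kj^2 - 2 * sum_j a_ij * b_kj
  d <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  pmax(d, 0)                               # clip tiny negative values from rounding
}

train <- matrix(c(10, 30, 60,
                  50, 25, 25), nrow = 2, byrow = TRUE)
fossil <- matrix(c(10, 30, 60,
                   45, 30, 25,
                   20, 20, 60), nrow = 3, byrow = TRUE)
D <- sq_chord_cross(fossil, train)   # 3 x 2: rows = fossil samples, columns = training samples
apply(D, 1, which.min)               # closest training sample for each fossil sample
```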
Function MAT returns an object of class MAT which contains the following items:
call |
original function call to MAT. |
fitted.values |
fitted (predicted) values for the training set, as the mean and weighted mean (weighted by dissimilarity) of the k closest analogues. |
diagnostics |
standard deviation of the k analogues and dissimilarity of the closest analogue. |
dist.n |
dissimilarities of the k closest analogues. |
x.n |
environmental values of the k closest analogues. |
match.name |
column names of the k closest analogues. |
x |
environmental variable used in the model. |
dist.method |
dissimilarity coefficient. |
k |
number of closest analogues to use. |
y |
original species data. |
cv.summary |
summary of the cross-validation (not yet implemented). |
dist |
dissimilarity matrix (returned if lean=FALSE). |
Function predict returns an object with the following items:
predicted |
predictions for newdata. |
diagnostics |
standard deviations of the k closest analogues and dissimilarity of the closest analogue. |
dist.n |
dissimilarities of the k closest analogues. |
x.n |
environmental values of the k closest analogues. |
match.name |
column names of the k closest analogues. |
dist |
dissimilarity matrix (returned if lean=FALSE). |
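The fitted and predicted values listed above are the mean and weighted mean of the environmental values of the k closest analogues, with the standard deviation of those values and the dissimilarity of the closest analogue as diagnostics. The sketch below shows that step for a single sample. It assumes inverse-dissimilarity weights for the weighted mean, a common choice in MAT that is stated here as an assumption rather than a description of rioja's code, and the function name mat_estimate is hypothetical.

```r
# Hypothetical sketch: MAT estimate for one sample from its k closest analogues.
# d: dissimilarities from the sample to every training sample
# x: environmental values of the training samples
mat_estimate <- function(d, x, k = 5) {
  idx <- order(d)[1:k]                 # indices of the k closest analogues
  dist.n <- d[idx]                     # their dissimilarities
  x.n <- x[idx]                        # their environmental values
  w <- 1 / dist.n                      # assumption: inverse-dissimilarity weights
  c(mean = mean(x.n),
    wmean = sum(w * x.n) / sum(w),
    sd = sd(x.n),                      # plain SD of the analogues' values
    min.dist = dist.n[1])              # dissimilarity of the closest analogue
}

d <- c(0.40, 0.10, 0.25, 0.90, 0.20)
x <- c(6.0, 5.0, 5.6, 7.2, 5.2)
mat_estimate(d, x, k = 3)
```

With inverse-dissimilarity weights, an exact analogue (dissimilarity 0) would receive infinite weight, so an implementation would need to guard against zero dissimilarities; this sketch does not.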
Functions paldist and paldist2 return dissimilarity matrices. performance returns a matrix of performance statistics for the MAT model, with columns for RMSE, R2, mean and max bias.
Function performance returns a data frame with performance statistics for each number of analogues up to k. See performance for a description of the output.
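The statistics named above can be computed from observed and predicted values. The sketch below uses common definitions from the transfer-function literature: RMSE as the root mean squared residual, R2 as the squared Pearson correlation between observed and predicted values, mean bias as the mean residual, and maximum bias as the largest absolute mean residual within equal-width intervals of the observed gradient. The function name perf_stats, the residual sign (predicted minus observed) and the number of intervals are assumptions for illustration, not rioja's documented implementation.

```r
# Hypothetical sketch of common transfer-function performance statistics.
# Residuals are defined here as predicted minus observed (an assumption).
perf_stats <- function(obs, pred, nbins = 10) {
  res <- pred - obs
  bins <- cut(obs, breaks = nbins)       # equal-width intervals of the observed gradient
  bin_bias <- tapply(res, bins, mean)    # mean residual in each interval (NA if empty)
  c(RMSE = sqrt(mean(res^2)),
    R2 = cor(obs, pred)^2,
    Avg.Bias = mean(res),
    Max.Bias = max(abs(bin_bias), na.rm = TRUE))  # empty intervals are ignored
}

obs <- c(5.1, 5.4, 5.9, 6.3, 6.8, 7.2)
pred <- c(5.3, 5.5, 5.8, 6.1, 6.6, 6.9)
perf_stats(obs, pred, nbins = 3)
```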
Steve Juggins
Legendre, P. & Gallagher, E. (2001) Ecologically meaningful transformations for ordination of species data. Oecologia, 129, 271-280.
Overpeck, J.T., Webb, T., III, & Prentice, I.C. (1985) Quantitative interpretation of fossil pollen spectra: dissimilarity coefficients and the method of modern analogs. Quaternary Research, 23, 87-108.
WAPLS, WA, performance, and compare.datasets for diagnostics.
library(rioja)
# pH reconstruction of the RLGH, Scotland, using SWAP training set
# shows recent acidification history
data(SWAP)
data(RLGH)
fit <- MAT(SWAP$spec, SWAP$pH, k=20)  # generate results for k 1-20
# examine performance
performance(fit)
print(fit)
# How many analogues?
screeplot(fit)
# do the reconstruction
pred.mat <- predict(fit, RLGH$spec, k=10)
# plot the reconstruction
plot(RLGH$depths$Age, pred.mat$fit[, 1], type="b", ylab="pH", xlab="Age")
# compare to a weighted average model
fit.wa <- WA(SWAP$spec, SWAP$pH)
pred.wa <- predict(fit.wa, RLGH$spec)
points(RLGH$depths$Age, pred.wa$fit[, 1], col="red", type="b")
legend("topleft", c("MAT", "WA"), lty=1, col=c("black", "red"))