IKFA {rioja}    R Documentation

Imbrie & Kipp Factor Analysis

Description

Functions for reconstructing (predicting) environmental values from biological assemblages using Imbrie & Kipp Factor Analysis (IKFA), as used in palaeoceanography.

Usage

IKFA(y, x, nFact = 5, IsPoly = FALSE, IsRot = TRUE, 
      ccoef = 1:nFact, check.data=TRUE, lean=FALSE, ...)

IKFA.fit(y, x, nFact = 5, IsPoly = FALSE, IsRot = TRUE, 
      ccoef = 1:nFact, lean=FALSE)

## S3 method for class 'IKFA':
predict (object, newdata=NULL, sse=FALSE, nboot=100,
      match.data=TRUE, verbose=TRUE, ...)

communality(object, y)

## S3 method for class 'IKFA':
crossval(object, cv.method="loo", verbose=TRUE, ngroups=10,
      nboot=100, ...)

## S3 method for class 'IKFA':
performance(object, ...)

## S3 method for class 'IKFA':
rand.t.test(object, n.perm=999, ...)

## S3 method for class 'IKFA':
screeplot(x, rand.test=TRUE, ...)

## S3 method for class 'IKFA':
print(x, ...)

## S3 method for class 'IKFA':
summary(object, full=FALSE, ...)

## S3 method for class 'IKFA':
plot(x, resid=FALSE, xval=FALSE, nFact=max(x$ccoef), 
      xlab="", ylab="", ylim=NULL, xlim=NULL, add.ref=TRUE,
      add.smooth=FALSE, ...)

## S3 method for class 'IKFA':
residuals(object, ...)

## S3 method for class 'IKFA':
coef(object, ...)

## S3 method for class 'IKFA':
fitted(object, ...)

Arguments

y a data frame or matrix of biological abundance data.
x, object a vector of environmental values to be modelled or an object of class IKFA.
newdata new biological data to be predicted.
nFact number of factors to extract.
IsRot logical to rotate factors.
ccoef vector of factor numbers to include in the predictions.
IsPoly logical to include quadratic terms of the factors as predictors in the regression.
check.data logical to perform simple checks on the input data.
match.data logical indicating whether the function should match the two species datasets by their column names. Only set this to FALSE if you are sure the column names match exactly.
lean logical to exclude some output from the resulting models (used when cross-validating to speed calculations).
full logical to show head and tail of output in summaries.
resid logical to plot residuals instead of fitted values.
xval logical to plot cross-validation estimates.
xlab, ylab, xlim, ylim additional graphical arguments to plot.IKFA.
add.ref add 1:1 line on plot.
add.smooth add loess smooth to plot.
cv.method cross-validation method, either "loo", "lgo" or "bootstrap".
verbose logical or integer to show feedback during cross-validation. If TRUE, print feedback every 50 cycles; if an integer, print feedback at that interval.
nboot number of bootstrap samples.
ngroups number of groups in leave-group-out cross-validation, or a vector containing leave-out group membership.
sse logical indicating that sample specific errors should be calculated.
rand.test logical to perform a randomisation t-test to test the significance of the cross-validated factors.
n.perm number of permutations for randomisation t-test.
... additional arguments.
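
As described for ngroups, leave-group-out cross-validation accepts either a number of groups or a vector of group membership. The following is a minimal base R sketch of how such a vector might be built; n_samples and membership are illustrative names and the sample count is hypothetical. The commented call only shows where the vector would be supplied according to the argument description above.

```r
# Sketch only: build a random, near-balanced group membership vector.
set.seed(1)
n_samples <- 61   # hypothetical number of training-set samples
ngroups <- 10

# Repeat the group labels to the required length, then shuffle them
membership <- sample(rep(seq_len(ngroups), length.out = n_samples))

# Each group receives 6 or 7 samples
group_sizes <- table(membership)

# Per the ngroups description, the vector could then be passed as:
# fit.cv <- crossval(fit, cv.method = "lgo", ngroups = membership)
```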

Details

Function IKFA performs Imbrie and Kipp Factor Analysis, a form of Principal Components Regression (Imbrie & Kipp 1971).
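
To make the idea of principal components regression concrete, the following base R sketch runs the general steps on simulated data: scale samples, extract a few factors, optionally rotate them, and regress the environmental variable on the factor scores. It is a conceptual illustration only and is not the rioja implementation; the scaling, rotation and regression details used by IKFA may differ, and all object names and data here are assumptions.

```r
# Conceptual sketch of principal components regression; NOT rioja's code.
set.seed(1)
n <- 40   # samples
m <- 8    # species

# Simulated (hypothetical) gradient with unimodal species responses
env <- runif(n, 0, 30)
optima <- seq(2, 28, length.out = m)
Y <- sapply(optima, function(u) exp(-(env - u)^2 / (2 * 6^2)))
Y <- Y + matrix(runif(n * m, 0, 0.02), n, m)

# 1. Scale each sample (row) to unit length
Yn <- Y / sqrt(rowSums(Y^2))

# 2. Extract the leading factors by singular value decomposition
nFact <- 3
sv <- svd(Yn, nu = nFact, nv = nFact)
scores <- sv$u %*% diag(sv$d[1:nFact])   # sample scores, n x nFact

# 3. Varimax rotation of the species loadings; the same orthogonal
#    rotation matrix is applied to the sample scores
rot <- varimax(sv$v, normalize = FALSE)
scores_rot <- scores %*% rot$rotmat

# 4. Regress the environmental variable on the rotated factor scores
dat <- data.frame(env = env, scores_rot)
fit_lm <- lm(env ~ ., data = dat)
rmse <- sqrt(mean(residuals(fit_lm)^2))
```

Quadratic terms of the factors, as controlled by IsPoly, would correspond to adding squared score columns to the regression in step 4.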

Function predict predicts values of the environmental variable for newdata, or returns the fitted (predicted) values from the original modern dataset if newdata is NULL. Variables are matched between the training set and newdata by column name (if match.data is TRUE). Use compare.datasets to assess the conformity of two species datasets and to identify possible no-analogue samples.
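
Matching by column name can be pictured with the base R sketch below. It assumes that taxa absent from the new data are given zero abundance and that taxa absent from the training set are dropped; this is an illustrative assumption, not a statement of how predict handles these cases internally. All object names and values are hypothetical.

```r
# Sketch only: align new data to the training-set columns by name.
train <- matrix(c(10, 0, 5, 3, 7, 1), nrow = 2,
                dimnames = list(c("s1", "s2"), c("spA", "spB", "spC")))
new <- matrix(c(4, 6, 2, 8), nrow = 2,
              dimnames = list(c("c1", "c2"), c("spC", "spX")))

# Start from zeros in the training-set column layout
aligned <- matrix(0, nrow(new), ncol(train),
                  dimnames = list(rownames(new), colnames(train)))

# Copy across the taxa present in both datasets
shared <- intersect(colnames(train), colnames(new))
aligned[, shared] <- new[, shared]

# Taxa in the new data with no counterpart in the training set
unmatched <- setdiff(colnames(new), colnames(train))
```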

IKFA has methods fitted and residuals, which return the fitted values (estimates) and residuals for the training set; performance, which returns summary performance statistics (see below); coef, which returns the species coefficients; and print and summary, which summarise the output. IKFA also has a plot method that produces scatter plots of predicted vs observed measurements for the training set.

Function rand.t.test performs a randomisation t-test to test the significance of the cross-validated components after van der Voet (1994).
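
The general idea of a randomisation test on prediction errors can be sketched in base R as follows. The sketch compares the squared errors of two hypothetical models by randomly flipping the sign of their paired differences, in the spirit of van der Voet (1994). It is an assumption-laden illustration with simulated errors and is not necessarily the exact procedure coded in rand.t.test.

```r
# Sketch only: sign-flip randomisation test on paired squared errors.
set.seed(42)
n <- 50

# Simulated (hypothetical) cross-validation errors for two models
err_simple  <- rnorm(n, sd = 2.0)   # model with fewer components
err_complex <- rnorm(n, sd = 1.0)   # model with one more component

# Paired differences in squared error and the observed mean difference
d <- err_simple^2 - err_complex^2
obs <- mean(d)

# Under the null of equal accuracy the sign of each difference is arbitrary
n.perm <- 999
perm <- replicate(n.perm, mean(d * sample(c(-1, 1), n, replace = TRUE)))

# One-sided p-value: is the more complex model significantly better?
p_value <- (sum(perm >= obs) + 1) / (n.perm + 1)
```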

Function screeplot displays the RMSE of prediction for the training set as a function of the number of factors and is useful for estimating the optimal number to use in prediction. By default, screeplot also carries out a randomisation t-test and adds a line to the scree plot indicating the percentage change in RMSE with each component, annotated with the p-value from the randomisation test.
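
The percentage change in RMSE between successive components is simple arithmetic. The short sketch below uses made-up RMSE values and assumes the change is expressed relative to the model with one fewer component; the sign convention used on the plot may differ.

```r
# Sketch only: percentage change in RMSE with each added component.
rmsep <- c(2.5, 2.0, 1.9, 1.95)   # hypothetical values for 1 to 4 factors

# Change relative to the model with one fewer component
pct_change <- 100 * diff(rmsep) / head(rmsep, -1)

# Negative values indicate an improvement (lower RMSE)
names(pct_change) <- paste0("factor", 2:4)
```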

Value

Function IKFA returns an object of class IKFA with the following named elements:

coefficients species coefficients (the updated "optima").
meanY weighted mean of the environmental variable.
iswapls logical indicating whether analysis was IKFA (TRUE) or PLS (FALSE).
T sample scores.
P variable (species) scores.
npls number of pls components extracted.
fitted.values fitted values for the training set.
call original function call.
x environmental variable used in the model.
standx, meanT, sdx additional information returned for a PLS model.
predicted predicted values of each training set sample under cross-validation.
residuals.cv prediction residuals.
fit predicted values for newdata.
fit.boot mean of the bootstrap estimates of newdata.
v1 squared standard error of the bootstrap estimates for each new sample.
v2 mean squared error for the training set samples, across all bootstrap samples.
SEP standard error of prediction, calculated as the square root of v1 + v2.
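
The relationship between v1, v2 and SEP can be illustrated with a small bootstrap in base R. The sketch below substitutes an ordinary linear model for IKFA and assumes that v2 is computed from samples left out of each bootstrap draw; both are assumptions made for illustration, and the simulated data and object names are hypothetical.

```r
# Sketch only: bootstrap standard error of prediction, using lm()
# as a stand-in for the transfer function.
set.seed(7)
n <- 30
x <- runif(n)
train <- data.frame(x = x, y = 2 + 3 * x + rnorm(n, sd = 0.3))
newx <- data.frame(x = c(0.2, 0.8))   # two hypothetical new samples

nboot <- 200
boot_pred <- matrix(NA_real_, nboot, nrow(newx))
oob_sq <- numeric(0)

for (b in seq_len(nboot)) {
  idx <- sample(n, replace = TRUE)            # bootstrap training set
  m <- lm(y ~ x, data = train[idx, ])
  boot_pred[b, ] <- predict(m, newx)          # predictions for new samples
  oob <- setdiff(seq_len(n), idx)             # samples not drawn this cycle
  oob_sq <- c(oob_sq, (train$y[oob] - predict(m, train[oob, ]))^2)
}

fit.boot <- colMeans(boot_pred)     # mean of the bootstrap estimates
v1 <- apply(boot_pred, 2, var)      # variance of the bootstrap estimates
v2 <- mean(oob_sq)                  # mean squared error for left-out samples
SEP <- sqrt(v1 + v2)                # standard error of prediction
```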


Function performance returns a matrix of performance statistics for the IKFA model. See performance for a description of the summary.

Function rand.t.test returns a matrix of performance statistics together with columns indicating the p-value and percentage change in RMSE with each higher component (see van der Voet (1994) for details).

Author(s)

Steve Juggins

References

Imbrie, J. & Kipp, N.G. (1971). A new micropaleontological method for quantitative paleoclimatology: application to a Late Pleistocene Caribbean core. In The Late Cenozoic Glacial Ages (ed. K.K. Turekian), pp. 77-181. Yale University Press, New Haven.

van der Voet, H. (1994). Comparing the predictive accuracy of models using a simple randomization test. Chemometrics and Intelligent Laboratory Systems, 25, 313-323.

See Also

WA, MAT, performance, and compare.datasets for diagnostics.

Examples

data(IK)
spec <- IK$spec
SumSST <- IK$env$SumSST
core <- IK$core

fit <- IKFA(spec, SumSST)
fit
# cross-validate model
fit.cv <- crossval(fit, cv.method="lgo")
# How many components to use?
screeplot(fit.cv)

#predict the core
pred <- predict(fit, core, npls=2)

#plot predictions - depths are in rownames
depth <- as.numeric(rownames(core))
plot(depth, pred$fit[, 2], type="b")

# fit using only factors 1, 2, 4, & 5
# and using polynomial terms
# as Imbrie & Kipp (1971)
fit2 <- IKFA(spec, SumSST, ccoef=c(1, 2, 4, 5), IsPoly=TRUE)
fit2.cv <- crossval(fit2, cv.method="lgo")
screeplot(fit2.cv)

## Not run: 
# predictions with sample specific errors
# takes approximately 1 minute to run
pred <- predict(fit, core, sse=TRUE, nboot=1000)
pred
## End(Not run)

[Package rioja version 0.5-6 Index]