yai {yaImpute} | R Documentation |
Given a set of observations, yai 1) separates the observations
into reference and target observations, 2) applies the
specified method to project X-variables into a Euclidean space (not
always; see argument method), and 3) finds the k-nearest
neighbors within the reference observations and between the reference
and target observations. An alternative method using randomForest
classification and regression trees is provided for steps 2 and 3.
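Steps 2 and 3 can be illustrated for the simplest case, method "euclidean", with a brute-force search. This is a minimal sketch, not yai's implementation. It assumes the references and targets are already separated, and that the normalization factors come from the reference data. The helper knn_euclidean is hypothetical. yai uses ann by default rather than this exhaustive loop.

```r
# Hypothetical helper (not part of yaImpute): exhaustive k-NN search from
# targets to references, sketching steps 2 and 3 for method "euclidean".
knn_euclidean <- function(xRefs, xTrgs, k = 1, normalize = TRUE) {
  xRefs <- as.matrix(xRefs)
  xTrgs <- as.matrix(xTrgs)
  if (normalize) {
    # Assumption: centering and scaling factors are taken from the references
    ctr <- colMeans(xRefs)
    scl <- apply(xRefs, 2, sd)
    xRefs <- scale(xRefs, center = ctr, scale = scl)
    xTrgs <- scale(xTrgs, center = ctr, scale = scl)
  }
  nT <- nrow(xTrgs)
  dst <- matrix(NA_real_, nT, k, dimnames = list(rownames(xTrgs), NULL))
  ids <- matrix(NA_character_, nT, k, dimnames = list(rownames(xTrgs), NULL))
  for (i in seq_len(nT)) {
    # Euclidean distance from target i to every reference
    d <- sqrt(colSums((t(xRefs) - xTrgs[i, ])^2))
    o <- order(d)[seq_len(k)]
    dst[i, ] <- d[o]
    ids[i, ] <- rownames(xRefs)[o]
  }
  # One row per target and k columns, like neiDstTrgs and neiIdsTrgs
  list(dst = as.data.frame(dst), ids = as.data.frame(ids))
}

xRefs <- matrix(c(0, 0, 1, 1, 5, 5), ncol = 2, byrow = TRUE,
                dimnames = list(c("r1", "r2", "r3"), c("a", "b")))
xTrgs <- matrix(c(0.9, 1.2), ncol = 2, dimnames = list("t1", c("a", "b")))
res <- knn_euclidean(xRefs, xTrgs, k = 2, normalize = FALSE)  # FALSE mimics "raw"
```

Here target t1 is nearest to r2 (distance sqrt(0.05)) and then r1 (distance 1.5). Setting normalize = FALSE corresponds to the difference between methods raw and euclidean.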
yai(x = NULL, y = NULL, data = NULL, k = 1, noTrgs = FALSE, noRefs = FALSE, nVec = NULL, pVal = 0.05, method = "msn", mtry = NULL, ntree = 500, ann = TRUE)
x |
1) a matrix or data frame containing the X-variables for all
observations, with row names identifying the observations, or 2) a
one-sided formula defining the X-variables as a linear formula. If
a formula is used for x, a formula must also be used for y, if
y is needed. |
y |
1) a matrix or data frame containing the Y-variables for the reference observations, or 2) a one-sided formula defining the Y-variables as a linear formula. |
data |
when x and y are formulas, then data is a data frame or
matrix that contains all the variables. The observations are split by yai
into two sets. Reference observations are those with no missing values for X-
and Y-variables. Target observations are those with values for X-variables
and NAs for Y-variables. |
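The split rule above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper split_ref_trg is hypothetical, Y rows are matched to X rows by row name, rows with missing X-values are treated as dropped, and a target must have all of its Y-values missing. yai's own handling of partially missing Y rows may differ.

```r
# Hypothetical helper (not part of yaImpute): classify rows as
# references, targets, or dropped, following the rule described above.
split_ref_trg <- function(x, y) {
  x <- as.data.frame(x)
  y <- as.data.frame(y)
  # Align Y rows to X rows by row name; rows absent from y get NA Y-values
  yAll <- y[match(rownames(x), rownames(y)), , drop = FALSE]
  xOK <- complete.cases(x)
  # Reference: no missing X- or Y-values
  isRef <- xOK & complete.cases(yAll)
  # Assumption: a target has complete X-values and all Y-values missing
  isTrg <- xOK & rowSums(!is.na(yAll)) == 0
  list(refs = rownames(x)[isRef],
       trgs = rownames(x)[isTrg],
       dropped = rownames(x)[!(isRef | isTrg)])
}

x <- data.frame(a = c(1, 2, NA, 4), b = c(1, 1, 1, 1),
                row.names = c("p1", "p2", "p3", "p4"))
y <- data.frame(v = c(10, 20, 30), row.names = c("p1", "p3", "p4"))
s <- split_ref_trg(x, y)
```

In this example p1 and p4 are references, p2 (absent from y) is a target, and p3 is dropped because an X-value is missing.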
k |
the number of nearest neighbors; default is 1. |
noTrgs |
when TRUE, skip finding neighbors for target observations. |
noRefs |
when TRUE, skip finding neighbors for reference observations. |
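When noRefs is FALSE, neighbors are also found among the references themselves. The sketch below assumes that a reference is never reported as its own neighbor, which is what makes the within-reference neighbors useful for assessing accuracy. That exclusion is an assumption about yai's behavior. The helper knn_within_refs is hypothetical and uses plain Euclidean distance.

```r
# Hypothetical helper (not part of yaImpute): k nearest neighbors of each
# reference among the other references.
knn_within_refs <- function(xRefs, k = 1) {
  d <- as.matrix(dist(as.matrix(xRefs)))  # Euclidean distances among references
  diag(d) <- Inf                          # assumption: exclude self-matches
  ids <- matrix(NA_character_, nrow(d), k, dimnames = list(rownames(d), NULL))
  dst <- matrix(NA_real_, nrow(d), k, dimnames = list(rownames(d), NULL))
  for (i in seq_len(nrow(d))) {
    o <- order(d[i, ])[seq_len(k)]
    ids[i, ] <- colnames(d)[o]
    dst[i, ] <- d[i, o]
  }
  list(dst = dst, ids = ids)
}

xRefs <- matrix(c(0, 1, 5, 0, 0, 0), ncol = 2,
                dimnames = list(c("r1", "r2", "r3"), c("a", "b")))
nr <- knn_within_refs(xRefs, k = 1)
```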
nVec |
number of canonical vectors to use (methods msn and msn2),
or number of independent X-variables in the reference data when method is
mahalanobis. When NULL, the number is set by the function. |
pVal |
significance level for canonical vectors, used when method is
msn or msn2. |
method |
the strategy for finding neighbors; the
options are the following quoted keywords (see details):
euclidean - distance is computed in a normalized X space.
raw - like euclidean, except no normalization is done.
mahalanobis - distance is computed in Mahalanobis space.
ica - like mahalanobis, but based on Independent Component Analysis using
package fastICA.
msn - distance is computed in a projected canonical space.
msn2 - like msn, but with variance weighting (canonical regression
rather than correlation).
gnn - distance is computed using a projected ordination of
Xs found using canonical correspondence analysis
(cca from package vegan).
randomForest - distance is one minus the
proportion of randomForest trees where a target observation is in
the same terminal node as a reference observation (see randomForest).
|
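The randomForest distance described above can be computed from terminal-node memberships. The sketch below uses small hand-made node matrices as hypothetical input so that it runs without a fitted forest. With the randomForest package, a matrix of this kind can be obtained, for example, from the "nodes" attribute returned by predict(forest, newdata, nodes = TRUE). How yai combines several forests (one per Y-variable) is not shown here.

```r
# Sketch: one minus the proportion of trees in which a target shares a
# terminal node with each reference.
# nodesRefs: references x trees matrix of terminal node ids (hypothetical input)
# nodesTrg:  terminal node ids of one target, one per tree
rf_distance <- function(nodesRefs, nodesTrg) {
  stopifnot(ncol(nodesRefs) == length(nodesTrg))
  # TRUE where the reference and target fall in the same terminal node
  same <- sweep(nodesRefs, 2, nodesTrg, FUN = "==")
  1 - rowMeans(same)
}

nodesRefs <- rbind(r1 = c(3, 5, 2, 7),
                   r2 = c(3, 4, 2, 7),
                   r3 = c(1, 4, 6, 8))
nodesTrg <- c(3, 5, 2, 8)
d <- rf_distance(nodesRefs, nodesTrg)
```

Reference r1 shares a terminal node with the target in 3 of 4 trees, giving a distance of 0.25, so it would be the nearest neighbor.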
mtry |
the number of X-variables picked at random; see the randomForest documentation. The default is sqrt(number of X-variables). |
ntree |
the number of classification and regression trees in the randomForest. When more than one Y-variable is used, the trees are divided among the variables. Alternatively, ntree can be a vector of values corresponding to each Y-variable. |
ann |
TRUE if ann is used to find neighbors, FALSE if a slow search is used. |
See ./../doc/yaImputePaper.pdf or, alternatively, http://forest.moscowfsl.wsu.edu/gems/yaImputePaper.pdf
An object of class yai, which is a list with the following tags:
call |
the call. |
yRefs, xRefs |
matrices of the X- and Y-variables for just the reference observations (unscaled). The scale factors are attached as attributes. |
obsDropped |
a list of the row names for observations dropped for various reasons (missing data). |
trgRows |
a list of the row names for target observations as a subset of all observations. |
xall |
the X-variables for all observations. |
cancor |
the object returned by the cancor function when method msn or
msn2 is used (NULL otherwise). |
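To illustrate how a cancor result can define a projected canonical space for method msn, the sketch below weights each canonical direction by its canonical correlation. With that weighting, squared Euclidean distances in the projected space weight squared differences in canonical scores by squared canonical correlations. This weighting is an assumption in the spirit of most similar neighbor (MSN) inference, not a statement of exactly what yai stores. The simulated data and the fixed nVec are hypothetical; yai chooses nVec from pVal and the F tests.

```r
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n, sd = 0.5), X[, 2] - X[, 3] + rnorm(n, sd = 0.5))
Xs <- scale(X)                 # normalized X-variables
cc <- cancor(Xs, scale(Y))     # stats::cancor
nVec <- 2                      # hypothetical fixed choice
# Assumption: scale each canonical direction by its canonical correlation
projector <- cc$xcoef[, seq_len(nVec), drop = FALSE] %*%
  diag(cc$cor[seq_len(nVec)], nVec)
Z <- Xs %*% projector          # observations in the projected space
d12 <- sqrt(sum((Z[1, ] - Z[2, ])^2))  # distance between observations 1 and 2
```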
ccaVegan |
an object of class cca (from package vegan) when method gnn is used. |
ftest |
a list containing partial F statistics and a vector of Pr>F (pgf) corresponding to the canonical correlation coefficients when method msn or msn2 is used (NULL otherwise). |
yScale, xScale |
scale data used on yRefs and xRefs as needed. |
k |
the value of k. |
pVal |
as input; only used when method msn or msn2 is used. |
projector |
NULL when not used. For methods msn, msn2, gnn and mahalanobis, this is a matrix that projects normalized X-variables into a space suitable for computing Euclidean distances. |
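For method mahalanobis, the idea of a projector can be checked directly. Any matrix P with P %*% t(P) equal to the inverse covariance matrix turns Mahalanobis distance into ordinary Euclidean distance. The sketch below builds one such P from a Cholesky factor. This is one possible construction, assumed for illustration, and not necessarily the matrix yai stores. Because Mahalanobis distance is unchanged by rescaling the X-variables, normalizing them first does not affect the result.

```r
set.seed(2)
A <- matrix(c(1, 0.5, 0, 0, 1, 0.3, 0, 0, 1), 3, 3)
X <- matrix(rnorm(40 * 3), 40, 3) %*% A  # correlated X-variables
S <- cov(X)
P <- solve(chol(S))  # chol(S) is R with t(R) %*% R == S, so P %*% t(P) == solve(S)
Z <- X %*% P         # projected X-variables
d2_proj <- sum((Z[1, ] - Z[2, ])^2)                       # squared Euclidean distance in Z
d2_base <- mahalanobis(X[1, ], center = X[2, ], cov = S)  # squared Mahalanobis distance
```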
nVec |
number of canonical vectors used (methods msn and msn2),
or number of independent X-variables in the reference data when method
mahalanobis is used. |
method |
as input, the method used. |
ranForest |
a list of the forests if method randomForest is used. There is
one forest for each Y-variable, or just one forest when there are no
Y-variables. |
ICA |
a list of information from fastICA
when method ica is used. |
ann |
the value of ann, TRUE when ann is used, FALSE otherwise. |
xlevels |
NULL if no factors are used as predictors; otherwise a list
of the predictors that are factors and their levels (see lm). |
neiDstTrgs |
a data frame of distances between a target (identified by its row name) and the k references. There are k columns. |
neiIdsTrgs |
a data frame of reference identifications that correspond to neiDstTrgs. |
neiDstRefs, neiIdsRefs |
counterparts for references. |
Nicholas L. Crookston ncrookston@fs.fed.us
Andrew O. Finley afinley@stat.umn.edu
require(yaImpute)  # running these examples will load packages vegan and randomForest
data(MoscowMtStJoe)

# convert polar slope and aspect measurements to cartesian
# (which is the same as Stage's (1976) expression).
polar <- MoscowMtStJoe[, 40:41]
polar[, 1] <- polar[, 1] * .01         # slope proportion
polar[, 2] <- polar[, 2] * (pi / 180)  # aspect radians
cartesian <- t(apply(polar, 1, function(x) {
  return(c(x[1] * cos(x[2]), x[1] * sin(x[2])))
}))
colnames(cartesian) <- c("xSlAsp", "ySlAsp")
x <- cbind(MoscowMtStJoe[, 37:39], cartesian, MoscowMtStJoe[, 42:64])
y <- MoscowMtStJoe[, 1:35]

mal <- yai(x = x, y = y, method = "mahalanobis", k = 1)
gnn <- yai(x = x, y = y, method = "gnn", k = 1)
msn <- yai(x = x, y = y, method = "msn", k = 1)
plot(mal)

# reduce the plant community data for randomForest.
yba <- MoscowMtStJoe[, 1:17]
ybaB <- whatsMax(yba, nbig = 7)  # see help on whatsMax

rf <- yai(x = x, y = ybaB, method = "randomForest", k = 1)

# build the imputations for the original y's
rforig <- impute(rf, ancillaryData = y)

# compare the results
compare.yai(mal, gnn, msn, rforig)
plot(compare.yai(mal, gnn, msn, rforig))