knncatimpute {scrime} | R Documentation |
Imputes missing values in a matrix composed of categorical variables using k Nearest Neighbors.
knncatimpute(x, dist = NULL, nn = 3, weights = TRUE)
x |
a numeric matrix containing missing values. All non-missing values
must be integers between 1 and n.cat, where n.cat
is the maximum number of levels the categorical variables in x can take.
If the k nearest observations should be used to replace the missing values
of an observation, then each row must represent one of the observations and each
column one of the variables. If the k nearest variables should be used
to impute the missing values of a variable, then each row must correspond to a variable
and each column to an observation. |
dist |
either a character string naming the distance measure or a distance matrix.
If the former, dist must be either "smc", "cohen", or "pcc".
If the latter, dist must be a symmetric matrix having the same number of rows
as x. In this case, both the upper and the lower triangle of dist must
contain the distances, and the row and column names of dist must be equal to
the row names of x. If NULL, dist = "smc" is used. |
nn |
an integer specifying k, i.e. the number of nearest neighbors used in the imputation of the missing values. |
weights |
a logical value indicating whether weighted kNN should be used to impute the missing values. If TRUE,
the vote of each nearest neighbor is weighted by the reciprocal of its distance to the observation or variable
whose missing values are being replaced. |
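The roles of nn and weights described above can be illustrated with a small base-R sketch. It is not taken from scrime: the helper knn_vote and the toy vectors are hypothetical, and the sketch assumes that all distances are strictly positive and that at least nn candidate neighbors are available.

```r
# Hypothetical helper (not part of scrime): choose a category for one
# missing entry by a vote among the nn nearest neighbors.
knn_vote <- function(categories, distances, nn = 3, weights = TRUE) {
  # Indices of the nn smallest distances.
  nearest <- order(distances)[seq_len(nn)]
  # Reciprocal-distance weights, or equal weights for a plain majority vote.
  w <- if (weights) 1 / distances[nearest] else rep(1, nn)
  # Total vote per category among the selected neighbors.
  totals <- tapply(w, categories[nearest], sum)
  as.integer(names(totals)[which.max(totals)])
}

# Candidate neighbors: their category for the variable to impute and
# their distance to the observation with the missing value.
cats  <- c(1, 1, 2, 3)
dists <- c(0.5, 0.5, 0.1, 0.9)

vote_plain    <- knn_vote(cats, dists, nn = 3, weights = FALSE)
vote_weighted <- knn_vote(cats, dists, nn = 3, weights = TRUE)
```

In this toy case the unweighted vote picks category 1 (two of the three nearest neighbors), whereas the weighted vote picks category 2, because the single close neighbor contributes 1/0.1 = 10 against 1/0.5 + 1/0.5 = 4.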
A matrix of the same size as x
in which all the missing values have been imputed.
Holger Schwender, holger.schwender@udo.edu
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
knncatimputeLarge, gknn, smc, pcc
## Not run: 
# Generate a data set consisting of 200 rows and 50 columns
# in which the values are integers between 1 and 3.
# Afterwards, remove 20 of the values randomly.

mat <- matrix(sample(3, 10000, TRUE), 200)
mat[sample(10000, 20)] <- NA

# Replace the missing values.

mat2 <- knncatimpute(mat)

# Replace the missing values using the 5 nearest neighbors
# and Cohen's Kappa.

mat3 <- knncatimpute(mat, nn = 5, dist = "cohen")

## End(Not run)
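The dist argument may also be a user-supplied distance matrix. The following base-R sketch shows one way such a matrix might be built so that it meets the requirements stated for dist (symmetric, both triangles filled, row and column names equal to the row names of x). The helper mismatch_dist is hypothetical: it uses a plain proportion of mismatches over jointly observed positions and is not claimed to reproduce the smc, cohen, or pcc measures of scrime.

```r
# Hypothetical helper (not part of scrime): pairwise proportion of
# mismatching entries between the rows of x, using only the positions
# that are observed in both rows.
mismatch_dist <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both <- !is.na(x[i, ]) & !is.na(x[j, ])
      d[i, j] <- if (any(both)) mean(x[i, both] != x[j, both]) else NA
    }
  }
  d
}

set.seed(2)
# 8 observations (rows) by 5 variables (columns), categories 1 to 3.
mat <- matrix(sample(3, 40, TRUE), nrow = 8)
rownames(mat) <- paste0("obs", 1:8)
colnames(mat) <- paste0("var", 1:5)
mat[3, 2] <- NA

# Distances between observations (rows of mat).
dmat <- mismatch_dist(mat)

# Distances between variables: transpose first, so that each row
# corresponds to a variable and each column to an observation.
dmat_var <- mismatch_dist(t(mat))

# Properties that the description of dist asks for.
sym_ok   <- isSymmetric(dmat)
names_ok <- identical(rownames(dmat), rownames(mat)) &&
  identical(colnames(dmat), rownames(mat))

# With scrime installed, the matrix could then be supplied as
# knncatimpute(mat, dist = dmat)
```

For imputation from the nearest variables, the transposed matrix t(mat) would be passed as x together with dmat_var, in line with the layout described for x.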