rrp.impute {rrp} | R Documentation |
This function performs a simple nearest neighbor hot-deck imputation method using the RRP dissimilarity matrix.
rrp.impute(data, D = NULL, k = 1, msplit = 10, Rep = 250, cut.in = 15)
data |
a data.frame containing missing data on some covariates |
D |
NULL or a dist object with attribute method = `RRP' |
k |
number of nearest neighbors to use |
msplit |
minimum split parameter in the rpart algorithm |
Rep |
number of RRP replications |
cut.in |
number of breaks used to cut continuous covariates |
If the missing value is on a continuous covariate, it is imputed as the average of that covariate's values over the k nearest neighbors. Otherwise, the class of the missing observation is decided by the majority of the `votes' among the nearest neighbors with available data.
If D is NULL, an RRP dissimilarity matrix is created.
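The imputation rule described above can be illustrated with a minimal base R sketch. This is not the code used by rrp.impute: the helper name impute_column, the toy data frame, and the Euclidean stand-in for the RRP dissimilarity matrix are all assumptions made for illustration only.

```r
# Hypothetical helper (not part of the rrp package): impute one column
# from the k nearest donors, given a full dissimilarity matrix D.
impute_column <- function(values, D, k = 1) {
  missing_rows <- which(is.na(values))
  donor_rows   <- which(!is.na(values))
  for (i in missing_rows) {
    # Order the donors by dissimilarity to row i and keep the k closest.
    ranked  <- donor_rows[order(D[i, donor_rows])]
    nearest <- ranked[seq_len(min(k, length(ranked)))]
    if (is.numeric(values)) {
      # Continuous covariate: average of the neighbors' values.
      values[i] <- mean(values[nearest])
    } else {
      # Categorical covariate: majority vote among the neighbors.
      votes <- table(as.character(values[nearest]))
      values[i] <- names(votes)[which.max(votes)]
    }
  }
  values
}

# Toy data with one missing numeric value and one missing class label.
toy <- data.frame(
  x     = c(1.0, 1.1, 5.0, 5.2, 1.05),
  y     = c(2.0, NA, 8.0, 8.2, 2.1),
  group = factor(c("a", "a", "b", NA, "a"))
)

# Assumption: Euclidean distance on the fully observed column x stands in
# for the RRP dissimilarity matrix.
D <- as.matrix(dist(toy$x))

toy$y     <- impute_column(toy$y, D, k = 2)
toy$group <- impute_column(toy$group, D, k = 1)
toy
```

With k greater than 1 the numeric imputation averages over several donors, while the categorical imputation can produce ties; in this sketch ties are broken by taking the first class in table order, which is a choice of the sketch and not documented behavior of the package.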
A list with the following components:
new.data |
a copy of data with the missing values imputed |
dist |
an object of class dist: a copy of the RRP dissimilarity matrix, either generated or passed as input, that was used to search for nearest neighbors |
S.M. Iacus
Iacus, S.M., Porro, G. (2006) Random Recursive Partitioning and its applications to missing data imputation, classification and average treatment effect estimation, submitted.
data(iris)
X <- iris
n <- dim(X)[1]

set.seed(123)
miss <- sample(1:n, 10)
for(i in miss) X[i, sample(1:5, 2)] <- NA
X[miss,]

## unsupervised
x <- rrp.impute(X)
x$new.data[miss,]
iris[miss,]