rrp.impute {rrp}    R Documentation

Nearest neighbor hot-deck imputation using RRP dissimilarity matrix

Description

This function performs simple nearest neighbor hot-deck imputation using the RRP dissimilarity matrix.

Usage

rrp.impute(data, D = NULL, k = 1, msplit = 10, Rep = 250, cut.in = 15)

Arguments

data      a data.frame containing missing data on some covariates
D         NULL or an object of class XPtr
k         number of nearest neighbors to use
msplit    minimum split parameter in the rpart algorithm
Rep       number of RRP replications
cut.in    number of breaks used to cut continuous covariates

Details

If the missing value is on a continuous covariate, it is imputed as the average of that covariate over the k nearest neighbors. Otherwise, the class of the missing observation is decided by majority vote among the nearest neighbors with available data.
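The rule above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: impute_column is a hypothetical helper, the dissimilarity matrix is an ordinary hand-made numeric matrix rather than an XPtr, and ties in the vote are resolved by taking the first class.

```r
# Hypothetical helper, not part of the rrp package: impute one column
# from the k nearest donors under a plain dissimilarity matrix D.
impute_column <- function(x, D, k = 1) {
  out <- x
  donors <- which(!is.na(x))            # rows with an observed value
  for (i in which(is.na(x))) {
    # the k donors closest to row i (ties broken by row order)
    nn <- donors[order(D[i, donors])][seq_len(min(k, length(donors)))]
    if (is.numeric(x)) {
      out[i] <- mean(x[nn])             # continuous: average of the neighbors
    } else {
      votes <- table(as.character(x[nn]))
      out[i] <- names(votes)[which.max(votes)]  # categorical: majority vote
    }
  }
  out
}

# Toy data: row 3 is missing both values
toy <- data.frame(
  size  = c(1.0, 2.0, NA, 10.0),
  group = factor(c("a", "a", NA, "b"))
)

# Hand-made symmetric dissimilarity matrix (assumed, for illustration only)
D <- matrix(c(0.0, 0.1, 0.2, 0.9,
              0.1, 0.0, 0.3, 0.9,
              0.2, 0.3, 0.0, 0.8,
              0.9, 0.9, 0.8, 0.0), nrow = 4, byrow = TRUE)

size_imputed  <- impute_column(toy$size,  D, k = 2)
group_imputed <- impute_column(toy$group, D, k = 2)
```

With k = 2, row 3 borrows from rows 1 and 2, its two closest donors, so the numeric value becomes their mean and the factor takes their shared class.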

If D is NULL, an RRP dissimilarity matrix is created.

From version 1.6 of the package, the RRP matrix is stored as an external pointer to avoid duplication. This allows working with larger datasets. As a result, this function no longer accepts dist objects.

Value

A list with components:

new.data  a copy of data with the missing data imputed
dist      an object of class XPtr used to search for nearest neighbors

Author(s)

S.M. Iacus

References

Iacus, S.M., Porro, G. (2009) Random Recursive Partitioning: a matching method for the estimation of the average treatment effect, Journal of Applied Econometrics, 24, 163-185.

Iacus, S.M., Porro, G. (2007) Missing data imputation, matching and other applications of random recursive partitioning, Computational Statistics and Data Analysis, 52, 2, 773-789.

See Also

rrp.dist, rrp.class

Examples

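library(rrp)
data(iris)

X <- iris
n <- nrow(X)

## set 2 randomly chosen columns to NA in each of 10 random rows
set.seed(123)
miss <- sample(1:n, 10)
for(i in miss)
 X[i, sample(1:5, 2)] <- NA

## rows that now contain missing values
X[miss,]

## unsupervised
x <- rrp.impute(X)

## imputed rows compared with the original rows
x$new.data[miss,]
iris[miss,]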

[Package rrp version 2.9 Index]