nnmiss {SeqKnn} | R Documentation |
Selects the K nearest neighbors by Euclidean distance and estimates each missing value as the weighted mean of the selected neighbors.
nnmiss(x, xmiss, ismiss, K)
x |
data frame which contains only complete cases |
xmiss |
data frame which contains incomplete cases |
ismiss |
data frame of logical values (TRUE or FALSE) marking which entries of xmiss are missing |
K |
number of nearest neighbors |
An appropriate value of K is 10-20. However, K should be reduced when the missing rate is high, and in particular when K is larger than the number of rows in the complete set.
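One way to keep K within bounds is to cap it at the number of complete rows before calling nnmiss. The sketch below is only an illustration: cap_k is a hypothetical helper and is not part of SeqKnn, and the toy matrix is made up.

```r
# Hypothetical helper (not part of SeqKnn): never ask for more neighbors
# than there are rows in the complete set.
cap_k <- function(K, xcomplete) {
  min(K, nrow(xcomplete))
}

xcomplete <- matrix(rnorm(5 * 4), nrow = 5)  # toy complete set with 5 rows
k_used <- cap_k(10, xcomplete)               # 10 requested, only 5 rows available
```

The capped value can then be passed as the K argument of nnmiss.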
Ki-Yeol Kim and Gwan-Su Yi
data(khan05)
x <- as.matrix(khan05)
N <- dim(x)
p <- N[2]
N <- N[1]
nas <- is.na(drop(x %*% rep(1, p)))
xcomplete <- x[!nas, ]               ## complete set
xbad <- x[nas, , drop = FALSE]       ## incomplete set
xnas <- is.na(xbad)
xbadhat <- xbad
xbadhat[1, ] <- nnmiss(xcomplete, xbad[1, ], xnas[1, ], 10)

## The function is currently defined as
function (x, xmiss, ismiss, K = k)
{
    xd <- scale(x, xmiss, FALSE)[, !ismiss]
    dd <- drop(xd^2 %*% rep(1, ncol(xd)))
    od <- order(dd)[seq(K)]
    od <- od[!is.na(od)]   ## control k value when k is larger than the size of complete data
    K <- length(od)
    distance <- dd[od]
    s <- sum(1/distance)
    weight <- (1/distance)/s
    xmiss[ismiss] <- drop(weight %*% x[od, ismiss, drop = FALSE])   ## weighted mean
    ## xmiss[ismiss] <- drop(rep(1/K, K) %*% x[od, ismiss, drop = FALSE])   ## mean
    xmiss
}
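To see what the weighted mean does on a small case, the following sketch walks through the same steps on a made-up 4 x 3 matrix without needing the khan05 data. The values and variable names are assumptions chosen for illustration, and sweep() and rowSums() are used here in place of scale() and the matrix product. Note that dd holds squared distances, so the weights are proportional to the inverse of the squared Euclidean distance.

```r
# Toy complete set: 4 rows, 3 columns (made-up values).
x <- rbind(c(1, 2, 10),
           c(2, 2, 20),
           c(1, 4, 30),
           c(9, 9, 90))
xmiss  <- c(1, 3, NA)   # incomplete row; the third value is missing
ismiss <- is.na(xmiss)
K <- 2

# Squared Euclidean distance to each complete row, over the observed columns only.
xd <- sweep(x[, !ismiss, drop = FALSE], 2, xmiss[!ismiss])
dd <- rowSums(xd^2)     # 1, 2, 1, 100

# Indices of the K nearest rows (K capped at the number of complete rows).
od <- order(dd)[seq_len(min(K, nrow(x)))]

# Inverse squared-distance weights, normalized to sum to 1.
w <- (1 / dd[od]) / sum(1 / dd[od])

# Fill the missing entry with the weighted mean of the neighbors' values.
xmiss[ismiss] <- drop(w %*% x[od, ismiss, drop = FALSE])
xmiss                   # 1 3 20
```

Rows 1 and 3 are the two nearest neighbors and are equally distant, so each gets weight 0.5 and the estimate is the plain average of 10 and 30. A complete row at distance zero would give an infinite weight, so such a case would need separate handling.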