nnmiss {SeqKnn}    R Documentation

Select k nearest neighbors and compute their weighted mean

Description

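Selects the k nearest neighbors of an incomplete case by Euclidean distance and estimates each missing value as the weighted mean of the selected neighbors. Each neighbor's weight is proportional to the inverse of its squared distance.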
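
The estimate described above can be illustrated on toy data. The following is a minimal sketch in base R, not the package's own code: the data values and the names `diffs`, `w`, and `estimate` are made up for illustration, and the distance is taken over the observed columns only.

```r
# Toy complete cases (one per row) and one incomplete case; values are made up.
x <- rbind(c(1, 2, 3),
           c(3, 3, 5),
           c(9, 9, 9))
xmiss  <- c(1, 3, NA)
ismiss <- is.na(xmiss)
K <- 2

# Squared Euclidean distance to each complete case, over observed columns only.
diffs <- sweep(x[, !ismiss, drop = FALSE], 2, xmiss[!ismiss])
dd <- rowSums(diffs^2)

# Indices of the K nearest complete cases.
od <- order(dd)[seq_len(min(K, nrow(x)))]

# Inverse squared-distance weights, normalized to sum to 1.
w <- (1 / dd[od]) / sum(1 / dd[od])

# Weighted mean of the neighbors' values in the missing column(s).
estimate <- drop(w %*% x[od, ismiss, drop = FALSE])
```

Here the two nearest rows have squared distances 1 and 4, so the closer row receives four times the weight of the other. A neighbor at distance zero would make the weights undefined, which this sketch does not guard against.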

Usage

nnmiss(x, xmiss, ismiss, K)

Arguments

x         data frame which contains only complete cases
xmiss     data frame which contains incomplete cases
ismiss    data frame of logical values (TRUE or FALSE) marking the missing entries of xmiss
K         number of nearest neighbors

Details

An appropriate value of K is 10-20. However, K should be reduced when the missing rate is high, in particular when K is larger than the number of complete cases.
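
As a small illustration of that limit, the sketch below uses made-up squared distances to three complete cases and requests more neighbors than exist. The names `K_used` and `K_capped` are hypothetical and appear only in this example.

```r
# Made-up squared distances from one incomplete case to 3 complete cases.
dd <- c(4, 1, 9)
K  <- 5                     # more neighbors requested than complete cases

# Indexing past the end of order(dd) yields NA for the surplus positions.
od <- order(dd)[seq(K)]
# Dropping the NAs leaves only the neighbors that actually exist.
od <- od[!is.na(od)]
K_used <- length(od)

# An equivalent cap applied up front.
K_capped <- min(K, length(dd))
```

Both approaches leave three usable neighbors, so the estimate falls back to a weighted mean over the whole complete set.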

Author(s)

Ki-Yeol Kim and Gwan-Su Yi

Examples

    data(khan05)
    x <- as.matrix(khan05)
    N <- dim(x)
    p <- N[2]
    N <- N[1]
    nas <- is.na(drop(x %*% rep(1, p)))
    xcomplete <- x[!nas, ]               ## complete set
    xbad <- x[nas, , drop = FALSE]       ## incomplete set
    xnas <- is.na(xbad)
    xbadhat <- xbad
    xbadhat[1, ] <- nnmiss(xcomplete, xbad[1, ], xnas[1, ], 10)
## The function is currently defined as
function (x, xmiss, ismiss, K) 
{
    ## differences from xmiss, restricted to the observed columns
    xd <- scale(x, xmiss, FALSE)[, !ismiss]
    dd <- drop(xd^2 %*% rep(1, ncol(xd)))   ## squared Euclidean distances
    od <- order(dd)[seq(K)]
    od <- od[!is.na(od)]   ## reduce K when it is larger than the size of the complete data
    K <- length(od)
    distance <- dd[od]
    s <- sum(1/distance)
    weight <- (1/distance)/s
    xmiss[ismiss] <- drop(weight %*% x[od, ismiss, drop = FALSE])   ## weighted mean
##  xmiss[ismiss] <- drop(rep(1/K, K) %*% x[od, ismiss, drop = FALSE])   ## mean
    xmiss
}

[Package SeqKnn version 1.0.0 Index]