gknn {scrime} | R Documentation |
Predicts the classes of new observations with k Nearest Neighbors based on a user-specified distance measure.
gknn(data, cl, newdata, nn = 5, distance = NULL, use.weights = FALSE, ...)
data |
a numeric matrix in which each row represents an observation and each column
a variable. If distance is "smc", "cohen" or "pcc",
the values in data must be integers between 1 and n.cat,
where n.cat is the maximum number of levels one of the variables can
take. Missing values are allowed. |
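The integer coding required for the categorical distance measures can be prepared in base R. The following is a minimal sketch, assuming hypothetical genotype data stored as character strings; the names geno, lev, data_int and n_cat are illustrative and not part of scrime.

```r
# Hypothetical genotype data stored as character strings (one missing value).
geno <- data.frame(
  snp1 = c("AA", "AB", "BB", "AB"),
  snp2 = c("AB", "AB", NA, "AA"),
  stringsAsFactors = FALSE
)

# Use one shared level set so that a given code means the same category
# in every column.
lev <- c("AA", "AB", "BB")

# Map each column to integer codes 1..length(lev); NA stays NA.
data_int <- sapply(geno, function(col) as.integer(factor(col, levels = lev)))

# Maximum number of levels a variable can take in this toy data set.
n_cat <- length(lev)
```

Fixing the levels explicitly avoids the situation in which a category that happens to be absent from one column shifts the codes of that column.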
cl |
a numeric vector of length nrow(data) giving the class labels of
the observations represented by the rows of data. cl must consist
of integers between 1 and n.cl, where n.cl is the
number of groups. |
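If the class labels are available as names rather than integers, they can be converted with a factor. A short sketch in base R follows; species, cl_factor, cl_int and n_cl are hypothetical names used only for illustration.

```r
# Hypothetical class labels given as character strings.
species <- c("setosa", "virginica", "setosa", "versicolor")

# factor() sorts the levels alphabetically, so the integer codes are
# setosa = 1, versicolor = 2, virginica = 3.
cl_factor <- factor(species)
cl_int <- as.integer(cl_factor)
n_cl <- nlevels(cl_factor)

# Integer codes (e.g., predicted classes) can be mapped back to the labels.
labels_back <- levels(cl_factor)[cl_int]
```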
newdata |
a numeric matrix in which each row represents a new observation for
which the class label should be predicted and each column contains the same
variable as the corresponding column of data. |
nn |
an integer specifying the number of nearest neighbors used to classify the new observations. |
distance |
character vector naming the distance measure used to identify the
nn nearest neighbors. Must be one of "smc", "cohen",
"pcc", "euclidean", "maximum", "manhattan",
"canberra", or "minkowski". If NULL, an ad hoc check decides whether
the data appear to be categorical: if so, distance is set to "smc";
otherwise, it is set to "euclidean". |
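To illustrate what a distance for categorical data measures, the sketch below computes a simple matching distance, i.e. the proportion of variables on which two observations disagree. It is a plain base R illustration under the assumption that positions with missing values are skipped; smc_dist is a hypothetical helper, and the smc function in scrime may differ in details.

```r
# Simple matching distance: share of variables on which x and y disagree,
# computed over the positions where both values are observed.
smc_dist <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  mean(x[ok] != y[ok])
}

# Two hypothetical observations coded with integers between 1 and 3.
x <- c(1, 2, 3, 1, NA)
y <- c(1, 3, 3, 2, 2)

# Four positions are comparable, and two of them differ.
d_xy <- smc_dist(x, y)
```

Unlike the Euclidean distance, this measure treats the codes as unordered labels: a mismatch between 1 and 3 counts the same as one between 1 and 2.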
use.weights |
logical; should the votes of the nearest neighbors be weighted by the reciprocal of their distances to the new observation when its class is predicted? |
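The effect of reciprocal-distance weighting can be sketched in base R. The distances and neighbor classes below are made up for illustration, and the code is not the implementation used in gknn; in particular, it does not handle neighbors at distance zero.

```r
# Hypothetical distances from one new observation to its nn = 5 nearest
# neighbors, and the class labels of those neighbors.
d <- c(0.5, 1, 2, 2, 4)
neighbor_cl <- c(1, 2, 2, 2, 1)

# Unweighted vote: each neighbor counts once.
votes_plain <- tapply(rep(1, length(d)), neighbor_cl, sum)

# Weighted vote: each neighbor counts 1 / distance.
votes_weighted <- tapply(1 / d, neighbor_cl, sum)

# The predicted class is the one with the largest vote total.
pred_plain <- as.integer(names(which.max(votes_plain)))
pred_weighted <- as.integer(names(which.max(votes_weighted)))
```

In this toy case class 2 wins the plain vote by three neighbors to two, whereas class 1 wins the weighted vote (2.25 against 2) because its closest member is very near the new observation.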
... |
further arguments for the distance measure. If, e.g.,
distance = "minkowski", then p can also be specified, see dist.
If distance = "pcc", then version can also be specified,
see pcc. |
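The role of p can be seen directly with dist from the stats package. The following sketch uses two made-up points; it only illustrates the Minkowski distance itself and makes no claim about how gknn passes p on internally.

```r
# Two hypothetical observations in two dimensions.
x <- rbind(a = c(0, 0), b = c(3, 4))

# p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
d1 <- dist(x, method = "minkowski", p = 1)
d2 <- dist(x, method = "minkowski", p = 2)

# Larger p puts more weight on the coordinate with the largest difference.
d3 <- dist(x, method = "minkowski", p = 3)
```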
The predicted classes of the new observations.
Holger Schwender, holger.schwender@udo.edu
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
## Not run:
# Using the example from the function knn.
library(class)
library(scrime)  # provides gknn
data(iris3)
train <- rbind(iris3[1:25,,1], iris3[1:25,,2], iris3[1:25,,3])
test <- rbind(iris3[26:50,,1], iris3[26:50,,2], iris3[26:50,,3])
cl <- c(rep(2, 25), rep(1, 25), rep(1, 25))

knn.out <- knn(train, test, as.factor(cl), k = 3, use.all = FALSE)
gknn.out <- gknn(train, cl, test, nn = 3)

# Both applications lead to the same predictions.
knn.out == gknn.out

# But gknn also allows distance measures other than the Euclidean
# distance, e.g., the Manhattan distance.
gknn(train, cl, test, nn = 3, distance = "manhattan")
## End(Not run)