dlda {supclust}    R Documentation
The four functions nnr (nearest neighbor rule), dlda (diagonal linear discriminant analysis), logreg (logistic regression) and aggtrees (aggregated trees) are used for binary classification with the cluster representatives of Wilma's output.
dlda(xlearn, xtest, ylearn)
nnr(xlearn, xtest, ylearn)
logreg(xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)
xlearn |
Numeric matrix of explanatory variables (q variables in columns, n cases in rows), containing the learning or training data. Typically, these are the (gene) cluster representatives of Wilma's output. |
xtest |
Numeric matrix of explanatory variables (q variables in columns, m cases in rows), containing the test or validation data. Typically, these are the fitted (gene) cluster representatives for the test data, obtained by applying predict.wilma to Wilma's output for the training data. |
ylearn |
Numeric vector of length n containing the class labels for the training observations. The labels must be coded as 0 and 1. |
nnr implements the 1-nearest-neighbor rule with Euclidean distance. dlda is linear discriminant analysis, using the restriction that the covariance matrix is diagonal with equal variance for all predictors. logreg is default logistic regression. aggtrees fits a default stump (a classification tree with two terminal nodes) by rpart for every predictor variable and uses majority voting to determine the final classifier.
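The first two rules described above can be sketched in base R. This is an illustration only, not the supclust source: the names sq_dist, nnr_sketch and dlda_sketch are hypothetical, ties in the nearest neighbor search go to the first training case, and the dlda sketch additionally assumes equal class priors. Under that assumption, a diagonal covariance matrix with one common variance reduces the discriminant rule to assigning each test case to the nearest class centroid.

```r
## Squared Euclidean distances between the rows of a (m x q) and b (n x q).
sq_dist <- function(a, b) {
  outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
}

## 1-nearest-neighbor rule: each test case takes the label of the closest
## training case (assumption: ties are resolved by taking the first minimum).
nnr_sketch <- function(xlearn, xtest, ylearn) {
  d <- sq_dist(xtest, xlearn)            # m x n distance matrix
  ylearn[apply(d, 1, which.min)]
}

## Diagonal LDA with one common variance and (assumed) equal class priors:
## equivalent to choosing the nearest class centroid.
dlda_sketch <- function(xlearn, xtest, ylearn) {
  centroids <- rbind(colMeans(xlearn[ylearn == 0, , drop = FALSE]),
                     colMeans(xlearn[ylearn == 1, , drop = FALSE]))
  d <- sq_dist(xtest, centroids)         # m x 2 distance matrix
  as.numeric(d[, 2] < d[, 1])            # 1 if closer to the class-1 centroid
}

## Small hand-made data set: two well-separated classes in two dimensions.
xl <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6))
yl <- c(0, 0, 1, 1)
xt <- rbind(c(0.2, 0.3), c(4.8, 5.2), c(1, 1))

pred_nnr  <- nnr_sketch(xl, xt, yl)
pred_dlda <- dlda_sketch(xl, xt, yl)
```

Both sketches take the same three arguments as the package functions and return a 0/1 vector with one entry per row of the test matrix, so their output can be compared directly with that of nnr and dlda on the same data.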
Each of the four functions returns a numeric vector of length m containing the predicted class labels for the test observations. The class labels are coded as 0 and 1.
Marcel Dettling, dettling@stat.math.ethz.ch
Marcel Dettling (2002) Supervised Clustering of Genes, see http://stat.ethz.ch/~dettling/supercluster.html
Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering of Genes. Genome Biology, 3(12): research0069.1-0069.15.
## Generating random learning data: 20 observations and 10 variables (clusters)
set.seed(342)
xlearn <- matrix(rnorm(200), nrow = 20, ncol = 10)

## Generating random test data: 8 observations and 10 variables (clusters)
xtest <- matrix(rnorm(80), nrow = 8, ncol = 10)

## Generating random class labels for the learning data
ylearn <- as.numeric(runif(20) > 0.5)

## Predicting the class labels for the test data
nnr(xlearn, xtest, ylearn)
dlda(xlearn, xtest, ylearn)
logreg(xlearn, xtest, ylearn)
aggtrees(xlearn, xtest, ylearn)