pelora {supclust} | R Documentation |
Selects predictor variables and groups them in a supervised way in large datasets (for example, microarray gene expression data), with an option for simultaneous classification. The algorithm follows a greedy forward strategy and optimizes the binomial log-likelihood, based on conditional probabilities estimated by penalized logistic regression.
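The criterion named above can be illustrated with a minimal base-R sketch. The helper `binom_loglik` and the toy vectors are hypothetical and are not part of supclust; the package evaluates its criterion in internal routines, which may differ in detail (for example, in how the penalty enters).

```r
# Binomial log-likelihood for 0/1 labels y and estimated probabilities p.
# Hypothetical helper for illustration only; not a supclust function.
binom_loglik <- function(y, p, eps = 1e-12) {
  p <- pmin(pmax(p, eps), 1 - eps)  # guard against log(0)
  sum(y * log(p) + (1 - y) * log(1 - p))
}

y      <- c(0, 0, 1, 1)
p_good <- c(0.1, 0.2, 0.8, 0.9)  # probabilities that agree with the labels
p_poor <- c(0.5, 0.5, 0.5, 0.5)  # uninformative probabilities

binom_loglik(y, p_good)
binom_loglik(y, p_poor)
```

As written here, a larger (less negative) value means the estimated probabilities agree better with the class labels.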
pelora(x, y, u = NULL, noc = 10, lambda = 1/32, flip = "pm", standardize = TRUE, trace = 1)
x |
Numeric matrix of explanatory variables (p variables in columns, n cases in rows). For example, these can be microarray gene expression data which should be grouped. |
y |
Numeric vector of length n containing the class labels of the individuals. These labels have to be coded by 0 and 1. |
u |
Numeric matrix of additional (clinical) explanatory variables (m variables in columns, n cases in rows) that are used in the (penalized logistic regression) prediction model, but neither grouped nor averaged. For example, these can be 'traditional' clinical variables. |
noc |
Integer, the number of clusters that should be searched for on the data. |
lambda |
Real, defaults to 1/32. Rescaled penalty parameter that should be in [0,1]. |
flip |
Character string specifying how the x (gene expression) matrix should be sign-flipped. Possible values are "pm" (the default), where the sign of each variable is determined when it enters a group; "cor", where the sign of each variable is determined a priori as the sign of its empirical correlation with the y-vector; and "none", where no sign-flipping is carried out. |
standardize |
Logical, defaults to TRUE. Indicates whether the predictor variables (genes) should be standardized to zero mean and unit variance. |
trace |
Integer >= 0; when positive, output from the internal loops is printed; trace >= 2 also prints output from the internal C routines. |
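The standardize and flip = "cor" options described above can be sketched in base R as follows. This is an illustration on toy data, not the package's internal code: the use of scale() (which divides by the n - 1 standard deviation) and the treatment of a zero correlation as "no flip" are assumptions.

```r
set.seed(1)
n <- 20; p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)  # toy data, n cases by p variables
y <- rep(c(0, 1), each = n / 2)                # class labels coded 0 and 1
x[, 2] <- x[, 2] - 3 * y                       # make variable 2 negatively associated with y

# standardize = TRUE: zero mean and unit variance for every column
xs <- scale(x)

# flip = "cor": sign of the empirical correlation of each variable with y
signs <- as.vector(sign(cor(xs, y)))
signs[signs == 0] <- 1                         # assumption: zero correlation means no flip

# multiply each column by its sign
x_flipped <- sweep(xs, 2, signs, `*`)
signs
```

After flipping, every column is non-negatively correlated with y, which is the point of the a priori sign rule.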
pelora returns an object of class "pelora". The functions print and summary give an overview of the variables (genes) that have been selected and the groups that have been formed. The function plot yields a two-dimensional projection into the space of the first two group centroids that pelora found. The generic function fitted returns the fitted values, which are the cluster representatives. coef returns the penalized logistic regression coefficients theta_j for each of the predictors. Finally, predict classifies test data with Pelora's internal penalized logistic regression classifier on the basis of the (gene) groups that have been found.
An object of class "pelora" is a list containing:
genes |
A list of length noc, containing integer vectors with the indices (column numbers) of the variables (genes) that have been clustered. |
values |
A numerical matrix with dimension n times noc, containing the fitted values, i.e. the group centroids tilde{x}_j. |
y |
Numeric vector of length n containing the class labels of the individuals. These labels are coded by 0 and 1. |
steps |
Numerical vector of length noc, showing the number of forward/backward cycles in the fitting process of each cluster. |
lambda |
The rescaled penalty parameter. |
noc |
The number of clusters that were searched for in the data. |
px |
The number of columns (genes) in the x-matrix. |
flip |
The method that has been chosen for sign-flipping the x-matrix. |
var.type |
A factor with noc entries, describing whether the jth predictor is a group of predictors (genes) or a single (clinical) predictor variable. |
crit |
A list of length noc, containing numerical vectors that provide information about the development of the grouping criterion during the clustering. |
signs |
Numerical vector of length p, indicating whether the ith variable (gene) should be sign-flipped (-1) or not (+1). |
samp.names |
The names of the samples (rows) in the x-matrix. |
gene.names |
The names of the variables (columns) in the x-matrix. |
call |
The function call. |
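The relation between the genes, signs and values components can be sketched on toy data. The sketch assumes that a group centroid is the plain average of the sign-flipped, standardized member variables; this is an assumption and has not been checked against the package's internal computation. The objects `genes` and `signs` below are hypothetical stand-ins shaped like the corresponding list components, not output of pelora.

```r
set.seed(2)
x <- scale(matrix(rnorm(6 * 4), nrow = 6, ncol = 4))  # toy standardized data, n = 6, p = 4

genes <- list(c(1, 3), 4)   # hypothetical groups: column indices per cluster
signs <- c(1, 1, -1, 1)     # hypothetical signs: -1 means the variable is flipped

# One centroid per group: row-wise mean of the sign-flipped member columns
centroids <- sapply(genes, function(idx) {
  rowMeans(sweep(x[, idx, drop = FALSE], 2, signs[idx], `*`))
})

dim(centroids)  # n rows, one column per group
```

The result has the same layout as the values component described above: n rows and one column per group.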
Marcel Dettling, dettling@stat.math.ethz.ch
Marcel Dettling (2003) Finding Predictive Gene Groups from Microarray Data, see http://stat.ethz.ch/~dettling/supervised.html
Marcel Dettling and Peter Bühlmann (2002). Supervised Clustering of Genes. Genome Biology, 3(12): research0069.1-0069.15.
Marcel Dettling and Peter Bühlmann (2004). Finding Predictive Gene Groups from Microarray Data. To appear in the Journal of Multivariate Analysis
wilma for another supervised clustering technique.
## Working with a "real" microarray dataset
data(leukemia, package="supclust")

## Generating random test data: 3 observations and 250 variables (genes)
set.seed(724)
xN <- matrix(rnorm(750), nrow = 3, ncol = 250)

## Fitting Pelora
fit <- pelora(leukemia.x, leukemia.y, noc = 3)

## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
coef(fit)

## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")

## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN)
predict(fit, newdata = xN, type = "cla", noc = c(1,2,3))
predict(fit, newdata = xN, type = "pro", noc = c(1,3))

## Fitting Pelora such that the first 70 variables (genes) are not grouped
fit <- pelora(leukemia.x[, -(1:70)], leukemia.y, leukemia.x[,1:70])

## Working with the output
fit
summary(fit)
plot(fit)
fitted(fit)
coef(fit)

## Fitted values and class probabilities for the training data
predict(fit, type = "cla")
predict(fit, type = "prob")

## Predicting fitted values and class labels for the random test data
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70])
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], "cla", noc = 1:10)
predict(fit, newdata = xN[, -(1:70)], newclin = xN[, 1:70], type = "pro")