pamCat {scrime} | R Documentation |
Performs a Prediction Analysis of Categorical Data.
pamCat(data, cl, theta = NULL, n.theta = 10, newdata = NULL, newcl = NULL)
data |
a numeric matrix composed of the integers between 1 and n.cat,
where n.cat is the number of levels each of the variables represented
by the rows of data must take. No missing values allowed. |
cl |
a numeric vector of length ncol(data) comprising the class labels of
the observations represented by the columns of data. cl must consist
of the integers between 1 and n.cl, where n.cl is the number
of classes. |
theta |
a numeric vector consisting of the strictly positive values of the shrinkage parameter used
in the Prediction Analysis. If NULL, a vector consisting of n.theta values for
the shrinkage parameter is determined automatically. |
n.theta |
an integer specifying the number of values for the shrinkage parameter of the
Prediction Analysis. Ignored if theta is specified. |
newdata |
a numeric matrix composed of the integers between 1 and n.cat.
Must have the same number of rows as data, and each row of newdata must contain
the same variable as the corresponding row of data. newdata is employed to
compute the misclassification rates of the Prediction Analysis for the given values of the
shrinkage parameter. If NULL, data is used to determine the misclassification rates. |
newcl |
a numeric vector of length ncol(newdata) that consists of integers between
1 and n.cl, and specifies the class labels of the observations in newdata.
Must be specified if newdata is specified. |
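The input requirements listed above can be checked before calling pamCat. The following is a minimal sketch in base R, not part of scrime: the helper name check_pamcat_input is hypothetical, and it assumes that n.cat and n.cl can be taken as the largest values found in data and cl, respectively.

```r
# Hypothetical helper (not part of scrime): checks the constraints that the
# argument descriptions place on data and cl.
check_pamcat_input <- function(data, cl) {
  # TRUE if all values are integers >= 1
  is_pos_int <- function(x) all(x >= 1 & x == round(x))

  if (!is.matrix(data) || !is.numeric(data))
    stop("data must be a numeric matrix")
  if (anyNA(data))
    stop("data must not contain missing values")
  if (!is_pos_int(data))
    stop("data must consist of the integers between 1 and n.cat")

  if (!is.numeric(cl) || anyNA(cl) || !is_pos_int(cl))
    stop("cl must consist of the integers between 1 and n.cl")
  if (length(cl) != ncol(data))
    stop("cl must have length ncol(data)")

  # Assumption: n.cat and n.cl are the largest observed values.
  n.cat <- max(data)
  n.cl <- max(cl)
  if (!all(seq_len(n.cl) %in% cl))
    stop("each class between 1 and n.cl must occur in cl")

  list(n.cat = n.cat, n.cl = n.cl)
}

# Usage: 10 variables (rows), 20 observations (columns), two classes.
mat <- matrix(rep(1:3, length.out = 200), nrow = 10)
cl <- rep(1:2, each = 10)
res <- check_pamcat_input(mat, cl)
```

The same checks would apply to newdata and newcl, with the additional requirement that nrow(newdata) equals nrow(data).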
An object of class pamCat
composed of
mat.chisq |
a matrix with m rows and n.cl columns consisting of the classwise values of Pearson's chi-square statistic for each of the m variables. |
mat.obs |
a matrix with m rows and n.cat * n.cl columns
in which each row shows a contingency table between the corresponding variable and cl. |
mat.exp |
a matrix of the same size as mat.obs containing the numbers of observations
expected under the null hypothesis of no association between the respective variable and cl. |
mat.theta |
a data frame consisting of the numbers of variables used in the classification
of the observations in newdata and the corresponding misclassification rates for a set of values of
the shrinkage parameter theta. |
tab.cl |
a table summarizing the values of the response, i.e. the class labels. |
n.cat |
the number of levels of each variable, i.e. n.cat. |
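To illustrate what the entries of mat.obs, mat.exp and mat.chisq represent, the following base R sketch computes the observed counts, the counts expected under independence, and the classwise contributions to Pearson's chi-square statistic for a single variable. It is an illustration only: the toy vectors x and cl are made up, and treating the classwise value as the sum of the contributions within one class is an assumption, not a confirmed description of the internals of pamCat.

```r
# Toy data: one variable (one row of data) with 3 levels, and two classes.
x <- c(1, 1, 2, 3, 3, 3, 1, 2, 2, 3)
cl <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
n.cat <- 3
n.cl <- 2

# Observed contingency table: levels of x in the rows, classes in the columns.
obs <- table(factor(x, levels = seq_len(n.cat)),
             factor(cl, levels = seq_len(n.cl)))

# Counts expected if x and cl were independent:
# (row total * column total) / overall total.
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Contribution of each cell to Pearson's chi-square statistic.
contrib <- (obs - expd)^2 / expd

# Assumption: the classwise value is the sum over the levels within a class.
chisq.by.class <- colSums(contrib)

# Summing over the classes gives the usual Pearson statistic.
total <- sum(chisq.by.class)
```

The value of total agrees with the statistic reported by chisq.test(obs, correct = FALSE), which provides a simple check of the calculation.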
Holger Schwender, holger.schwender@udo.edu
Schwender, H. (2007). Statistical Analysis of Genotype and Gene Expression Data. Dissertation, Department of Statistics, University of Dortmund.
## Not run: 
# Generate a data set consisting of 2000 rows (variables) and 50 columns.
# Assume that the first 25 observations belong to class 1, and the other
# 25 observations to class 2.

mat <- matrix(sample(3, 100000, TRUE), 2000)
rownames(mat) <- paste("SNP", 1:2000, sep = "")
cl <- rep(1:2, each = 25)

# Apply PAM for categorical data to this matrix, and compute the
# misclassification rate on the training set, i.e. on mat.

pam.out <- pamCat(mat, cl)
pam.out

# Now generate a new data set consisting of 20 observations,
# and predict the classes of these observations using the
# value of theta that has led to the smallest misclassification
# rate in pam.out.

mat2 <- matrix(sample(3, 40000, TRUE), 2000)
rownames(mat2) <- paste("SNP", 1:2000, sep = "")
predict(pam.out, mat2)

# Let's assume that the predicted classes are the real classes
# of the observations. Then, mat2 can also be used in pamCat
# to compute the misclassification rate.

cl2 <- predict(pam.out, mat2)
pamCat(mat, cl, newdata = mat2, newcl = cl2)

## End(Not run)