candisc {candisc}    R Documentation
candisc performs a generalized canonical discriminant analysis for one term in a multivariate linear model (i.e., an mlm object), computing canonical scores and vectors. It represents a transformation of the original variables into a canonical space of maximal differences for the term, controlling for other model terms.
To be useful, the term should be a factor or interaction corresponding to a multivariate test with 2 or more degrees of freedom for the null hypothesis; a term with 1 df yields only a single canonical dimension.
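The degrees-of-freedom requirement can be checked directly: the number of canonical dimensions for a term is the number of non-zero eigenvalues of HE^{-1}, which cannot exceed the term's df. The sketch below is plain base R and does not call candisc; the mtcars data, the chosen responses and the helper n_dim() are illustrative assumptions, not part of the package.

```r
# Base-R sketch: count canonical dimensions for a one-factor model.
# mtcars is used only as a convenient built-in example.
Y   <- cbind(mpg = mtcars$mpg, disp = mtcars$disp, hp = mtcars$hp)  # p = 3
am  <- factor(mtcars$am)    # 2 levels: 1 df
cyl <- factor(mtcars$cyl)   # 3 levels: 2 df

# Hypothetical helper: number of non-negligible eigenvalues of HE^{-1}
n_dim <- function(f, tol = 1e-8) {
  ss <- summary(manova(Y ~ f))$SS          # list: SSP for f, Residuals
  R_inv <- solve(chol(ss$Residuals))       # E = R'R
  ev <- eigen(t(R_inv) %*% ss$f %*% R_inv, symmetric = TRUE,
              only.values = TRUE)$values   # same eigenvalues as HE^{-1}
  sum(ev > tol * max(ev))
}

n_dim(am)    # 1: a single canonical dimension, nothing to plot in 2-D
n_dim(cyl)   # 2
```

The symmetric form R^{-T} H R^{-1} is used instead of solve(E) %*% H so that eigen() is guaranteed to return real values.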
candisc(mod, ...)

## S3 method for class 'mlm':
candisc(mod, term, type = "2", manova, ndim = rank, ...)

## S3 method for class 'candisc':
coef(object, type = c("std", "raw", "structure"), ...)

## S3 method for class 'candisc':
plot(x, which = 1:2, conf = 0.95, col, pch, scale, asp = 1,
     var.col = "blue", var.lwd = par("lwd"), prefix = "Can", suffix = TRUE,
     titles.1d = c("Canonical scores", "Structure"), ...)

## S3 method for class 'candisc':
print(x, digits = max(getOption("digits") - 2, 3), ...)

## S3 method for class 'candisc':
summary(object, means = TRUE, scores = FALSE, coef = c("std"), ndim,
        digits = max(getOption("digits") - 2, 4), ...)
mod |
An mlm object, such as one computed by lm() with a multivariate response |
term |
the name of one term from mod |
type |
Type of test for the model term; one of "II", "III", "2", or "3" |
manova |
the Anova.mlm object corresponding to mod. Normally,
this is computed internally by Anova(mod) |
ndim |
Number of dimensions to store in (or retrieve from, for the summary method)
the means, structure, scores and
coeffs.* components. The default is the rank of the H matrix for the hypothesis
term. |
object, x |
A candisc object |
which |
A vector of two integers, selecting the canonical dimensions to plot |
conf |
Confidence coefficient for the confidence circles plotted in the plot method |
col |
A vector of colors to be used for the levels of the term in the plot method.
In this version, you should assign colors and point symbols explicitly, rather than relying on
the somewhat arbitrary defaults. |
pch |
A vector of point symbols to be used for the levels of the term in the plot method |
scale |
Scale factor for the variable vectors in canonical space. If not specified, a scale factor is calculated to make the variable vectors approximately fill the plot space. |
asp |
Aspect ratio for the plot method. The default, asp=1, ensures that
the units on the horizontal and vertical axes are the same, so that lengths and angles of the
variable vectors are interpretable. |
var.col |
Color used to plot variable vectors |
var.lwd |
Line width used to plot variable vectors |
prefix |
Prefix used to label the canonical dimensions plotted |
suffix |
Suffix for labels of canonical dimensions. If suffix=TRUE
the percent of hypothesis (H) variance accounted for by each canonical dimension is added to the axis label. |
titles.1d |
A character vector of length 2, containing titles for the panels used to plot the canonical scores and structure vectors, for the case in which there is only one canonical dimension. |
means |
Logical value used to determine if canonical means are printed |
scores |
Logical value used to determine if canonical scores are printed |
coef |
Type of coefficients printed by the summary method. Any one or more of "std", "raw", or "structure" |
digits |
Number of significant digits to print. |
... |
Other arguments passed to methods. In particular, type="n" can be used with
the plot method to suppress the display of canonical scores. |
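To give a feel for the conf argument, the sketch below computes the radius of a confidence circle for a group mean in a 2-D canonical plot. It ASSUMES that canonical scores have unit pooled within-group variance, so that the region is a circle with radius sqrt(qchisq(conf, 2) / n); whether plot.candisc uses exactly this formula is an assumption, and conf_radius() and circle_xy() are hypothetical helpers, not candisc functions.

```r
# Hypothetical helper (not part of candisc): confidence-circle radius for a
# group mean of n observations, ASSUMING unit within-group variance of the
# canonical scores in every direction (chi-square with 2 df).
conf_radius <- function(n, conf = 0.95) {
  sqrt(qchisq(conf, df = 2) / n)
}

# Hypothetical helper: points on a circle, e.g. to add with lines()
circle_xy <- function(center, r, k = 100) {
  theta <- seq(0, 2 * pi, length.out = k)
  cbind(x = center[1] + r * cos(theta), y = center[2] + r * sin(theta))
}

r50 <- conf_radius(50)             # a group of 50 observations, conf = 0.95
xy  <- circle_xy(c(2, -1), r50)    # hypothetical group mean at (2, -1)
```

Under this assumption the circles shrink like 1/sqrt(n), so larger groups get visibly tighter circles.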
Canonical discriminant analysis is typically carried out in conjunction with
a one-way MANOVA design. It represents a linear transformation of the response variables
into a canonical space in which (a) each successive canonical variate produces
maximal separation among the groups (e.g., maximum univariate F statistics), and
(b) all canonical variates are mutually uncorrelated.
For a one-way MANOVA with g groups and p responses, there are
min(g-1, p) such canonical dimensions (g-1 is the hypothesis degrees of freedom, dfh), and tests, initially stated
by Bartlett (1938), allow one to determine the number of significant
canonical dimensions. Computational details for the one-way case are described
in Cooley & Lohnes (1971), and in the SAS/STAT User's Guide, "The CANDISC procedure:
Computational Details," http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/candisc_sect12.htm.
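Bartlett's chi-square approximation for the number of significant dimensions can be sketched in base R for the iris one-way MANOVA (g = 3, p = 4). This illustrates the classical test only; the tests printed by candisc may use a different approximation (e.g., F-based), and the variable names here are illustrative.

```r
# Bartlett's sequential chi-square tests for canonical dimensions (base R).
Y <- as.matrix(iris[, 1:4])
species <- iris$Species
ss <- summary(manova(Y ~ species))$SS    # H = ss$species, E = ss$Residuals

N <- nrow(Y); p <- ncol(Y); g <- nlevels(species)
s <- min(g - 1, p)                       # number of canonical dimensions

# Eigenvalues of HE^{-1} via the symmetric form R^{-T} H R^{-1}, E = R'R
R_inv  <- solve(chol(ss$Residuals))
lambda <- eigen(t(R_inv) %*% ss$species %*% R_inv, symmetric = TRUE,
                only.values = TRUE)$values[1:s]

# Row k tests H0: canonical dimensions k, ..., s are all zero
bartlett <- t(sapply(1:s, function(k) {
  chisq <- (N - 1 - (p + g) / 2) * sum(log(1 + lambda[k:s]))
  df    <- (p - k + 1) * (g - k)
  c(dim = k, chisq = chisq, df = df,
    p.value = pchisq(chisq, df, lower.tail = FALSE))
}))
bartlett
```

Each row drops the largest remaining eigenvalue, so the first non-significant row indicates how many dimensions are worth interpreting.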
A generalized canonical discriminant analysis extends this idea to a general
multivariate linear model. Analysis of each term in the mlm
produces a hypothesis sum of squares and crossproducts (SSP) matrix, H, of rank at most dfh,
which is tested against the error SSP matrix, E, with dfe degrees of freedom, by the standard multivariate
tests (Wilks' Lambda, Hotelling-Lawley trace, Pillai trace, Roy's maximum root
test). For any given term in the mlm, the generalized canonical discriminant
analysis amounts to a standard discriminant analysis based on the H matrix for that
term in relation to the full-model E matrix.
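The generalized case can be sketched in base R without candisc or car. The dataset (mtcars) and the choice of responses and terms are assumptions for illustration. Because cyl is entered last in an additive model, its sequential SSP matrix from summary.manova() should coincide with the Type II H matrix that Anova() would use, and E comes from the full model.

```r
# Illustrative only: mtcars responses mpg, disp, hp; terms am (1 df), cyl (2 df)
Y   <- cbind(mpg = mtcars$mpg, disp = mtcars$disp, hp = mtcars$hp)
am  <- factor(mtcars$am)
cyl <- factor(mtcars$cyl)
fit <- manova(Y ~ am + cyl)      # additive model, cyl entered last

ss  <- summary(fit)$SS
H   <- ss$cyl                    # SSP for cyl adjusted for am
E   <- ss$Residuals              # full-model error SSP
dfe <- fit$df.residual           # 32 - 4 = 28

# Eigenvalues of HE^{-1} via the symmetric form R^{-T} H R^{-1}, E = R'R
R_inv  <- solve(chol(E))
lambda <- eigen(t(R_inv) %*% H %*% R_inv, symmetric = TRUE,
                only.values = TRUE)$values[1:2]  # rank = min(dfh = 2, p = 3)
canrsq <- lambda / (1 + lambda)  # squared canonical correlations
wilks  <- prod(1 / (1 + lambda)) # Wilks' Lambda for cyl, given am

cbind(lambda, canrsq)
c(sketch = wilks,
  summary.manova = summary(fit, test = "Wilks")$stats["cyl", "Wilks"])
```

The last line cross-checks the eigenvalue route against the Wilks' Lambda that summary.manova() reports for the same term, which is the multivariate test the canonical analysis decomposes.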
An object of class candisc with the following components:
dfh |
hypothesis degrees of freedom for term |
dfe |
error degrees of freedom for the mlm |
rank |
number of non-zero eigenvalues of HE^{-1} |
eigenvalues |
eigenvalues of HE^{-1} |
canrsq |
squared canonical correlations |
pct |
A vector containing, for each canonical dimension, its eigenvalue as a percentage of the sum of the eigenvalues. |
ndim |
Number of canonical dimensions stored in the means, structure and coeffs.* components |
means |
A data.frame containing the class means for the levels of the factor(s) in the term |
factors |
A data frame containing the levels of the factor(s) in the term |
term |
name of the term |
terms |
A character vector containing the names of the terms in the mlm object |
coeffs.raw |
A matrix containing the raw canonical coefficients |
coeffs.std |
A matrix containing the standardized canonical coefficients |
structure |
A matrix containing the canonical structure coefficients on ndim dimensions, i.e.,
the correlations between the original variates and the canonical scores.
These are sometimes referred to as Total Structure Coefficients. |
scores |
A data frame containing the predictors in the mlm model and the
canonical scores on ndim dimensions.
These are calculated as Y %*% coeffs.raw, where Y contains the
mean-centered response variables. |
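The sketch below relates these components to one another for the one-way iris MANOVA, and checks properties (a) and (b) from the details above. It is base R, not the candisc source, and its scaling conventions are ASSUMPTIONS: raw coefficients scaled so scores have unit pooled within-group variance, standardized coefficients obtained by multiplying by pooled within-group standard deviations, and scores computed from mean-centered responses.

```r
# Base-R sketch of canrsq, pct, coeffs.*, scores and structure for iris.
Y <- as.matrix(iris[, 1:4])
species <- iris$Species
fit <- manova(Y ~ species)
ss  <- summary(fit)$SS
E   <- ss$Residuals                      # error SSP
dfe <- fit$df.residual                   # 147
dfh <- nlevels(species) - 1              # 2
n_can <- min(dfh, ncol(Y))               # number of canonical dimensions

# Eigen-decompose the symmetric form R^{-T} H R^{-1}, where E = R'R
R_inv <- solve(chol(E))
dec <- eigen(t(R_inv) %*% ss$species %*% R_inv, symmetric = TRUE)

eigenvalues <- dec$values[1:n_can]       # eigenvalues of HE^{-1}
canrsq <- eigenvalues / (1 + eigenvalues)
pct    <- 100 * eigenvalues / sum(eigenvalues)

# Assumed scaling: unit pooled within-group variance of the scores
coeffs.raw <- sqrt(dfe) * R_inv %*% dec$vectors[, 1:n_can, drop = FALSE]
rownames(coeffs.raw) <- colnames(Y)
coeffs.std <- diag(sqrt(diag(E / dfe))) %*% coeffs.raw
rownames(coeffs.std) <- colnames(Y)
scores <- scale(Y, center = TRUE, scale = FALSE) %*% coeffs.raw
colnames(scores) <- paste0("Can", 1:n_can)
struct <- cor(Y, scores)                 # total structure coefficients

round(cbind(eigenvalues, canrsq, pct), 4)
round(cor(scores), 6)                    # (b) canonical variates uncorrelated
# (a) univariate F of each canonical variate equals lambda * dfe / dfh
F_can <- sapply(1:n_can, function(j)
  anova(lm(scores[, j] ~ species))[["F value"]][1])
cbind(F_can, from_lambda = eigenvalues * dfe / dfh)
```

Because each column of coeffs.raw satisfies v'Ev = dfe and v'Hv = dfe * lambda, the univariate F for a canonical variate is simply lambda * dfe / dfh, which is why the first variate gives the largest F.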
Michael Friendly and John Fox
Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc. 34, 33-34.
Cooley, W. W. & Lohnes, P. R. (1971). Multivariate Data Analysis. New York: Wiley.
Gittins, R. (1985). Canonical Analysis: A Review with Applications in Ecology. Berlin: Springer.
library(candisc)
library(car)       # for Anova()
library(heplots)   # for heplot()

# Grass data: Species effect, controlling for Block
grass.mod <- lm(cbind(N1, N9, N27, N81, N243) ~ Block + Species, data=Grass)
Anova(grass.mod, test="Wilks")
grass.can1 <- candisc(grass.mod, term="Species")
plot(grass.can1, type="n")
heplot(grass.can1, scale=6)

# iris data
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species,
               data=iris)
iris.can <- candisc(iris.mod, data=iris)
#-- assign colors and symbols corresponding to species
col <- rep(c("red", "black", "blue"), each=50)
pch <- rep(1:3, each=50)
plot(iris.can, col=col, pch=pch)
heplot(iris.can)

# 1-dim plot
iris.can1 <- candisc(iris.mod, data=iris, ndim=1)
plot(iris.can1)