candisc {candisc}    R Documentation

Canonical discriminant analysis

Description

candisc performs a generalized canonical discriminant analysis for one term in a multivariate linear model (i.e., an mlm object), computing canonical scores and vectors. It represents a transformation of the original variables into a canonical space of maximal differences for the term, controlling for other model terms. To be of any use, the term should be a factor or interaction corresponding to a multivariate test with 2 or more degrees of freedom for the null hypothesis.

Usage

candisc(mod, ...)

## S3 method for class 'mlm':
candisc(mod, term, type = "2", manova, ndim = rank, ...)

## S3 method for class 'candisc':
coef(object, type = c("std", "raw", "structure"), ...)

## S3 method for class 'candisc':
plot(x, which = 1:2, conf = 0.95, col, pch, scale, asp = 1,
    var.col = "blue", var.lwd = par("lwd"), prefix = "Can", suffix=TRUE, ...)
    
## S3 method for class 'candisc':
print(x, digits=max(getOption("digits") - 2, 3), ...)

## S3 method for class 'candisc':
summary(object, means = TRUE, scores = FALSE, coef = c("std"),
    ndim, digits = max(getOption("digits") - 2, 4), ...)

Arguments

mod An mlm object, such as computed by lm() with a multivariate response
term the name of one term from mod
type type of test for the model term, one of: "II", "III", "2", or "3"
manova the Anova.mlm object corresponding to mod. Normally, this is computed internally by Anova(mod)
ndim Number of dimensions to store in (or retrieve from, for the summary method) the means, structure, scores and coeffs.* components. The default is the rank of the H matrix for the hypothesis term.
object, x A candisc object
which A vector of two integers, selecting the canonical dimensions to plot
conf Confidence coefficient for the confidence circles plotted in the plot method
col A vector of colors to be used for the levels of the term in the plot method
pch A vector of point symbols to be used for the levels of the term in the plot method
scale Scale factor for the variable vectors in canonical space. If not specified, a scale factor is calculated to make the variable vectors approximately fill the plot space.
asp Aspect ratio for the plot method. The asp=1 (the default) assures that the units on the horizontal and vertical axes are the same, so that lengths and angles of the variable vectors are interpretable.
var.col Color used to plot variable vectors
var.lwd Line width used to plot variable vectors
prefix Prefix used to label the canonical dimensions plotted
suffix Suffix for labels of canonical dimensions. If suffix=TRUE the percent of hypothesis (H) variance accounted for by each canonical dimension is added to the axis label.
means Logical value used to determine if canonical means are printed
scores Logical value used to determine if canonical scores are printed
coef Type of coefficients printed by the summary method. Any one or more of "std", "raw", or "structure"
digits significant digits to print.
... arguments to be passed down. In particular, type="n" can be used with the plot method to suppress the display of canonical scores.

Details

Canonical discriminant analysis is typically carried out in conjunction with a one-way MANOVA design. It represents a linear transformation of the response variables into a canonical space in which (a) each successive canonical variate produces maximal separation among the groups (e.g., maximum univariate F statistics), and (b) all canonical variates are mutually uncorrelated. For a one-way MANOVA with g groups and p responses, there are dfh = min(g-1, p) such canonical dimensions, and tests, initially stated by Bartlett (1938), allow one to determine the number of significant canonical dimensions. Computational details for the one-way case are described in Cooley & Lohnes (1971), and in the SAS/STAT User's Guide, "The CANDISC procedure: Computational Details," http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/candisc_sect12.htm.
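The one-way case can be sketched in base R without the package. The code below is an illustration under stated assumptions, not the package's implementation: it uses the built-in iris data, solves the eigenproblem through a Cholesky factor of E, and applies the commonly cited chi-square approximation for Bartlett's test. All object names (Y, grp, Uinv, V, chisq, ...) are hypothetical.

```r
# One-way design on the built-in iris data: g = 3 groups, p = 4 responses.
Y   <- as.matrix(iris[, 1:4])
grp <- iris$Species
N <- nrow(Y); p <- ncol(Y); g <- nlevels(grp)

# Error (within-group) and hypothesis (between-group) SSP matrices.
fit <- lm(Y ~ grp)
E   <- crossprod(residuals(fit))
Yc  <- scale(Y, center = TRUE, scale = FALSE)   # column-centered responses
H   <- crossprod(Yc) - E                        # total SSP minus error SSP
dfe <- N - g
dfh <- min(g - 1, p)

# Symmetric eigenproblem via the Cholesky factor of E (E = t(U) %*% U).
# Its eigenvalues are those of H E^{-1}.
Uinv   <- solve(chol(E))
dc     <- eigen(t(Uinv) %*% H %*% Uinv, symmetric = TRUE)
lambda <- dc$values[seq_len(dfh)]
V      <- Uinv %*% dc$vectors[, seq_len(dfh), drop = FALSE]  # raw weights
canrsq <- lambda / (1 + lambda)                 # squared canonical correlations
scores <- Yc %*% V                              # canonical scores

# (a) The first canonical variate attains F = lambda * dfe / (g - 1).
F1 <- lambda[1] * dfe / (g - 1)
s1 <- scores[, 1]
F1.check <- anova(lm(s1 ~ grp))[["F value"]][1]

# (b) The canonical variates are uncorrelated.
r12 <- cor(scores)[1, 2]

# Bartlett-type chi-square tests that dimensions k+1, ..., dfh are null
# (assumed form of the approximation; the package may report other tests).
k     <- 0:(dfh - 1)
chisq <- sapply(k, function(j)
  -(N - 1 - (p + g) / 2) * sum(log(1 - canrsq[(j + 1):dfh])))
df    <- (p - k) * (g - 1 - k)
pval  <- pchisq(chisq, df, lower.tail = FALSE)
```

Comparing F1 with F1.check illustrates property (a), and r12 being numerically zero illustrates property (b).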

A generalized canonical discriminant analysis extends this idea to a general multivariate linear model. Analysis of each term in the mlm produces a hypothesis sum of squares and crossproducts matrix, H, of rank dfh, that is tested against the error matrix, E, of rank dfe, by the standard multivariate tests (Wilks' Lambda, Hotelling-Lawley trace, Pillai trace, Roy's maximum root test). For any given term in the mlm, the generalized canonical discriminant analysis amounts to a standard discriminant analysis based on the H matrix for that term in relation to the full-model E matrix.
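As a rough illustration of the generalized case, the sketch below builds H for one term of a two-factor additive model as the change in residual SSP when that term is dropped (which matches a Type II test in an additive model), and relates it to the full-model E. The choice of the built-in mtcars data, the model, the rank tolerance, and all object names are assumptions made for this example; this is not the package's code.

```r
# Two-factor additive mlm on built-in data (illustrative choice only).
dat     <- transform(mtcars, cyl = factor(cyl), am = factor(am))
full    <- lm(cbind(mpg, hp, wt) ~ am + cyl, data = dat)
reduced <- lm(cbind(mpg, hp, wt) ~ am, data = dat)   # model without 'cyl'

# Full-model error SSP, and hypothesis SSP for 'cyl' controlling for 'am'.
E   <- crossprod(residuals(full))
H   <- crossprod(residuals(reduced)) - E
dfh <- nlevels(dat$cyl) - 1
dfe <- df.residual(full)

# Eigen-analysis of H relative to E via the Cholesky factor of E.
Uinv <- solve(chol(E))
dc   <- eigen(t(Uinv) %*% H %*% Uinv, symmetric = TRUE)

# Number of non-zero eigenvalues (tolerance is an arbitrary assumption).
rank   <- min(dfh, sum(dc$values > 1e-8 * dc$values[1]))
lambda <- dc$values[seq_len(rank)]
canrsq <- lambda / (1 + lambda)          # squared canonical correlations
pct_H  <- 100 * lambda / sum(lambda)     # percent of H variance per dimension
coefs  <- Uinv %*% dc$vectors[, seq_len(rank), drop = FALSE]  # raw weights
```

Here three responses and a three-level factor give two non-zero canonical dimensions for the cyl term, even though the model contains a second factor.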

Value

An object of class candisc with the following components:

dfh hypothesis degrees of freedom for term
dfe error degrees of freedom for the mlm
rank number of non-zero eigenvalues of HE^{-1}
eigenvalues eigenvalues of HE^{-1}
canrsq squared canonical correlations
pct A vector giving each value of canrsq as a percentage of their total.
ndim Number of canonical dimensions stored in the means, structure and coeffs.* components
means A data.frame containing the class means for the levels of the factor(s) in the term
factors A data frame containing the levels of the factor(s) in the term
term name of the term
terms A character vector containing the names of the terms in the mlm object
coeffs.raw A matrix containing the raw canonical coefficients
coeffs.std A matrix containing the standardized canonical coefficients
structure A matrix containing the canonical structure coefficients on ndim dimensions, i.e., the correlations between the original variates and the canonical scores. These are sometimes referred to as Total Structure Coefficients.
scores A data frame containing the predictors in the mlm model and the canonical scores on ndim dimensions. These are calculated as Y %*% coeffs.raw, where Y contains the standardized response variables.

Author(s)

Michael Friendly and John Fox

References

Bartlett, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc. 34, 33-40.

Cooley, W.W. & Lohnes, P.R. (1971). Multivariate Data Analysis, New York: Wiley.

Gittins, R. (1985). Canonical Analysis: A Review with Applications in Ecology, Berlin: Springer.

See Also

candiscList, heplot, heplot3d

Examples

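library(candisc)   # candisc() and the Grass data
library(car)       # Anova()

grass.mod <- lm(cbind(N1, N9, N27, N81, N243) ~ Block + Species, data = Grass)
Anova(grass.mod, test = "Wilks")

# canonical analysis for the Species term, controlling for Block
grass.can1 <- candisc(grass.mod, term = "Species")
plot(grass.can1, type = "n")

library(heplots)   # heplot()
heplot(grass.can1, scale = 6)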

# iris data
iris.mod <- lm(cbind(Petal.Length, Sepal.Length, Petal.Width, Sepal.Width) ~ Species, data=iris)
iris.can <- candisc(iris.mod, data=iris)
plot(iris.can)

heplot(iris.can)

# 1-dim plot
iris.can1 <- candisc(iris.mod, data=iris, ndim=1)
plot(iris.can1)


[Package candisc version 0.5-13 Index]