lmHmat {subselect} | R Documentation
Computes total and effect matrices of Sums of Squares and Cross-Product (SSCP) deviations, divided by a normalizing constant, in linear regression or canonical correlation analysis. These matrices may be used as input to the variable selection search routines anneal, genetic, improve or leaps.
## Default S3 method:
lmHmat(x, y, ...)

## S3 method for class 'data.frame':
lmHmat(x, y, ...)

## S3 method for class 'formula':
lmHmat(formula, data = NULL, ...)
x: A matrix or data frame containing the variables for which the SSCP matrix is to be computed.

y: A matrix or data frame containing the set of fixed variables with which the association of x is to be measured.

formula: A formula of the form 'y ~ x1 + x2 + ...'. That is, the response is the set of fixed variables and the right-hand side specifies the variables whose subsets are to be compared.

data: Data frame from which variables specified in 'formula' are preferentially to be taken.

...: Further arguments for the method.
Let x and y be two different groups of linearly independent
variables observed on the same set of data units. It is well known
that the association between x and y can be measured by their squared
canonical correlations which may be found as the positive eigenvalues
of certain matrix products. In particular, if T_x and
H_{x/y} denote SSCP
matrices of deviations from the mean, respectively for the original x
variables (T_x) and for their orthogonal projections onto the space
spanned by the y's (H_{x/y}), then the positive eigenvalues of
T_x^{-1}H_{x/y} equal the squared correlations between x and
y. Alternatively, these correlations could also be found from T_y^{-1} H_{y/x}, but here, assuming the goal of comparing subsets of x for a given fixed set of y's, we will focus on the former product. lmHmat computes a scaled version of T_x and H_{x/y} such that T_x is converted into a covariance matrix. These matrices can be used as input to the search routines anneal, genetic, improve and leaps, which try to select x subsets based on several functions of their squared correlations with y. We note that when there is only one variable in the y set, this is equivalent to selecting predictors for linear regression based on the traditional coefficient of determination.
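The eigenvalue identity described above can be checked directly in base R with synthetic data (this is an illustrative sketch, not part of the package; all variable names here are made up): with a single y variable, the positive eigenvalue of T_x^{-1} H_{x/y} equals the R-squared from an ordinary linear regression of y on the x's.

```r
# Synthetic data: 50 observations, 3 x-variables, 1 y-variable.
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)
y <- x %*% c(1, -2, 0.5) + rnorm(n)

xc <- scale(x, center = TRUE, scale = FALSE)   # centered x
yc <- scale(y, center = TRUE, scale = FALSE)   # centered y

Tx  <- crossprod(xc)                           # total SSCP matrix T_x
Py  <- yc %*% solve(crossprod(yc), t(yc))      # projection onto span of centered y
Hxy <- t(xc) %*% Py %*% xc                     # effect SSCP matrix H_{x/y}

# The single positive eigenvalue of Tx^{-1} Hxy ...
lambda <- max(Re(eigen(solve(Tx) %*% Hxy)$values))
# ... equals the coefficient of determination from lm():
r2 <- summary(lm(y ~ x))$r.squared
all.equal(lambda, r2)
```

The scaling by a normalizing constant mentioned above does not affect this identity, since T_x^{-1} H_{x/y} is unchanged when both matrices are divided by the same constant.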
A list with four items:
mat: The total SSCP matrix divided by nrow(x)-1.

H: The effect SSCP matrix divided by nrow(x)-1.

r: The expected rank of the H matrix which, under the assumption of linear independence, equals the minimum of the number of variables in the x and y sets. The true rank of H can differ from r if the linear independence condition fails.

call: The function call which generated the output.
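The relationship between r and the dimensions of the two variable sets can be illustrated with a short base-R sketch (synthetic data, not part of the package; the H computation mirrors the definition given in the Details section):

```r
# With p = 5 x-variables and q = 2 y-variables, the effect SSCP matrix H
# has rank min(p, q) = 2 when the variables are linearly independent.
set.seed(2)
n <- 40
x <- matrix(rnorm(n * 5), n, 5)
y <- matrix(rnorm(n * 2), n, 2)

xc <- scale(x, center = TRUE, scale = FALSE)
yc <- scale(y, center = TRUE, scale = FALSE)

# H_{x/y} = Xc' P_y Xc, where P_y projects onto the span of the centered y's
H <- t(xc) %*% yc %*% solve(crossprod(yc), t(yc)) %*% xc
qr(H)$rank   # equals min(ncol(x), ncol(y))
```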
anneal, genetic, improve, leaps, lm.
##------------------------------------------------------------------
## 1) An example of subset selection in the context of Multiple
## Linear Regression. Variable 5 (average price) in the Cars93 MASS
## library is to be regressed on 13 other variables. The goal is to
## compare subsets of these 13 variables according to their ability
## to predict car prices.

library(MASS)
data(Cars93)
lmHmat(Cars93[c(7:8, 12:15, 17:22, 25)], Cars93[5])

## 2) An example of subset selection in the context of Canonical
## Correlation Analysis. Two groups of variables within the Cars93
## MASS library data set are compared. The first group (variables 4,
## 5 and 6) relates to price, while the second group is formed by 13
## variables that describe several technical car specifications. The
## goal is to select subsets of the second group that are optimal in
## terms of preserving the canonical correlations with the variables in
## the first group. (Warning: the 3-variable "response" group is kept
## intact; subset selection is to be performed only in the 13-variable
## group.)

library(MASS)
data(Cars93)
lmHmat(Cars93[c(7:8, 12:15, 17:22, 25)], Cars93[4:6])