Factanal {FAiR}    R Documentation

Estimate Common Factor Analysis Models

Description

This function estimates models for semi-exploratory factor analysis (SEFA), exploratory factor analysis (EFA), and confirmatory factor analysis (CFA) using a genetic algorithm.

Usage

Factanal(x, factors, data = NULL, covmat = NULL, n.obs = NA, subset, 
na.action, scores = "none", seeds = 12345, lower = sqrt(.Machine$double.eps), 
model = c("SEFA", "EFA", "CFA"), method = c("MLE", "YWLS"), 
restrictions, fixed, criteria = NULL, robust.covmat = FALSE, ...)

Arguments

The first several arguments (through scores) are largely similar to those in factanal.

x A formula or a numeric matrix or an object that can be coerced to a numeric matrix. This argument is required if covmat = NULL or if robust.covmat = TRUE and is always recommended if the raw data are available.
factors The number (>0) of factors to be fitted. Unlike the argument of the same name in factanal, factors can also be a numeric vector of length two giving the number of factors to extract at level one and level two of a two-level semi-exploratory or confirmatory factor analysis model. If factors is a single number greater than 2 and model != "EFA", Factanal will ask whether to estimate a second level in the usual case where restrictions is unspecified.
data An optional data frame (or similar: see model.frame), used only if x is a formula. By default the variables are taken from environment(formula).
covmat A covariance matrix, or a covariance list as returned by cov.wt or similar. If the covariance matrix is really a correlation matrix, it is not (yet) possible to (accurately) calculate some measures of uncertainty, so the covariance matrix (or list) should be passed instead of the correlation matrix if available. It is always recommended to use x to pass the raw data to Factanal, but covmat is required if x is unspecified.
n.obs The number of observations, used if covmat is a covariance matrix. It is possible to obtain point estimates without knowing the number of observations, but it is not possible to calculate measures of uncertainty.
subset A specification of the cases to be used, if x is a matrix or formula.
na.action The na.action to be used if x is used as a formula.
scores Type of scores to produce, if any. The default is "none". Other valid choices (which can be abbreviated) are "regression", "Bartlett", "Thurstone", "Ledermann", "Anderson-Rubin", "McDonald", "Krinjen", "Takeuchi", and "Harman". See Beauducel (2007) for formulae for these factor scores as well as proofs that all but "regression" and "Harman" produce the same correlation matrix.
seeds A vector of length one or two to be used as the random number generator seeds corresponding to the unif.seed and int.seed arguments to genoud respectively. If seeds is a single number, this seed is used for both unif.seed and int.seed. These seeds override the defaults for genoud and make it easier to replicate an analysis exactly.
lower A lower bound. In exploratory factor analysis using the fitting function in factanal, this argument corresponds to the 'lower' element of the list specified for control in factanal and indicates the smallest permissible value for a uniqueness. Otherwise, this argument is the lower bound used for eigenvalues and singular values when checking for positive-definiteness and ranks of matrices. If you get errors referencing positive definiteness, try increasing the value of lower.
model A character string indicating "SEFA", "EFA", or "CFA" to indicate whether a semi-exploratory, an exploratory, or a confirmatory factor analysis model should be estimated. Defaults to "SEFA".
method A character string indicating "MLE" or "YWLS" to indicate how the model should be estimated. Defaults to "MLE". The "YWLS" option uses Yates' (1987) weighted least squares criterion rather than one of the weighted least squares criteria usually mentioned in the factor analysis literature. See the warning below.
restrictions An optional object of class "restrictions". It is almost always best to leave this argument unspecified to allow Factanal to prompt for restrictions with its pop-up menus. This argument is primarily intended for use in simulations. See restrictions-class for more information about how it should be specified if you do not want to be bothered by the pop-up menus.
fixed An optional matrix or list of two matrices that specifies the values of certain coefficients, which would be utilized most often in confirmatory factor analysis and is inappropriate for exploratory factor analysis. However, it is never necessary to explicitly supply this argument because Factanal will prompt you to construct such a matrix or matrices in confirmatory and (mixed) semi-exploratory factor analysis models with a correctly formatted skeleton.

If fixed is a matrix, it should have rows equal to the number of outcome variables and columns equal to the number of factors at level one. If fixed is a list of two matrices, the first element of the list corresponds to the coefficient matrix at level one and the second element corresponds to the coefficient matrix at level two (and should have rows equal to the number of first-order factors and columns equal to the number of second-order factors). Unrestricted coefficients should be denoted with NA in the appropriate row and column of the appropriate matrix. Restricted coefficients should be denoted with the value of the restricted coefficient in the appropriate row and column of the appropriate matrix.
criteria An optional list whose elements should be functions or character strings that name functions to be used as criteria during the lexical optimization when model != "EFA". It is almost always best to leave this argument unspecified to allow Factanal to prompt for these criteria with pop-up menus. This argument is primarily intended for use in simulations.

If criteria is a list that includes character strings, the strings should be one or more of "no_suppressors_xxx", "dets_xxx", or "cohyperplanarity", where "xxx" is either "1st" or "2nd" to indicate whether the criterion should be applied to the first or second level of the model. Thus, a suffix of "2nd" is only appropriate if a two-level model is estimated. The function implied by method is automatically appended to the end of this list to serve as the ultimate lexical criterion.
robust.covmat A logical indicating whether a minimum covariance determinant estimator of the sample covariance matrix should be used. If TRUE, this option requires that either robustbase or MASS be installed; see covMcd and cov.mcd respectively. Also, it requires that the raw data be passed via the x argument. Although it defaults to FALSE, TRUE is probably recommended in most cases.
... Further arguments that are passed to genoud. Note that several of the default arguments to genoud are silently overridden by Factanal out of logical necessity. These overridden defaults are: nvars, max, hessian, lexical, Domains, data.type.int, fn, BFGSfn, gr, BFGShelp, unif.seed, int.seed, and (in some cases) boundary.enforcement. In addition, several of the default arguments to genoud are silently overridden unless the user explicitly specifies them by passing the arguments through dots. Such arguments include: MemoryMatrix = FALSE, print.level = 1, P9mix = 1, max.generations = 1000, project.path, starting.values, and (in most cases) boundary.enforcement = 1. The arguments to genoud that remain at their defaults but that you may want to seriously consider tweaking are pop.size and wait.generations.
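As a rough illustration of the layout described for the fixed argument, the sketch below builds a list of two matrices for a hypothetical two-level model with 6 outcomes, 3 first-order factors, and 1 second-order factor. The dimensions and the placement of the zeros are assumptions made up for this illustration; it uses only base R, does not call Factanal, and does not claim that the pattern identifies the model.

```r
# Hypothetical dimensions, chosen only for illustration
n_outcomes <- 6
n_first    <- 3   # first-order factors
n_second   <- 1   # second-order factors

# Level one: rows = outcome variables, columns = first-order factors.
# NA marks an unrestricted (free) coefficient.
fixed_level1 <- matrix(NA_real_, nrow = n_outcomes, ncol = n_first)
fixed_level1[1, 2:3]     <- 0   # restricted cells carry their pegged value
fixed_level1[3, c(1, 3)] <- 0
fixed_level1[5, 1:2]     <- 0

# Level two: rows = first-order factors, columns = second-order factors
fixed_level2 <- matrix(NA_real_, nrow = n_first, ncol = n_second)

# First element is level one, second element is level two
fixed <- list(fixed_level1, fixed_level2)

# Number of unrestricted coefficients at each level
n_free <- sapply(fixed, function(m) sum(is.na(m)))
print(n_free)
```

A list like this would be passed as the fixed argument, although leaving fixed unspecified and filling in the skeleton that Factanal offers remains the documented default route.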

Details

SEFA, EFA, and CFA models all estimate the same population model but impose different restrictions on the model. If restrictions is unspecified, Factanal will create an object that inherits from class "restrictions" based on the responses the user gives to the pop-up menus. The vignette provides a step-by-step guide to navigating the pop-up menus; execute vignette("FAiR") to read it. Factanal will then impose different restrictions on the model, depending on the inherited class of this object.

The CFA model is perhaps the most straightforward in the sense that the user specifies that certain coefficients are pegged to particular values, and Factanal estimates the values of the free parameters. These restrictions can be specified via the fixed argument or left unspecified, in which case Factanal will prompt you to specify the restrictions via a pop-up menu. Factanal relies on the theorem in Howe (1955) to overcome rotational indeterminacy. Namely, the factors are scaled to have unit variance and at least factors - 1 coefficients per factor are pegged to zero such that a technical rank condition is satisfied. This mechanism for eliminating rotational indeterminacy is somewhat more limited than the options that are available in some other software packages for factor analysis but easily generalizes to SEFA models.
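The counting part of this requirement (at least factors - 1 zeros per factor) is easy to check on a candidate matrix before passing it as fixed. The helper below is a minimal base R sketch; the name count_pegged_zeros and the example matrix are hypothetical and are not part of FAiR. It checks only the count and does not verify the technical rank condition.

```r
# Hypothetical helper (not part of FAiR): counts, for each factor (column),
# how many coefficients are pegged to exactly zero. NA cells are free.
count_pegged_zeros <- function(fixed) {
  colSums(!is.na(fixed) & fixed == 0)
}

factors <- 3
fixed <- matrix(NA_real_, nrow = 6, ncol = factors)
fixed[1, 2:3]     <- 0
fixed[3, c(1, 3)] <- 0
fixed[5, 1:2]     <- 0

zeros_per_factor <- count_pegged_zeros(fixed)

# Necessary count only; the rank condition is NOT checked here
meets_count <- all(zeros_per_factor >= factors - 1)
print(zeros_per_factor)
print(meets_count)
```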

A SEFA model differs from a CFA model in that the analyst specifies how many coefficients per factor take the value of zero, and the algorithm estimates the locations of these zeros along with the values of the corresponding free coefficients. It is also possible to estimate a mixed SEFA model where some coefficients are fixed to zero (or another number) a priori and the locations of the remaining zeros are estimated. A SEFA model requires that the Howe (1955) theorem be satisfied and at least one additional restriction is imposed. SEFA models are new to the literature and more information about them can be found in Goodrich (2008).

An EFA model specifies an arbitrary set of restrictions that are minimally necessary to extract factors, after which a transformation of the factors should be obtained using Rotate. By default, the fitting function used by Factanal to estimate an EFA model is the same as that used in factanal. However, there is an alternative choice that estimates an EFA model via a CFA algorithm with the upper triangle of the coefficient matrix filled with zeros. The results when method = "MLE" should be the same, up to a transformation of the factors, but in practice can differ if there are optimization failures. The default algorithm is considered more reliable at this point, but the alternative algorithm must be used if method = "YWLS". Which of these two algorithms is used depends on the class of restrictions; in the usual case that restrictions is unspecified, a pop-up menu will ask which algorithm to use.
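To picture the restriction pattern used by the alternative algorithm, the base R sketch below fills the upper triangle of a coefficient skeleton with zeros and leaves every other cell free (NA). The dimensions are hypothetical and the code does not reproduce FAiR's internals; it only illustrates the pattern described above.

```r
# Hypothetical dimensions, chosen only for illustration
n_outcomes <- 6
factors    <- 3

# NA = free coefficient
skeleton <- matrix(NA_real_, nrow = n_outcomes, ncol = factors)

# Peg the cells above the main diagonal (row index < column index) to zero
skeleton[upper.tri(skeleton)] <- 0

# This yields factors * (factors - 1) / 2 zeros in total
n_zeros <- sum(!is.na(skeleton))
print(skeleton)
print(n_zeros)
```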

It is not necessary to provide starting values for the parameters, since there are methods for that purpose; see S4GenericsFAiR. However, a matrix of starting values can be passed through the dots to genoud. This matrix should have rows equal to the pop.size argument in genoud and columns equal to the number of free parameters in the model, which corresponds to the nvars argument in genoud. The order of the parameters / columns proceeds from the “top” of the model to the “bottom” as follows. First come the cells that comprise the upper triangle of the factor intercorrelation matrix at level two (if there is more than one second-order factor). Next come the free cells of the coefficient matrix at level two in row-major order. Then come the free cells that comprise the upper triangle of the factor intercorrelation matrix at level one (if a second-order model is not estimated). Next come the free cells of the coefficient matrix at level one in row-major order. Finally come the diagonal cells of the uniqueness matrix. Note that a parameter is free unless it is fixed a priori, which is to say that in SEFA models coefficients are considered free even if there is a possibility that the algorithm will bind them to zero at the optimum, rendering them “not free” for the purpose of counting degrees of freedom.
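The ordering above can be sketched for a one-level model with two correlated factors and six outcomes. All numeric values below are made up for illustration, and the code is plain base R that only assembles the vector; it is an assumption-laden sketch rather than FAiR's own method for generating starting values. The result is a single-row matrix, the same shape the Examples section passes as starting.values.

```r
# Made-up values for a one-level model: 6 outcomes, 2 factors
Phi  <- matrix(c(1, 0.3,
                 0.3, 1), nrow = 2, byrow = TRUE)   # factor intercorrelations
beta <- matrix(c(0.6, 0.1,
                 0.7, 0.0,
                 0.5, 0.2,
                 0.1, 0.8,
                 0.0, 0.6,
                 0.2, 0.7), nrow = 6, byrow = TRUE)  # level-one coefficients
free <- matrix(TRUE, nrow = 6, ncol = 2)             # nothing fixed a priori
uniquenesses <- rep(0.5, 6)

par <- c(Phi[upper.tri(Phi)],   # 1. upper triangle of Phi at level one
         t(beta)[t(free)],      # 2. free coefficients in row-major order
         uniquenesses)          # 3. diagonal of the uniqueness matrix

starts <- matrix(par, nrow = 1) # one individual, one column per parameter
print(dim(starts))
```

Transposing before subsetting is what turns R's default column-major traversal into the row-major order the text calls for.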

Value

An object of formal S4 class "FA", or in the case of two-level models an object of formal S4 class "FA.general" or "FA.2ndorder".

Warning

Yates' (1987, p. 229) weighted least squares criterion has never received much scrutiny. The criterion (but not Yates' algorithm) is included in FAiR so that it can be fully evaluated. However, it is not scale invariant, it does not lend itself to calculating standard errors or test statistics, and in limited testing it seems prone to finding a solution that is geared more toward minimizing the weights than minimizing the squared residuals.

Note

The underlying genetic algorithm will print a variety of output as it progresses. On Windows, you have to move the scrollbar periodically to flush the output to the screen. The output will look something like this:
0 1.0 1.0 ... 1.0 double
1 1.0 1.0 ... 1.0 double
... ... ... ... ... ...
437 1.0 1.0 ... 1.0 double

The integer on the very left indicates the generation number. If it appears to skip one or more generations, that signifies that the best individual in the “missing” generation was no better than the best individual in the previous generation. The sequence of ones indicates that various constraints are being satisfied by the best individual in the generation. Some of these constraints are hard-coded, some are added by the choices the user makes. The curious are referred to the source code, but for the most part users need not worry about them provided they are 1.0. If any but the last are not 1.0 after the first few generations, there is a problem because no individual is satisfying all the constraints. The last number is a double-precision number, typically the log-likelihood. This number will increase, sometimes painfully slowly, sometimes intermittently, over the generations since the log-likelihood is being maximized subject to the aforementioned constraints.

Author(s)

Ben Goodrich http://wiki.r-project.org/rwiki/doku.php?id=packages:cran:fair

References

Bartholomew, D. J. and Knott, M. (1999) Latent Variable Analysis and Factor Analysis. Second Edition, Arnold.

Beauducel, A. (2007) In spite of indeterminacy, many common factor score estimates yield an identical reproduced covariance matrix. Psychometrika, 72, 437–441.

Goodrich, B. (2008) SEFAiR So Far. Unpublished manuscript linked at http://wiki.r-project.org/rwiki/doku.php?id=packages:cran:fair#to_paper_s_about_the_ideas_in_fair.

Smith, G. A. and Stanley, G. (1983) Clocking g: relating intelligence and measures of timed performance. Intelligence, 7, 353–368.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Yates, A. (1987) Multivariate Exploratory Data Analysis: A Perspective on Exploratory Factor Analysis. State University of New York Press.

See Also

Rotate and factanal

Examples

## Example from Venables and Ripley (2002, p. 323)
## Previously from Bartholomew and Knott (1999, p. 68--72)
## Originally from Smith and Stanley (1983)

data(ability.cov)
print(ability.cov)

if(TRUE){ # NOTE: One would usually not bother with this block. It just makes the
          # example go quickly and without user intervention on the pop-up menus.
starts1 <- c(0.4551693481819578, 
             0.5893203083906567, 
             0.2182044732474321, 
             0.7694294930481663,
             0.0526383747875095, 
             0.3334323600411430)
starts1 <- matrix(starts1, nrow = 1)

example1 <- new("restrictions.factanal", factors = 2L, nvars = 6L,
                Domains = cbind(sqrt(.Machine$double.eps), rep(1, 6)),
                model = "EFA", method = "MLE", dof = 4L, fast = FALSE)
}

# 'restrictions' and 'starting.values' would typically be left unspecified!
efa <- Factanal(covmat = ability.cov, factors = 2, model = "EFA",
                restrictions = example1, starting.values = starts1)
show(efa)
summary(efa)

# 'criteria' would typically be left unspecified!
efa.rotated <- Rotate(efa, criteria = list("phi"))
summary(efa.rotated)

if(TRUE){ # NOTE: One would usually not bother with this block. It just makes the
          # example go quickly and without user intervention on the pop-up menus.
starts2 <- c(4.46294498156615e-01,
             4.67036349420035e-01,
             6.42220238211291e-01,
             8.88564379236454e-01,
             4.77779639176941e-01,
            -7.13405536379741e-02,
            -9.47782525342137e-08,
             4.04993872375487e-01,
            -1.04604290549591e-08,
            -9.44950629176182e-03,
             2.63078925240678e-04,
             9.38038168787216e-01,
             8.43618801925473e-01,
             4.49024212016027e-01,
             5.87550265675745e-01,
             2.17850254355888e-01,
             7.71724777627142e-01,
             1.20084009542348e-01,
             2.88308011310065e-01)

starts2 <- matrix(starts2, nrow = 1)

Domains <- cbind(-1, 1)
Domains <- rbind(Domains, cbind(-1.5, rep(1.5, 12)))
Domains <- rbind(Domains, cbind(0, rep(1, 6)))
fixed   <- matrix(NA_real_, nrow = 6, ncol = 2)
fix_beta_args <- as.list(formals(FAiR:::FAiR_fix_coefficients))
fix_beta_args$zeros <- c(2,2)
beta_select <- c(FALSE, rep(TRUE, length(fixed)), rep(FALSE, nrow(fixed)))
beta_list <- list(beta = fixed, free = c(is.na(fixed)),
                  num_free = length(fixed), select = beta_select,
                  fix_beta_args = fix_beta_args)
Theta2_list <- list(Theta2 = diag(nrow(fixed)), 
                    select = c(rep(FALSE, length(fixed) + 1),
                               rep(TRUE, nrow(fixed))))
Phi <- diag(c(0.5, 0.5))
example2 <- new("restrictions.1storder", factors = c(2L, 0L),
                Domains = Domains, nvars = nrow(Domains), 
                model = "SEFA", method = "MLE", dof = 6L,
                Phi = Phi, beta = beta_list, Theta2 = Theta2_list,
                criteria = list(llik = FAiR:::FAiR_criterion_llik))
}

# 'restrictions' and 'starting.values' would typically be left unspecified!
sefa <- Factanal(covmat = ability.cov, factors = 2, model = "SEFA",
                 restrictions = example2, starting.values = starts2)
show(sefa)
summary(sefa)

stuff <- list() # output list for various methods, also works on efa and efa.rotated
stuff$model.matrix <- model.matrix(sefa) # sample correlation matrix
stuff$fitted <- fitted(sefa) # reproduced correlation with communalities on diagonal
stuff$residuals <- residuals(sefa) # difference between model.matrix and fitted
stuff$rstandard <- rstandard(sefa) # residual matrix rescaled to a correlation matrix
stuff$weights <- weights(sefa) # (scaled) approximate weights for residuals
stuff$influence <- influence(sefa) # weights * residuals
stuff$logLik <- logLik(sefa) # log-likelihood
stuff$BIC <- BIC(sefa) # BIC
stuff$profile <- profile(sefa) # profile plots of non-free parameters
plot(sefa)  # advanced Scree plot
pairs(sefa) # Thurstone-style plot

[Package FAiR version 0.2-0 Index]