sga {pga}    R Documentation

Single-path Genetic Algorithm for Variable Selection

Description

Runs the single-path genetic algorithm (SGA) described in Zhu and Chipman (2006), Technometrics 48, page 493, Table 5.

Before running parallel evolution, it is often useful to get a rough idea of how many generations to evolve in each universe. To do so, run sga with check.convg=TRUE; see the example below. Apart from this use, the function is rarely called directly.

Usage

sga(y, X, m, N, start = NULL, mutation=1/ncol(X), prior = 0.35, check.convg = FALSE, thresh = 0.05)
sga(y, X, N=50)
sga(y, X, N=200, check.convg=TRUE)

Arguments

y an n-by-1 response vector.
X an n-by-p matrix; each column is a candidate predictor variable.
m population size in each universe; default = ncol(X) if ncol(X) is even, or ncol(X)+1 if it is odd.
N number of generations to evolve in each universe; this needs to be fairly small to prevent each evolutionary path from converging.
start the starting population; rarely needed, default = NULL.
mutation mutation rate; this can be a vector of length N if a different mutation rate is needed for each generation t=1,2,...,N; default = 1/p for all t=1,2,...,N.
prior prior probability controlling the density of 1's in the initial population; default = 0.35. If prior information suggests that the number of relevant variables is large, a higher value (e.g., prior=0.7) can be more efficient.
check.convg TRUE if sga is being run initially to find out the number of generations needed for a single path to converge; FALSE (the default) otherwise.
thresh a prespecified threshold; if the entropy of the population falls below thresh, the evolutionary algorithm is deemed to have converged; see Technometrics 48, page 495, Section 3.2.
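
The defaults described above can be sketched in base R. This is only an illustration, not the package's internal code: the names p, N.gen, mutation.sched and popn0 are hypothetical, and drawing the initial population as independent Bernoulli(prior) bits is an assumption based on the description of prior.

```r
set.seed(1)
p <- 10                      # number of candidate predictors, i.e. ncol(X)

# Default population size: ncol(X) if it is even, ncol(X) + 1 if it is odd
# (assumed reading of the description of 'm').
m <- if (p %% 2 == 0) p else p + 1

# Default mutation rate 1/p, repeated for every generation t = 1, ..., N.gen
N.gen <- 20
mutation.default <- rep(1 / p, N.gen)

# A hypothetical generation-specific schedule: start at 2/p, decay to 1/p
mutation.sched <- seq(2 / p, 1 / p, length.out = N.gen)

# Assumed initialisation: each gene is 1 with probability 'prior'
prior <- 0.35
popn0 <- matrix(rbinom(m * p, size = 1, prob = prior), nrow = m, ncol = p)
mean(popn0)                  # density of 1's; close to 'prior' on average
```

A vector such as mutation.sched has length N.gen, so it could be supplied as mutation together with N = N.gen.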

Value

ans the number of generations needed to achieve convergence; returned only if check.convg=TRUE.
popn last-generation population after N generations of evolution, returned as an m-by-p binary matrix.
combo.gene a p-by-1 vector, whose j-th element is the frequency with which variable j “shows up” in the last-generation population.
best a p-by-1 binary vector, representing the best solution after N generations of evolution. If the evolutionary algorithm has converged in the entropy sense, then combo.gene is expected to be the same as best after rounding; see example below.
optval used during code development; ignore.
convg used during code development; ignore.
perf used during code development; ignore.
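
The relationship between popn, combo.gene and convergence in the entropy sense can be illustrated on a small hand-written population. This is a sketch under stated assumptions: the toy matrix is made up, binary.entropy and pop.entropy are hypothetical names, and the average per-variable binary entropy used here is an assumed stand-in for the measure in Section 3.2 of the reference, which may differ.

```r
# A toy last-generation population: m = 4 individuals, p = 5 variables
popn <- rbind(c(0, 1, 0, 0, 1),
              c(0, 1, 0, 0, 1),
              c(0, 1, 0, 1, 1),
              c(0, 1, 0, 0, 1))

# Frequency with which each variable appears in the population
combo.gene <- colMeans(popn)

# Binary entropy of a frequency f, using the convention 0 * log(0) = 0
binary.entropy <- function(f) {
  h <- -(f * log2(f) + (1 - f) * log2(1 - f))
  h[f == 0 | f == 1] <- 0
  h
}

# Assumed convergence measure: average per-variable entropy
pop.entropy <- mean(binary.entropy(combo.gene))
thresh <- 0.05
pop.entropy < thresh          # TRUE once the population is nearly uniform

# Consensus solution obtained by rounding the frequencies
round(combo.gene)
```

Once every column of popn is all 0's or all 1's, the entropy in this sketch is exactly zero and round(combo.gene) reproduces each individual in the population.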

Author(s)

Dandi Qiao and Mu Zhu, University of Waterloo, Canada.

References

Zhu M, Chipman HA (2006). Darwinian evolution in parallel universes: A parallel genetic algorithm for variable selection. Technometrics, 48(4), 491–502.

See Also

pga

Examples

## simulate some data
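sigma <- 1                       # noise standard deviation
n <- 50                          # number of observations
d <- 10                          # number of candidate predictors
truth <- c(2,5,8)                # indices of the truly relevant variables
beta <- rep(0,d)
beta[truth] <- c(1,1,1)
X <- matrix(rnorm(n*d), n, d)
y <- X %*% beta + sigma*rnorm(n)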

## get a rough idea of how many generations are needed for
## the evolutionary algorithm to converge (in the entropy sense) 
check <- numeric(5)
for (i in 1:5) {
  check[i] <- sga(y, X, N=200, check.convg=TRUE)$ans
}
round(mean(check))
## if round(mean(check)) above is equal to 20, then one often runs
## just 10 generations in each parallel universe to prevent each path
## from converging ...
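
The rule of thumb in the comment above can be written out explicitly. This is only a sketch: halving the convergence estimate is an assumption generalized from the 20-versus-10 example, and n.conv and N.short are hypothetical names.

```r
n.conv <- 20                          # stand-in for round(mean(check))
N.short <- max(1, floor(n.conv / 2))  # evolve for about half as long
N.short
```

N.short would then serve as the number of generations to evolve in each parallel universe.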

## run a long evolutionary path and identify the best solution, 
## but this is often not too useful ... however, from the example
## below, you will see that, if evolution has converged in the
## entropy sense, then $best and $combo.gene are not
## going to be very different ...
stuff <- sga(y, X, N=200)
stuff$best
round(stuff$combo.gene)

[Package pga version 0.1-1 Index]