sga {pga} | R Documentation |
The SGA algorithm as described in Technometrics 48, page 493, Table 5.
Before running parallel evolution, it is sometimes useful to get a rough
idea of how many generations to evolve in each universe. To do so, call
sga with check.convg=TRUE; see the example below. Otherwise, this function
is usually not called directly.
sga(y, X, m, N, start = NULL, mutation = 1/ncol(X), prior = 0.35,
    check.convg = FALSE, thresh = 0.05)

sga(y, X, N=50)
sga(y, X, N=200, check.convg=TRUE)
y |
an n-by-1 response vector. |
X |
an n-by-p matrix; each column is a
candidate predictor variable. |
m |
population size in each universe; default = ncol(X) or
ncol(X)+1, depending on whether ncol(X) is even or odd. |
N |
number of generations to evolve in each universe; for parallel evolution this should be kept fairly small to prevent each evolutionary path from converging. |
start |
the starting population; rarely needed, default = NULL. |
mutation |
mutation rate; this can be a vector of length N if
a different mutation rate is needed for each generation
t=1,2,...,N; default = 1/p for all t=1,2,...,N. |
prior |
prior probability that controls the density of 1's in the
initial population; default = 0.35, but if there is prior
information that the number of relevant variables is large, it can be
more efficient to use a higher prior, e.g., prior=0.7. |
check.convg |
TRUE if running sga initially
to find out the number of generations needed for a single path to
converge; FALSE otherwise. |
thresh |
a prespecified threshold; if the entropy of the population
falls below thresh, the evolutionary algorithm is deemed to have
converged; see Technometrics 48, page 495, Section 3.2. |
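The mutation, prior, and thresh arguments can be made concrete with a small base-R sketch. It does not call sga or reproduce the package's internals: the sizes p, m, and N are hypothetical, the linear mutation schedule is only one possible choice, and popn_entropy is an assumed definition (average binary entropy over the p columns) that may differ from the measure in Section 3.2 of the paper.

```r
set.seed(1)
p <- 10   # number of candidate predictors (hypothetical)
m <- 10   # population size (hypothetical)
N <- 50   # number of generations (hypothetical)

# A per-generation mutation schedule of length N:
# start at 2/p and decay linearly to the default rate 1/p.
mutation <- seq(2 / p, 1 / p, length.out = N)

# An initial population in which each gene is 1 with probability `prior`.
prior <- 0.35
popn0 <- matrix(rbinom(m * p, size = 1, prob = prior), nrow = m, ncol = p)

# Assumed entropy measure: average binary entropy across the p columns.
popn_entropy <- function(popn) {
  f <- colMeans(popn)                        # frequency of 1's per variable
  h <- -(f * log2(f) + (1 - f) * log2(1 - f))
  h[f == 0 | f == 1] <- 0                    # treat 0 * log(0) as 0
  mean(h)
}

# A random population is diverse, so its entropy is well above a small threshold.
popn_entropy(popn0)

# A population whose rows are all identical has entropy 0 under this measure,
# so it would count as converged for any positive thresh.
converged <- matrix(rep(c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0), each = m), m, p)
popn_entropy(converged) < 0.05
```

A vector such as mutation above could then be passed as the mutation argument together with the same N, since the argument accepts one rate per generation.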
ans |
returned only if check.convg=TRUE; it is the number of
iterations needed to achieve convergence. |
popn |
last-generation population after N generations of
evolution, returned as an m-by-p binary matrix. |
combo.gene |
a p-by-1 vector whose j-th
element is the frequency with which variable j “shows up” in the
last-generation population. |
best |
a p-by-1 binary vector representing the best
solution after N generations of evolution. If the evolutionary
algorithm has converged in the entropy sense, then combo.gene is
expected to be the same as best after rounding; see example below. |
optval |
used during code development; ignore. |
convg |
same as above. |
perf |
same as above. |
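The relationship between popn, combo.gene, and best can be illustrated on a toy population. The sketch below assumes that the "frequency" in combo.gene is a proportion between 0 and 1 (which the rounding in the example suggests); the matrix values are made up, and the package may compute the quantity differently.

```r
# Toy last-generation population: m = 4 individuals, p = 5 variables (made-up values).
popn <- rbind(
  c(0, 1, 0, 0, 1),
  c(0, 1, 0, 0, 1),
  c(0, 1, 1, 0, 1),
  c(0, 1, 0, 0, 1)
)

# Proportion of individuals that include each variable.
freq <- colMeans(popn)
freq

# After rounding, a nearly converged population yields a single binary
# vector, which is what one would compare against the best solution.
round(freq)
```

Here three of the four individuals agree exactly, so rounding the frequencies recovers their shared configuration (variables 2 and 5 selected).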
Dandi Qiao and Mu Zhu, University of Waterloo, Canada.
Zhu M, Chipman HA (2006). Darwinian evolution in parallel universes: A parallel genetic algorithm for variable selection. Technometrics, 48(4), 491–502.
## simulate some data
sigma <- 1
N <- 50
d <- 10
truth <- c(2,5,8)
beta <- rep(0,d)
beta[truth] <- c(1,1,1)
X <- matrix(rnorm(N*d), N, d)
y <- X %*% beta + sigma*rnorm(N)

## get a rough idea of how many generations are needed for
## the evolutionary algorithm to converge (in the entropy sense)
check <- numeric(5)
for (i in 1:5){
  check[i] <- sga(y, X, N=200, check.convg=TRUE)$ans
}
round(mean(check))

## if round(mean(check)) above is equal to 20, then one often runs
## just 10 generations in each parallel universe to prevent each path
## from converging ...

## run a long evolutionary path and identify the best solution,
## but this is often not too useful ... however, from the example
## below, you will see that, if evolution has converged in the
## entropy sense, then $best and $combo.gene are not
## going to be very different ...
stuff <- sga(y, X, N=200)
stuff$best
round(stuff$combo.gene)
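The rule of thumb in the example comments (about 10 generations per universe when convergence takes about 20) can be written as a one-line heuristic. This is only a sketch: the check values below are hypothetical stand-ins for the $ans values returned with check.convg=TRUE, and halving is one reading of that comment, not a rule prescribed by the package.

```r
# Hypothetical convergence counts from five trial runs with check.convg=TRUE.
check <- c(18, 22, 20, 19, 21)

# Average number of generations needed for a single path to converge.
n_converge <- round(mean(check))

# Heuristic: evolve each parallel universe for about half that long,
# and always for at least one generation.
n_parallel <- max(1, round(n_converge / 2))
n_parallel
```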