genoud {rgenoud} | R Documentation |
GENOUD is a function that combines evolutionary algorithm methods with a derivative-based, quasi-Newton method to solve difficult unconstrained optimization problems. GENOUD is made to solve problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When a statistical model's estimating function (for example, a log-likelihood) is nonlinear in the model's parameters, the function to be optimized will usually not be globally concave and may contain irregularities such as saddlepoints or discontinuous jumps. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure GAs) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in the neighborhood of the solution. Therefore, for some portion of the search space derivative information is useful.
genoud(fn, nvars, max=FALSE, pop.size=1000, max.generations=100, wait.generations=10, hard.generation.limit=TRUE, starting.values=NULL, MemoryMatrix=NULL, Debug=FALSE, Domains=NULL, default.domains=10, gradient.check=TRUE, boundary.enforcement=2, solution.tolerance=0.001, BFGS=TRUE, data.type.int=FALSE, hessian=FALSE, unif.seed=812821, int.seed=53058, print.level=2, share.type=0, instance.number=0, output.path="stdout", output.append=FALSE, project.path=NULL, P1=50, P2=50, P3=50, P4=50, P5=50, P6=50, P7=50, P8=50, P9=0)
fn |
The function to be minimized (or maximized if
max=TRUE). The first argument of the function must be the
vector of parameters over which minimizing is to
occur. The function must return a scalar result.
For example, if we wish to maximize the sin()
function. We can simply call genoud by genoud(sin,
nvars=1,max=TRUE) . |
nvars |
This is the number of variables the function to be minimized (or maximized) takes. |
max |
Maximization (TRUE) or Minimizing (FALSE). This variable tells GENOUD if it is to minimize or maximize the objective function. |
pop.size |
Population Size. This is the number of individuals GENOUD uses to
solve the optimization problem. There are several restrictions on
what the value of this number can be. No matter what population
size the user requests, the number is automatically adjusted to make
certain that the relevant restrictions are satisfied. These
restrictions originate
in what is required by several of the operators. In particular,
operators 6 (Multiple Point Simple Crossover) and 8 (Heuristic
Crossover) require an even number of individuals to work on—i.e., they
require two parents. Therefore, the pop.size variable and the
operators sets must be such that these three operators have an even
number of individuals to work with. If this does not occur, the
population size is automatically increased until this constraint is
satisfied. |
max.generations |
Maximum Generations. This is the maximum number of generations that
GENOUD will run when attempting to optimize a function. This is a
soft limit. The maximum generation limit will be binding for
GENOUD only if hard.generation.limit has
been set equal to TRUE. If it has not been set equal to
TRUE, two soft triggers control when GENOUD stops:
wait.generations and gradient.check . Although, the max.generations variable is not, by default,
binding, it is nevertheless important because many operators use it
to adjust
their behavior. In essence, many of the operators become less random
as the generation count gets closer to the max.generations
limit. If
the limit is hit and GENOUD decides to
continue working, GENOUD automatically increases the
max.generation
limit.Please see MemoryMatrix for some important interactions
with memory management. |
wait.generations |
If there is no improvement in the objective function in this number
of generations, GENOUD will think that it has
found the optimum. If the
gradient.check trigger has been
turned on, GENOUD will only start counting wait.generations
if the gradients are within
solution.tolerance of zero. The
other variables controlling termination are
max.generations and hard.generation.limit . |
hard.generation.limit |
This logical variable determines if the max.generations
variable is a binding constraint for GENOUD. If
hard.generation.limit is FALSE, then GENOUD may exceed
the max.generations count if either the objective function
has improved within a given number of generations (determined by
wait.generations ) or if the gradients are not zero
(determined by gradient.check ). Please see MemoryMatrix for some important interactions
with memory management. |
starting.values |
This vector contains the starting values which GENOUD will use
at startup. The starting.values vector is a way for the user
to insert one individual into the starting population.
GENOUD will randomly create the other individuals. |
MemoryMatrix |
This variable controls if GENOUD sets up a memory matrix. Such a
matrix ensures that GENOUD will request the fitness evaluation
of a given set of parameters only once. The variable may be
TRUE or FALSE. If it is FALSE, GENOUD
will be aggressive in
conserving memory. The most significant negative implication of
this variable being set to FALSE is that GENOUD will no
longer maintain a memory
matrix of all evaluated individuals. Therefore, GENOUD may request
evaluations which it has already previously requested. When
nvars is large, the memory matrix consumes a large amount of RAM.By default GENOUD sets MemoryMatrix equal to TRUE unless the
number of parameters is greater than 20. In that case, GENOUD sets the default
value equal to FALSE. GENOUD's memory matrix will require significantly less memory if the user sets hard.generation.limit equal
to TRUE. Doing this is a good way of conserving
memory while still making use of the memory matrix structure. |
Debug |
This variable turns on some debugging information. This variable may be TRUE or FALSE. |
Domains |
This is a nvars *2
matrix. The first column is the lower bound, and the second column is
the upper bound. None of GENOUD's starting population will be
generated outside of the bounds. But some of the operators may
generate children which
will be outside of the bounds unless the
boundary.enforcement flag is
turned on. If the user does not provide any values for Domains, GENOUD will setup default domains using default.domains . For linear and nonlinear constraints please see the discussion in the Note section. |
default.domains |
If the user does not want to provide a Domains matrix,
domains may nevertheless be set by the user with this easy to use
scalar option. GENOUD will create a
Domains matrix by setting the lower bound for all of the parameters
equal to -1 * default.domains and the upper
bound equal to default.domains . |
gradient.check |
If this variable is TRUE, GENOUD will not start counting
wait.generations unless each gradient is
solution.tolerance close to zero. This
variable has no effect if the max.generations limit has been
hit and the hard.generation.limit option has been set to
TRUE. |
boundary.enforcement |
This variable determines the degree to which GENOUD obeys the
boundary constraints. Notwithstanding
the value of the variable, none of GENOUD's starting population will
be outside of the bounds. But some of the operators may generate
children which will be outside of the bounds unless the
boundary.enforcement flag is turned on. boundary.enforcement has three possible values: 0 (anything goes), 1
(regular), and 2 (no trespassing):
|
solution.tolerance |
This is the tolerance level used by GENOUD. Numbers within
solution.tolerance are considered to be equal. This is
particularly
important when it comes to evaluating wait.generations and
conducting the gradient.check . |
BFGS |
This variable denotes whether or not GENOUD applies a quasi-Newton derivative optimizer (BFGS) to the best individual at the end of each generation after the initial one. Setting BFGS to FALSE does not mean that the BFGS will never be used. In particular, Operator 9 (Local-Minimum Crossover) must also be set to zero. |
data.type.int |
This option sets the data type of the parameters of the function to
be optimized. If the variable is TRUE,
GENOUD is
informed that it is dealing with integer values. Use of the
integer data type is supported only as a beta feature. Some of the
included operators will not work well with integer type
parameters. With integer parameters, GENOUD never uses derivative information. This implies that the BFGS quasi-Newton optimizer is never used—i.e., the BFGS flag is set to FALSE. It
also implies
that Operator 9 (Local-Minimum Crossover) is set to zero and that
gradient checking (as a convergence criterion) is turned off. No
matter what other options have been set to,
data.type.int takes precedence—i.e., if GENOUD is told that
it is searching over an integer parameter space, gradient
information is never considered. There is no option to mix integer and floating point parameters. If one wants to mix the two, it is suggested that the user pick integer type and in her objective function map a particular integer range into a floating point number range. For example, tell GENOUD to search from 0 to 100 and divide by 100 to obtain a search grid of 0 to 1.0 (by .1). |
hessian |
When this flag is set to TRUE, GENOUD will return the hessian matrix at the solution as part of its return list. A user can use this matrix to calculate standard errors. |
unif.seed |
This is the first of the two random number seeds which GENOUD uses. The default value of this seed is 81282. |
int.seed |
This is the second of the two random number seeds which GENOUD uses. The default value of this seed is 53058. |
print.level |
This variable controls the level of printing that GENOUD does. There
are four possible levels: 0 (minimal printing), 1 (normal), 2
(detailed), and 3 (debug). If level 2 is selected, GENOUD will
print details about the population at each generation. The
print.level variable also significantly affects how much
detail is placed in the project file—see project.path .
Note that R convention would have
us at print level 0 (minimal printing). However, because GENOUD
runs may take a long time, it is important for the user to receive
feedback. Hence, print level 2 has been set as the default. |
share.type |
If share.type is equal to 1, then GENOUD, at startup, checks
to see if there is an existing project file (see
project.path ). If such a file exists, it initializes its
original population using it. If the project file contains a smaller population than the current GENOUD run, GENOUD will randomly create the necessary individuals. If the project file contains a larger population than the current GENOUD run, GENOUD will kill the necessary individuals using exponential selection. If the number of variables (see nvars )
reported in the project file is different from the current GENOUD run,
GENOUD does not use the project file (regardless of the value of
share.type ) and GENOUD generates the necessary starting
population at random. |
instance.number |
This number (starting from 0) denotes the number of recursive
instances of GENOUD. GENOUD then sets up its random number
generators and other such structures so that the multiple instances
do not interfere with each other. It is
up to the user to make certain that the different instances of
GENOUD are not writing to
the same output file(s): see output.path and
project.path . For the R version of GENOUD this variable is of limited use. It is basically there in case a GENOUD run is being used to optimize the result of another GENOUD run (i.e., a recursive implementation). |
output.path |
This is the full (relative) path to where GENOUD's output is to go.
If the value of output.path = ``stdout'', then GENOUD's output
will go to standard output in UNIX and to the GUI console in Windows.
Also see output.append and project.path . |
output.append |
If output is being sent to a file (see output.path ), this logical variable
tells GENOUD whether it should append to the file if it already
exists or if it should overwrite an existing file. |
project.path |
This is the path of the GENOUD project file. By default GENOUD
places it's output in a file called "genoud.pro" located in the
temporary directory provided by tempdir . The behavior
of the project file depends on the print.level chosen. If
the print.level variable is set to 1, then the project file
is rewritten after each generation. Therefore, only the currently
fully completed generation is included in the file. If the
print.level variable is set to 2, then each new generation is
simply appended to the project file. For all other values of
print.level , the project file is not created. |
P1 |
This is the cloning operator. GENOUD always clones the best individual each generation. But this operator clones others as well. Please see the Operators Section for details about operators and how they are weighted. |
P2 |
This is the uniform mutation operator. One parameter of the parent is mutated. Please see the Operators Section for details about operators and how they are weighted. |
P3 |
This is the boundary mutation operator. This operator finds a parent and mutates one of its parameters towards the boundary. Please see the Operators Section for details about operators and how they are weighted. |
P4 |
Non-Uniform Mutation. Please see the Operators Section for details about operators and how they are weighted. |
P5 |
This is the polytope crossover. Please see the Operators Section for details about operators and how they are weighted. |
P6 |
Multiple Point Simple Crossover. Please see the Operators Section for details about operators and how they are weighted. |
P7 |
Whole Non-Uniform Mutation. Please see the Operators Section for details about operators and how they are weighted. |
P8 |
Heuristic Crossover. Please see the Operators Section for details about operators and how they are weighted. |
P9 |
Local-Minimum Crossover: BFGS. This is rather CPU intensive, and should be generally used less than the other operators. Please see the Operators Section for details about operators and how they are weighted. |
GENOUD returns a list with 7 objects. 8 objects are returned if the
user has requested the hessian to be calculated at the
solution. Please see the hessian
option. The returned
objects are:
value |
This variable contains the fitness value at the solution. |
generations |
This variables contains the number of generations GENOUD ran for. |
peakgeneration |
This variable contains the generation number at which GENOUD found the solution. |
pop.size |
This variable contains the population size that GENOUD actually used.
See pop.size for why this value may differ from the
population size the user requested. |
par |
This vector contains the parameter values found at the solution. |
gradients |
This vector contains the gradients found at the solution. If no gradients were calculated, they are reported to be -1.0. |
operators |
This vector reports the actual number of operators (of each type) GENOUD used. Please see the Operators Section for details. |
hessian |
If the user has requested the hessian
matrix to be returned (via the hessian flag), the hessian
at the solution will be returned. The user may use this matrix to calculate standard
errors. |
GENOUD has nine operators that it uses. The integer values which are
assigned to each of these operators (P1...P9) are
weights.
GENOUD calculates the sum of s =
P1+P2+...+P9. Each operator is
assigned a weight equal to W_n =
s/(P_n). The number of
times an operator is called usually equals c_n = W_n * pop.size.
Operators 6 (Multiple Point Simple Crossover) and 8 (Heuristic
Crossover) require an even number of individuals to work on—i.e.,
they require two parents. Therefore, the pop.size
variable and
the operators sets must be such that these three operators have an
even number of individuals to work with. If this does not occur,
GENOUD automatically upwardly adjusts the population size to make this
constraint hold.
Strong uniqueness checks have been built into the operators to help
ensure that the operators produce offspring different from their
parents, but this does not always happen.
Note that GENOUD always keeps the best individual each generation.
GENOUD's 9 operators are:
For more information please see Table 1 of the reference article: http://jsekhon.fas.harvard.edu/genoud/node7.shtml.
The most important options affecting performance are those determining
population size (pop.size
) and the
number of generations the algorithm runs
(max.generations
, wait.generations
,
hard.generation.limit
and gradient.check
). Search
performance is expected to improve as
the population size and the number of generations the program runs
increase. These and the other options should be adjusted for the
problem at hand. Please pay particular attentions to the search
domains (Domains
and default.domains
). For more information
please see the reference article.
Linear and nonlinear constraints among the parameters can be
introduced by users in their fit function. For example, if
the sum of parameters 1 and 2 must be less than 725, the following can
be placed in the fit function the user is going to have GENOUD
maximize: if ( (parm1 + parm2) >= 725) { return(-99999999) }
.
In this example, a very bad fit value is returned to GENOUD if the
linear constrain is violated. GENOUD will then attempt to find
parameter values that satisfy the constraint.
Walter R. Mebane, Jr., Cornell University,
wrm1@cornell.edu, http://macht.arts.cornell.edu/wrm1/
Jasjeet S. Sekhon, Harvard University, jasjeet_sekhon@harvard.edu, http://jsekhon.fas.harvard.edu/
Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. ``Genetic
Optimization Using Derivatives: Theory and Application to Nonlinear
Models.'' Political Analysis, 7: 187-210.
http://jsekhon.fas.harvard.edu/genoud/genoud.pdf
Mebane, Walter R., Jr. and Jasjeet S. Sekhon. 2004. ``Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data.'' American Journal of Political Science, 48 (April): 391-410. http://elections.fas.harvard.edu/election2000/MebaneSekhon.multinom.pdf
#maximize the sin function sin1 <- genoud(sin, nvars=1, max=TRUE); #minimize the sin function sin2 <- genoud(sin, nvars=1, max=FALSE); #maximize a univariate normal mixture which looks like a claw and #plot it claw <- function(xx) { Nd <- function(x, mu, sigma) { w <- (1.0/sqrt(2.0*pi*sigma*sigma)) ; z <- (x-mu)/sigma; w <- w*exp(-0.5*z*z) ; as.double(w); } x <- xx[1]; y <- (0.46*(Nd(x,-1.0,2.0/3.0) + Nd(x,1.0,2.0/3.0)) + (1.0/300.0)*(Nd(x,-0.5,.01) + Nd(x,-1.0,.01) + Nd(x,-1.5,.01)) + (7.0/300.0)*(Nd(x,0.5,.07) + Nd(x,1.0,.07) + Nd(x,1.5,.07))) ; as.double(y); } claw1 <- genoud(claw, nvars=1,P9=100,max=TRUE); xx <- seq(-3,3,.05); plot(xx,lapply(xx,claw),type="l",xlab="Parameter",ylab="Fit",main="RGENOUD: Maximize the Claw Density"); points(claw1$par,claw1$value,col="red"); #maximize a bivariate normal mixture which looks like a claw biclaw <- function(xx) { mNd2 <- function(x1, x2, mu1, mu2, sigma1, sigma2, rho) { z1 <- (x1-mu1)/sigma1; z2 <- (x2-mu2)/sigma2; w <- (1.0/(2.0*pi*sigma1*sigma2*sqrt(1-rho*rho))) ; w <- w*exp(-0.5*(z1*z1 - 2*rho*z1*z2 + z2*z2)/(1-rho*rho)) ; as.double(w); } x1 <- xx[1]+1; x2 <- xx[2]+1; y <- (0.5*mNd2(x1,x2,0.0,0.0,1.0,1.0,0.0) + 0.1*(mNd2(x1,x2,-1.0,-1.0,0.1,0.1,0.0) + mNd2(x1,x2,-0.5,-0.5,0.1,0.1,0.0) + mNd2(x1,x2,0.0,0.0,0.1,0.1,0.0) + mNd2(x1,x2,0.5,0.5,0.1,0.1,0.0) + mNd2(x1,x2,1.0,1.0,0.1,0.1,0.0))); as.double(y); } biclaw1 <- genoud(biclaw, nvars=2,P9=100,max=TRUE);