genoud {rgenoud}R Documentation

GENetic Optimization Using Derivatives

Description

GENOUD is a function that combines evolutionary algorithm methods with a derivative-based, quasi-Newton method to solve difficult unconstrained optimization problems. GENOUD is made to solve problems that are nonlinear or perhaps even discontinuous in the parameters of the function to be optimized. When a statistical model's estimating function (for example, a log-likelihood) is nonlinear in the model's parameters, the function to be optimized will usually not be globally concave and may contain irregularities such as saddlepoints or discontinuous jumps. Optimization methods that rely on derivatives of the objective function may be unable to find any optimum at all. Multiple local optima may exist, so that there is no guarantee that a derivative-based method will converge to the global optimum. On the other hand, algorithms that do not use derivative information (such as pure GAs) are for many problems needlessly poor at local hill climbing. Most statistical problems are regular in the neighborhood of the solution. Therefore, for some portion of the search space derivative information is useful.

Usage

genoud(fn, nvars, max=FALSE, pop.size=1000, 
   max.generations=100, wait.generations=10, hard.generation.limit=TRUE,
   starting.values=NULL, MemoryMatrix=NULL, Debug=FALSE, Domains=NULL, 
   default.domains=10, gradient.check=TRUE, boundary.enforcement=2,
   solution.tolerance=0.001, BFGS=TRUE, data.type.int=FALSE, hessian=FALSE,
   unif.seed=812821, int.seed=53058,
   print.level=2, share.type=0, instance.number=0,
   output.path="stdout", output.append=FALSE, project.path="genoud.pro",
   P1=50, P2=50, P3=50, P4=50, P5=50, P6=50, P7=50, P8=50, P9=0)

Arguments

fn The function to be minimized (or maximized if max=TRUE). The first argument of the function must be the vector of parameters over which minimizing is to occur. The function must return a scalar result.
For example, if we wish to maximize the sin() function. We can simply call genoud by genoud(sin, nvars=1,max=TRUE).
nvars This is the number of variables the function to be minimized (or maximized) takes.
max Maximization (TRUE) or Minimizing (FALSE). This variable tells GENOUD if it is to minimize or maximize the objective function.
pop.size Population Size. This is the number of individuals GENOUD uses to solve the optimization problem. There are several restrictions on what the value of this number can be. No matter what population size the user requests, the number is automatically adjusted to make certain that the relevant restrictions are satisfied. These restrictions originate in what is required by several of the operators. In particular, operators 6 (Multiple Point Simple Crossover) and 8 (Heuristic Crossover) require an even number of individuals to work on—i.e., they require two parents. Therefore, the pop.size variable and the operators sets must be such that these three operators have an even number of individuals to work with. If this does not occur, the population size is automatically increased until this constraint is satisfied.
max.generations Maximum Generations. This is the maximum number of generations that GENOUD will run when attempting to optimize a function. This is a soft limit. The maximum generation limit will be binding for GENOUD only if hard.generation.limit has been set equal to TRUE. If it has not been set equal to TRUE, two soft triggers control when GENOUD stops: wait.generations and gradient.check.

Although, the max.generations variable is not, by default, binding, it is nevertheless important because many operators use it to adjust their behavior. In essence, many of the operators become less random as the generation count gets closer to the max.generations limit. If the limit is hit and GENOUD decides to continue working, GENOUD automatically increases the max.generation limit.

Please see MemoryMatrix for some important interactions with memory management.
wait.generations If there is no improvement in the objective function in this number of generations, GENOUD will think that it has found the optimum. If the gradient.check trigger has been turned on, GENOUD will only start counting wait.generations if the gradients are within solution.tolerance of zero. The other variables controlling termination are max.generations and hard.generation.limit.
hard.generation.limit This logical variable determines if the max.generations variable is a binding constraint for GENOUD. If hard.generation.limit is FALSE, then GENOUD may exceed the max.generations count if either the objective function has improved within a given number of generations (determined by wait.generations) or if the gradients are not zero (determined by gradient.check).

Please see MemoryMatrix for some important interactions with memory management.
starting.values This vector contains the starting values which GENOUD will use at startup. The starting.values vector is a way for the user to insert one individual into the starting population. GENOUD will randomly create the other individuals.
MemoryMatrix This variable controls if GENOUD sets up a memory matrix. Such a matrix ensures that GENOUD will request the fitness evaluation of a given set of parameters only once. The variable may be TRUE or FALSE. If it is FALSE, GENOUD will be aggressive in conserving memory. The most significant negative implication of this variable being set to FALSE is that GENOUD will no longer maintain a memory matrix of all evaluated individuals. Therefore, GENOUD may request evaluations which it has already previously requested. When nvars is large, the memory matrix consumes a large amount of RAM.

By default GENOUD sets MemoryMatrix equal to TRUE unless the number of parameters is greater than 20. In that case, GENOUD sets the default value equal to FALSE.

GENOUD's memory matrix will require significantly less memory if the user sets hard.generation.limit equal to TRUE. Doing this is a good way of conserving memory while still making use of the memory matrix structure.
Debug This variable turns on some debugging information. This variable may be TRUE or FALSE.
Domains This is a nvars *2 matrix. The first column is the lower bound, and the second column is the upper bound. None of GENOUD's starting population will be generated outside of the bounds. But some of the operators may generate children which will be outside of the bounds unless the boundary.enforcement flag is turned on.

If the user does not provide any values for Domains, GENOUD will setup default domains using default.domains.

For linear and nonlinear constraints please see the discussion in the Note section.
default.domains If the user does not want to provide a Domains matrix, domains may nevertheless be set by the user with this easy to use scalar option. GENOUD will create a Domains matrix by setting the lower bound for all of the parameters equal to -1 * default.domains and the upper bound equal to default.domains.
gradient.check If this variable is TRUE, GENOUD will not start counting wait.generations unless each gradient is solution.tolerance close to zero. This variable has no effect if the max.generations limit has been hit and the hard.generation.limit option has been set to TRUE.
boundary.enforcement This variable determines the degree to which GENOUD obeys the boundary constraints. Notwithstanding the value of the variable, none of GENOUD's starting population will be outside of the bounds. But some of the operators may generate children which will be outside of the bounds unless the boundary.enforcement flag is turned on.

boundary.enforcement has three possible values: 0 (anything goes), 1 (regular), and 2 (no trespassing):
    0: Anything Goes
    This option allows any of the operators to create out-of-bounds individuals and these individuals will be included in the population if their fit values are good enough. The boundaries are only important when generating random individuals.
    1: Regular
    This allows operators (particularly those operators which use the derivative based optimizer, BFGS) to go out-of-bounds during the creation of an individual (i.e., out-of-bounds values will often be evaluated). But when the operator has decided on an individual, it must be in bounds to be acceptable.
    2: No Trespassing
    No out-of-bounds evaluations will ever be requested.
solution.tolerance This is the tolerance level used by GENOUD. Numbers within solution.tolerance are considered to be equal. This is particularly important when it comes to evaluating wait.generations and conducting the gradient.check.
BFGS This variable denotes whether or not GENOUD applies a quasi-Newton derivative optimizer (BFGS) to the best individual at the end of each generation after the initial one. Setting BFGS to FALSE does not mean that the BFGS will never be used. In particular, Operator 9 (Local-Minimum Crossover) must also be set to zero.
data.type.int This option sets the data type of the parameters of the function to be optimized. If the variable is TRUE, GENOUD is informed that it is dealing with integer values. Use of the integer data type is supported only as a beta feature. Some of the included operators will not work well with integer type parameters.

With integer parameters, GENOUD never uses derivative information. This implies that the BFGS quasi-Newton optimizer is never used—i.e., the BFGS flag is set to FALSE. It also implies that Operator 9 (Local-Minimum Crossover) is set to zero and that gradient checking (as a convergence criterion) is turned off. No matter what other options have been set to, data.type.int takes precedence—i.e., if GENOUD is told that it is searching over an integer parameter space, gradient information is never considered.

There is no option to mix integer and floating point parameters. If one wants to mix the two, it is suggested that the user pick integer type and in her objective function map a particular integer range into a floating point number range. For example, tell GENOUD to search from 0 to 100 and divide by 100 to obtain a search grid of 0 to 1.0 (by .1).
hessian When this flag is set to TRUE, GENOUD will return the hessian matrix at the solution as part of its return list. A user can use this matrix to calculate standard errors.
unif.seed This is the first of the two random number seeds which GENOUD uses. The default value of this seed is 81282.
int.seed This is the second of the two random number seeds which GENOUD uses. The default value of this seed is 53058.
print.level This variable controls the level of printing that GENOUD does. There are four possible levels: 0 (minimal printing), 1 (normal), 2 (detailed), and 3 (debug). If level 2 is selected, GENOUD will print details about the population at each generation. The print.level variable also significantly affects how much detail is placed in the project file—see project.path. Note that R convention would have us at print level 0 (minimal printing). However, because GENOUD runs may take a long time, it is important for the user to receive feedback. Hence, print level 2 has been set as the default.
share.type If share.type is equal to 1, then GENOUD, at startup, checks to see if there is an existing project file (see project.path). If such a file exists, it initializes its original population using it.

If the project file contains a smaller population than the current GENOUD run, GENOUD will randomly create the necessary individuals. If the project file contains a larger population than the current GENOUD run, GENOUD will kill the necessary individuals using exponential selection.

If the number of variables (see nvars) reported in the project file is different from the current GENOUD run, GENOUD does not use the project file (regardless of the value of share.type) and GENOUD generates the necessary starting population at random.
instance.number This number (starting from 0) denotes the number of recursive instances of GENOUD. GENOUD then sets up its random number generators and other such structures so that the multiple instances do not interfere with each other. It is up to the user to make certain that the different instances of GENOUD are not writing to the same output file(s): see output.path and project.path.

For the R version of GENOUD this variable is of limited use. It is basically there in case a GENOUD run is being used to optimize the result of another GENOUD run (i.e., a recursive implementation).
output.path This is the full (relative) path to where GENOUD's output is to go. If the value of output.path= ``stdout'', then GENOUD's output will go to standard output in UNIX and to the GUI console in Windows. Also see output.append and project.path.
output.append If output is being sent to a file (see output.path), this logical variable tells GENOUD whether it should append to the file if it already exists or if it should overwrite an existing file.
project.path This is the path of the GENOUD project file. The behavior of the project file depends on the print.level chosen. If the print.level variable is set to 1, then the project file is rewritten after each generation. Therefore, only the currently fully completed generation is included in the file. If the print.level variable is set to 2, then each new generation is simply appended to the project file. For all other values of print.level, the project file is not created.
P1 This is the cloning operator. GENOUD always clones the best individual each generation. But this operator clones others as well. Please see the Operators Section for details about operators and how they are weighted.
P2 This is the uniform mutation operator. One parameter of the parent is mutated. Please see the Operators Section for details about operators and how they are weighted.
P3 This is the boundary mutation operator. This operator finds a parent and mutates one of its parameters towards the boundary. Please see the Operators Section for details about operators and how they are weighted.
P4 Non-Uniform Mutation. Please see the Operators Section for details about operators and how they are weighted.
P5 This is the polytope crossover. Please see the Operators Section for details about operators and how they are weighted.
P6 Multiple Point Simple Crossover. Please see the Operators Section for details about operators and how they are weighted.
P7 Whole Non-Uniform Mutation. Please see the Operators Section for details about operators and how they are weighted.
P8 Heuristic Crossover. Please see the Operators Section for details about operators and how they are weighted.
P9 Local-Minimum Crossover: BFGS. This is rather CPU intensive, and should be generally used less than the other operators. Please see the Operators Section for details about operators and how they are weighted.

Value

GENOUD returns a list with 7 objects. 8 objects are returned if the user has requested the hessian to be calculated at the solution. Please see the hessian option. The returned objects are:

value This variable contains the fitness value at the solution.
generations This variables contains the number of generations GENOUD ran for.
peakgeneration This variable contains the generation number at which GENOUD found the solution.
pop.size This variable contains the population size that GENOUD actually used. See pop.size for why this value may differ from the population size the user requested.
par This vector contains the parameter values found at the solution.
gradients This vector contains the gradients found at the solution. If no gradients were calculated, they are reported to be -1.0.
operators This vector reports the actual number of operators (of each type) GENOUD used. Please see the Operators Section for details.
hessian If the user has requested the hessian matrix to be returned (via the hessian flag), the hessian at the solution will be returned. The user may use this matrix to calculate standard errors.

Operators

GENOUD has nine operators that it uses. The integer values which are assigned to each of these operators (P1...P9) are weights. GENOUD calculates the sum of s = P1+P2+...+P9. Each operator is assigned a weight equal to W_n = s/(P_n). The number of times an operator is called usually equals c_n = W_n * pop.size.

Operators 6 (Multiple Point Simple Crossover) and 8 (Heuristic Crossover) require an even number of individuals to work on—i.e., they require two parents. Therefore, the pop.size variable and the operators sets must be such that these three operators have an even number of individuals to work with. If this does not occur, GENOUD automatically upwardly adjusts the population size to make this constraint hold.

Strong uniqueness checks have been built into the operators to help ensure that the operators produce offspring different from their parents, but this does not always happen.

Note that GENOUD always keeps the best individual each generation.

GENOUD's 9 operators are:

  1. Cloning
  2. Uniform Mutation
  3. Boundary Mutation
  4. Non-Uniform Crossover
  5. Polytope Crossover
  6. Multiple Point Simple Crossover
  7. Whole Non-Uniform Mutation
  8. Heuristic Crossover
  9. Local-Minimum Crossover: BFGS

For more information please see Table 1 of the reference article: http://jsekhon.fas.harvard.edu/genoud/node7.shtml.

Note

The most important options affecting performance are those determining population size (pop.size) and the number of generations the algorithm runs (max.generations, wait.generations, hard.generation.limit and gradient.check). Search performance is expected to improve as the population size and the number of generations the program runs increase. These and the other options should be adjusted for the problem at hand. Please pay particular attentions to the search domains (Domains and default.domains). For more information please see the reference article.

Linear and nonlinear constraints among the parameters can be introduced by users in their fit function. For example, if the sum of parameters 1 and 2 must be less than 725, the following can be placed in the fit function the user is going to have GENOUD maximize: if ( (parm1 + parm2) >= 725) { return(-99999999) }. In this example, a very bad fit value is returned to GENOUD if the linear constrain is violated. GENOUD will then attempt to find parameter values that satisfy the constraint.

Author(s)

Walter R. Mebane, Jr., Cornell University, wrm1@cornell.edu, http://macht.arts.cornell.edu/wrm1/

Jasjeet S. Sekhon, Harvard University, jasjeet_sekhon@harvard.edu, http://jsekhon.fas.harvard.edu/

References

Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. ``Genetic Optimization Using Derivatives: Theory and Application to Nonlinear Models.'' Political Analysis, 7: 187-210. http://jsekhon.fas.harvard.edu/genoud/genoud.pdf

Mebane, Walter R., Jr. and Jasjeet S. Sekhon. 2004. ``Robust Estimation and Outlier Detection for Overdispersed Multinomial Models of Count Data.'' American Journal of Political Science, 48 (April): 391-410. http://elections.fas.harvard.edu/election2000/MebaneSekhon.multinom.pdf

See Also

optim.

Examples

#maximize the sin function
 sin1 <- genoud(sin, nvars=1, max=TRUE);

#minimize the sin function
 sin2 <- genoud(sin, nvars=1, max=FALSE);

#maximize a univariate normal mixture which looks like a claw and
#plot it
claw <- function(xx) {
  Nd <- function(x, mu, sigma) {
    w <- (1.0/sqrt(2.0*pi*sigma*sigma)) ;
    z <- (x-mu)/sigma;
    w <- w*exp(-0.5*z*z) ;
    as.double(w);
  }
  x <- xx[1];
  y <- (0.46*(Nd(x,-1.0,2.0/3.0) + Nd(x,1.0,2.0/3.0)) +
           (1.0/300.0)*(Nd(x,-0.5,.01) + Nd(x,-1.0,.01) + Nd(x,-1.5,.01)) +
           (7.0/300.0)*(Nd(x,0.5,.07) + Nd(x,1.0,.07) + Nd(x,1.5,.07))) ;
  as.double(y);
}
 claw1   <- genoud(claw, nvars=1,P9=100,max=TRUE);
 xx <- seq(-3,3,.05);
 plot(xx,lapply(xx,claw),type="l",xlab="Parameter",ylab="Fit",main="RGENOUD: Maximize the Claw Density");
 points(claw1$par,claw1$value,col="red");

#maximize a bivariate normal mixture which looks like a claw
biclaw <- function(xx) {
  mNd2 <- function(x1, x2, mu1, mu2, sigma1, sigma2, rho)
    {
      z1 <- (x1-mu1)/sigma1;
      z2 <- (x2-mu2)/sigma2;
      w <- (1.0/(2.0*pi*sigma1*sigma2*sqrt(1-rho*rho))) ;
      w <- w*exp(-0.5*(z1*z1 - 2*rho*z1*z2 + z2*z2)/(1-rho*rho)) ;
      as.double(w);
    }
  x1 <- xx[1]+1;
  x2 <- xx[2]+1;
  
  y <- (0.5*mNd2(x1,x2,0.0,0.0,1.0,1.0,0.0) +
            0.1*(mNd2(x1,x2,-1.0,-1.0,0.1,0.1,0.0) +
                 mNd2(x1,x2,-0.5,-0.5,0.1,0.1,0.0) +
                 mNd2(x1,x2,0.0,0.0,0.1,0.1,0.0) +
                 mNd2(x1,x2,0.5,0.5,0.1,0.1,0.0) +
                 mNd2(x1,x2,1.0,1.0,0.1,0.1,0.0)));

  as.double(y);
}
 biclaw1 <- genoud(biclaw, nvars=2,P9=100,max=TRUE);

[Package Contents]