spg {BB} | R Documentation |
Spectral projected gradient method for large-scale optimization with simple constraints.
spg(par, fn, gr=NULL, method=3, project=NULL, lower=-Inf, upper=Inf, control=list(), ...)
par |
A real vector argument to fn , giving the initial guess for the optimization of the nonlinear
objective function fn . |
fn |
Nonlinear objective function that is to be optimized. A scalar function that takes a real vector as argument and returns a scalar that is the value of the function at that point (see details). |
gr |
The gradient of the objective function fn evaluated at the
argument. This is a vector-function that takes a real
vector as argument and returns a real vector of the same length.
It defaults to "NULL", which means that the gradient is evaluated numerically.
Computations are dramatically faster in high-dimensional problems when
the exact gradient is provided. See *Example*. |
method |
An integer (1, 2, or 3) specifying which Barzilai-Borwein steplength to use. The default is 3. See *Details*. |
project |
The projection function that takes a point in R^n and
projects it onto a region that defines the constraints of the problem.
This is a vector-function that takes a real vector as argument and
returns a real vector of the same length. See *Details*.
It defaults to "NULL" which implies a box constraint defined by
lower and upper . |
lower |
A vector that defines lower bounds on par . Defaults to -Inf . |
upper |
A vector that defines upper bounds on par . Defaults to Inf . |
control |
A list of control parameters. See *Details*. |
... |
Additional arguments passed to fn , gr and
project . (All must accept any specified arguments, either
explicitly or by having a ... argument, but
they do not need to use them all.) |
R adaptation, with significant modifications, by Ravi Varadhan, Johns Hopkins University (March 25, 2008), from the original FORTRAN code of Birgin, Martinez, and Raydan (2001).
A major modification in our R adaptation of the original FORTRAN code is the availability of three different options for the Barzilai-Borwein (BB) steplength: method = 1 is the BB steplength used in Birgin, Martinez and Raydan (2000); method = 2 is the other steplength proposed in Barzilai and Borwein's (1988) original paper; and method = 3 is a new steplength, first proposed in Varadhan and Roland (2008) for accelerating the EM algorithm. (Varadhan and Roland (2008) considered three similar steplength schemes in their EM acceleration work.) We have chosen method = 3 as the default. This method may be slightly slower than the other two BB steplength schemes, but it generally exhibited more reliable convergence to a better optimum in our experiments. We recommend that users try the other steplength schemes if the default method does not perform well on their problem.
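The two classical steplengths from Barzilai and Borwein's (1988) paper can be sketched from successive iterates and gradients. This is an illustrative sketch in Python, not the package's code; the function name bb_steplengths is invented here, and the pairing of the formulas with method = 1 and method = 2 is our reading of the description above.

```python
def bb_steplengths(x_prev, x_curr, g_prev, g_curr):
    """Return the two classical Barzilai-Borwein steplengths.

    s = change in iterates, y = change in gradients.
    alpha1 = <s,s>/<s,y> (roughly the method=1 flavour),
    alpha2 = <s,y>/<y,y> (roughly the method=2 flavour).
    """
    s = [xc - xp for xc, xp in zip(x_curr, x_prev)]   # iterate difference
    y = [gc - gp for gc, gp in zip(g_curr, g_prev)]   # gradient difference
    ss = sum(si * si for si in s)
    sy = sum(si * yi for si, yi in zip(s, y))
    yy = sum(yi * yi for yi in y)
    return ss / sy, yy and sy / yy

# e.g. for f(x) = x1^2 + 2*x2^2, iterates (1,1) -> (0.5, 0.75),
# gradients (2,4) -> (1,3):
a1, a2 = bb_steplengths([1.0, 1.0], [0.5, 0.75], [2.0, 4.0], [1.0, 3.0])
# a1 = 5/12, a2 = 0.375
```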
Argument project defaults to "NULL", which results in a default box projection using the constraints defined by the vectors lower and upper . These default to -Inf and +Inf respectively, but the user can specify lower and upper arguments to enforce box constraints. The user can also provide a project function to enforce more complicated linear/nonlinear equality/inequality constraints. The arguments lower , upper , and ... are passed to project , so it must accept them all, either explicitly or by having a ... argument, though it need not use them all. So, for example,

proj.good <- function(x, lower, upper) {whatever}

or

proj.good <- function(x, ...) {whatever}

are acceptable, but not

proj.bad <- function(x) {whatever}
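The default box projection implied by project = NULL amounts to elementwise clipping of each coordinate to its bounds. A minimal sketch in Python (the name project_box is illustrative, not part of the package; the BB package implements this internally in R):

```python
def project_box(x, lower, upper):
    """Project x onto the box {z : lower <= z <= upper}, elementwise."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

# e.g. clipping onto the unit box [0, 1]^3:
p = project_box([-2.0, 0.5, 3.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
# p is [0.0, 0.5, 1.0]
```

A user-supplied project function follows the same shape: it takes the current point and returns its projection onto the feasible region.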
Argument control is a list specifying any changes to default values of algorithm control parameters. Note that the names of these must be specified completely; partial matching will not work.
The list items are as follows:
M: A positive integer that controls the degree of non-monotonicity in the line search. M = 1 would enforce strict monotonicity in the reduction of the L2-norm of fn , whereas larger values allow for more non-monotonicity. Global convergence under non-monotonicity is ensured by enforcing the Grippo-Lampariello-Lucidi condition (Grippo et al. 1986) in a non-monotone line-search algorithm. Values of M between 5 and 20 are generally good. The default is M = 10 .

maxit: The maximum number of iterations. The default is maxit = 1500 .

gtol: Convergence tolerance on the infinity-norm of the projected gradient gr evaluated at the current parameter. Convergence is declared when the infinity-norm of the projected gradient is less than gtol . The default is gtol = 1.e-05 .

maxfeval: The maximum number of function evaluations. The default is maxfeval = 10000 .

maximize: A logical variable indicating whether the problem is a maximization. The default is maximize = FALSE , indicating minimization. For maximization (e.g. log-likelihood maximization in statistical modeling), this may be set to TRUE .

trace: A logical variable. If TRUE , information on the progress of optimization is printed. The default is trace = TRUE .

triter: An integer that controls the frequency of printing when trace = TRUE . The default is triter = 10 , which means that the objective fn and the infinity-norm of its projected gradient are
printed at every 10th iteration.

A list with the following components:
par |
Parameters that optimize the nonlinear objective function, if convergence is successful. |
value |
The value of the objective function at termination. |
gradient |
L-infinity norm of the projected gradient of the objective function at termination. If convergence is successful, this should be less than gtol . |
fn.reduction |
Reduction in the value of the function from its initial value. This is negative in maximization. |
iter |
Number of iterations taken by the algorithm. |
feval |
Number of times the objective fn was evaluated. |
convergence |
An integer code indicating the type of convergence. 0
indicates successful convergence, in which case the infinity-norm of the
projected gradient is smaller than gtol . Error codes are: 1 indicates that the maximum limit
for iterations maxit has been reached. 2 indicates that the
maximum limit on function evaluations has been exceeded. 3 indicates
failure due to error in function evaluation. 4 indicates failure due
to error in gradient evaluation. 5 indicates failure due to error in
projection. |
message |
A text message explaining which termination criterion was used. |
Birgin EG, Martinez JM, and Raydan M (2000): Nonmonotone spectral projected gradient methods on convex sets, SIAM J Optimization, 10, 1196-1211.
Birgin EG, Martinez JM, and Raydan M (2001): SPG: software for convex-constrained optimization, ACM Transactions on Mathematical Software.
L Grippo, F Lampariello, and S Lucidi (1986), A nonmonotone line search technique for Newton's method, SIAM J on Numerical Analysis, 23, 707-716.
M Raydan (1997), Barzilai-Borwein gradient method for large-scale unconstrained minimization problem, SIAM J of Optimization, 7, 26-33.
R Varadhan and C Roland (2008), Simple and globally-convergent methods for accelerating the convergence of any EM algorithm, Scandinavian J Statistics, doi: 10.1111/j.1467-9469.2007.00585.x.
R Varadhan and PD Gilbert (2008), BB: An R package of Barzilai-Borwein spectral methods for solving and optimizing large-scale nonlinear systems, Unpublished.
optim , nlm , sane , dfsane , grad
sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10 }
sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10 }

p0 <- rnorm(200)
ans.spg1 <- spg(par=p0, fn=sc2.f)            # Default is method=3
ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
ans.cg    <- optim(par=p0, fn=sc2.f, method="CG")        # Conjugate-gradient method in "optim"
ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B")  # Low-memory BFGS method in "optim"

# Now we use the exact gradient.
# Computation is much faster than with the numerical gradient.
ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)

############
# Another example illustrating the use of additional parameters to the objective function
valley.f <- function(x, cons) {
  n <- length(x)
  f <- rep(NA, n)
  j <- 3 * (1:(n/3))
  jm2 <- j - 2
  jm1 <- j - 1
  f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
  f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
  f[j]   <- 10 * (cos(x[jm2]) - x[j])
  sum(f*f)
}

k <- c(1.003344481605351, -3.344481605351171e-03)
p0 <- rnorm(300)  # number of parameters should be a multiple of 3 for this example
ans.spg2  <- spg(par=p0, fn=valley.f, cons=k, method=2)
ans.cg    <- optim(par=p0, fn=valley.f, cons=k, method="CG")
ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")

####################################################################
# A statistical example illustrating log-likelihood maximization.
poissmix.loglik <- function(p, y) {
  # Log-likelihood for a binary Poisson mixture
  i <- 0:(length(y) - 1)
  loglik <- y * log(p[1] * exp(-p[2]) * p[2]^i / exp(lgamma(i+1)) +
                    (1 - p[1]) * exp(-p[3]) * p[3]^i / exp(lgamma(i+1)))
  return(sum(loglik))
}

# Data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death = 0:9,
                           freq = c(162, 267, 271, 185, 111, 61, 27, 8, 3, 1))
y <- poissmix.dat$freq

# Lower and upper bounds on the parameters
lo <- c(0.001, 0, 0)
hi <- c(0.999, Inf, Inf)

p0 <- runif(3, c(0.2, 1, 1), c(0.8, 5, 8))  # randomly generated starting values
ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
               control=list(maximize=TRUE))
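As a final illustration, the method described in Details, a projected Barzilai-Borwein step safeguarded by the Grippo-Lampariello-Lucidi non-monotone line search, can be sketched in Python. This is a simplified sketch, not the package's implementation; spg_sketch and every name in it are invented for illustration, and the loop omits the safeguards (steplength clamping, alternative termination tests) that the real spg uses.

```python
def spg_sketch(f, grad, proj, x0, M=10, gamma=1e-4, maxit=200, gtol=1e-5):
    """Minimal SPG loop: projected BB step + GLL non-monotone line search."""
    x = proj(list(x0))
    g = grad(x)
    fx = f(x)
    history = [fx]          # last M function values, for the GLL condition
    alpha = 1.0             # current BB steplength
    for _ in range(maxit):
        # Search direction: where the projected steepest-descent step points.
        trial = [xi - alpha * gi for xi, gi in zip(x, g)]
        d = [pi - xi for pi, xi in zip(proj(trial), x)]
        if max(abs(di) for di in d) < gtol:   # projected-gradient test
            break
        gd = sum(gi * di for gi, di in zip(g, d))  # < 0 for a valid projection
        f_ref = max(history[-M:])                  # GLL reference value
        lam = 1.0
        while True:                                # non-monotone backtracking
            x_new = [xi + lam * di for xi, di in zip(x, d)]
            f_new = f(x_new)
            if f_new <= f_ref + gamma * lam * gd or lam < 1e-12:
                break
            lam *= 0.5
        g_new = grad(x_new)
        s = [xn - xi for xn, xi in zip(x_new, x)]
        y = [gn - gi for gn, gi in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        # BB steplength (the method=1 flavour), with a crude fallback.
        alpha = sum(si * si for si in s) / sy if sy > 1e-12 else 1.0
        x, g, fx = x_new, g_new, f_new
        history.append(fx)
    return x, fx

# Demo: minimize (x1-2)^2 + (x2+1)^2 over the box [0,1]^2; optimum is (1, 0).
f = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2
grad_f = lambda x: [2.0 * (x[0] - 2.0), 2.0 * (x[1] + 1.0)]
clip01 = lambda x: [min(max(xi, 0.0), 1.0) for xi in x]
x_opt, f_opt = spg_sketch(f, grad_f, clip01, [0.5, 0.5])
# x_opt is near (1, 0), f_opt near 2
```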