AdMit {AdMit}                                                R Documentation

Fitting of an adaptive mixture of Student-t distributions to approximate a
target density through its kernel function

Usage

AdMit(KERNEL, mu0, Sigma0 = NULL, control = list(), ...)
Arguments

KERNEL    kernel function of the target density on which the adaptive
          mixture is fitted. This function should be vectorized for speed
          purposes (i.e., its first argument should be a matrix and its
          output a vector). Moreover, the function must contain the
          logical argument log. If log = TRUE, the function returns
          (natural) logarithm values of the kernel function. NA and NaN
          values are not allowed. (See *Details* for examples of KERNEL
          implementation.)

mu0       initial value in the first-stage optimization (for the location
          of the first Student-t component) in the adaptive mixture, or
          location of the first Student-t component if Sigma0 is not NULL.

Sigma0    scale matrix of the first Student-t component (square, symmetric
          and positive definite). Default: Sigma0 = NULL, i.e., the scale
          matrix of the first Student-t component is estimated by the
          function AdMit.

control   control parameters (see *Details*).

...       further arguments to be passed to KERNEL.
Details

The argument KERNEL is the kernel function of the target density, and it
should be vectorized for speed purposes.

As a first example, consider the kernel function proposed by Gelman and
Meng (1991):

k(x1,x2) = exp( -0.5*[A*x1^2*x2^2 + x1^2 + x2^2 - 2*B*x1*x2 - 2*C1*x1 - 2*C2*x2] )

where commonly used values are A = 1, B = 0, C1 = 3 and C2 = 3.

A vectorized implementation of this function might be:
GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
{
  if (is.vector(x))
    x <- matrix(x, nrow = 1)
  r <- -0.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
               - 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
  if (!log)
    r <- exp(r)
  as.vector(r)
}

This way, we may supply a point (x1,x2) for x and the function will output
a single value (i.e., the kernel evaluated at this point). But the function
is vectorized, in the sense that we may supply an Nx2 matrix of values for
x, where the rows of x are points (x1,x2), and the output will be a vector
of length N containing the kernel values for these points. Since the AdMit
procedure evaluates KERNEL for a large number of points, a vectorized
implementation is important. Note also the additional argument log = TRUE,
which is used for numerical stability.
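For instance, a quick check of the vectorized behaviour (the input values
below are arbitrary):

x <- rbind(c(0, 0), c(1, 1), c(-1, 2))
GelmanMeng(x)                     ## vector of length 3 (log-kernel values)
GelmanMeng(c(0, 0), log = FALSE)  ## single point, kernel value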
As a second example, consider the following (simple) econometric model:

y_t ~ i.i.d. N(mu, sigma^2),  t = 1, ..., T

where mu is the mean value and sigma is the standard deviation. Our purpose
is to estimate theta = (mu, sigma) within a Bayesian framework, based on a
vector y of T observations; the kernel then consists of the product of the
prior and the likelihood function. As previously mentioned, the kernel
function should be vectorized, i.e., handle an Nx2 matrix of points theta
at which the kernel is evaluated. Using the non-informative (Jeffreys)
prior p(theta) proportional to 1/sigma for sigma > 0, a vectorized
implementation of the kernel function might be:
KERNEL <- function(theta, y, log = TRUE)
{
  if (is.vector(theta))
    theta <- matrix(theta, nrow = 1)
  ## sub-function which returns the log-kernel for a given
  ## thetai value (i.e., a given row of theta)
  KERNEL_sub <- function(thetai)
  {
    if (thetai[2] > 0) { ## check whether sigma > 0
      ## if yes, compute the log-kernel at thetai
      r <- -log(thetai[2]) + sum(dnorm(y, thetai[1], thetai[2], TRUE))
    } else {
      ## if no, return minus infinity
      r <- -Inf
    }
    as.numeric(r)
  }
  ## 'apply' on the rows of theta (faster than a 'for' loop)
  r <- apply(theta, 1, KERNEL_sub)
  if (!log)
    r <- exp(r)
  as.numeric(r)
}
Since this kernel function also depends on the vector y, it must be passed
to KERNEL by the AdMit function. This is achieved via the argument ...,
i.e., AdMit(KERNEL, mu0 = c(0, 1), y = y).
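A minimal run for this model might then look as follows (the simulated
data are illustrative; see also the *Examples* section below):

set.seed(1234)
y <- rnorm(20, 2, 0.5)                   ## simulated observations
outAdMit <- AdMit(KERNEL, mu0 = c(1, 1), y = y)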
To gain even more speed, the implementation of KERNEL might rely on C or
Fortran code using the functions .C and .Fortran. An example is provided
in the file ‘AdMitJSS.R’ in the package's folder.
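As a rough sketch of this approach (the routine name kernel_gm and its
signature are hypothetical; the compiled code must be loaded beforehand,
e.g., via dyn.load):

## hypothetical wrapper: 'kernel_gm' is an assumed C routine that fills
## 'r' with log-kernel values for the n points stored row-wise in 'x'
KERNEL_C <- function(x, log = TRUE)
{
  if (is.vector(x))
    x <- matrix(x, nrow = 1)
  n <- nrow(x)
  out <- .C("kernel_gm",
            x = as.double(t(x)),  ## coordinates passed point by point
            n = as.integer(n),
            r = double(n))$r      ## .C returns the (modified) arguments
  if (!log)
    out <- exp(out)
  out
}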
The argument control is a list that can supply any of the following
components:

Ns        number of draws used in the simulation from the mixture of
          Student-t distributions. Default: Ns = 1e5.

Np        number of draws used in the evaluation of the importance
          sampling weights (Np <= Ns). Default: Np = 1e3.

Hmax      maximum number of Student-t components in the adaptive mixture.
          Default: Hmax = 10.

df        degrees of freedom parameter of the Student-t components.
          Default: df = 1.

CVtol     tolerance for the relative change of the coefficient of
          variation of the importance sampling weights; a new component is
          added as long as the relative change exceeds CVtol. Default:
          CVtol = 0.1, i.e., 10%.

weightNC  weight assigned to a new Student-t component of the adaptive
          mixture when it is added. Default: weightNC = 0.1, i.e., 10%.

trace     logical; indicates whether tracing information on the adaptive
          fitting procedure is displayed. Default: trace = FALSE, i.e., no
          tracing information.

IS        logical; indicates whether importance sampling is used to
          estimate the modes and the scale matrices of the Student-t
          components. Default: IS = FALSE, i.e., use numerical
          optimization instead.

ISpercent vector of percentage(s) of largest importance sampling weights
          used in the importance sampling estimation. Default:
          ISpercent = c(0.05, 0.15, 0.3), i.e., 5%, 15% and 30%.

ISscale   vector of scaling factor(s) applied to the scale matrices
          obtained by importance sampling. Default:
          ISscale = c(1, 0.25, 4).

trace.mu  tracing information on the optimization of the modes of the
          mixture components (passed to the control list of optim; see
          optim for further details). Default: trace.mu = 0, i.e., no
          tracing information.

maxit.mu  maximum number of iterations in the optimization of the modes of
          the mixture components. Default: maxit.mu = 500.

reltol.mu relative convergence tolerance in the optimization of the modes
          of the mixture components. Default: reltol.mu = 1e-8.

trace.p, maxit.p, reltol.p
          the same as above, for the optimization of the mixing
          probabilities of the mixture components.
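For instance, a fit with fewer draws, a lower cap on the number of
components and tracing enabled could be requested as follows (the values
are illustrative only):

ctl <- list(Ns = 5e4, Np = 1e3, Hmax = 5, trace = TRUE)
outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1), control = ctl)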
Value

A list with the following components:

CV: vector (of length H) of coefficients of variation of the importance
sampling weights.

mit: list (of length 4) containing information on the fitted mixture of
Student-t distributions, with the following components:

    p: vector (of length H) of mixing probabilities.

    mu: matrix (of size Hxd) containing the vectors of modes (in rows) of
    the mixture components.

    Sigma: matrix (of size Hxd*d) containing the scale matrices (in rows)
    of the mixture components.

    df: degrees of freedom parameter of the Student-t components.

where H (>=1) is the number of components in the adaptive mixture of
Student-t distributions and d (>=1) is the dimension of the first argument
in KERNEL.
summary: data frame containing information on the optimization procedures.
For each component of the adaptive mixture of Student-t distributions it
reports: 1. the method used to estimate the mode and the scale matrix of
the Student-t component (`USER' if Sigma0 is provided by the user;
numerical optimization: `BFGS', `Nelder-Mead'; importance sampling: `IS',
with the percentage(s) of importance weights used and the scaling
factor(s)); 2. the time required for this optimization; 3. the method used
to estimate the mixing probabilities (`NLMINB', `BFGS', `Nelder-Mead',
`NONE'); 4. the time required for this optimization; 5. the coefficient of
variation of the importance sampling weights.
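After a run of AdMit, these components can be inspected directly, e.g.:

outAdMit$CV          ## coefficients of variation, one per added component
outAdMit$mit$p       ## mixing probabilities
outAdMit$mit$mu      ## modes, one row per component
outAdMit$mit$Sigma   ## scale matrices, one row per component
outAdMit$mit$df      ## degrees of freedom of the Student-t components
outAdMit$summary     ## summary of the optimization procedures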
Note

Further details and examples of the R package AdMit can be found in Ardia,
Hoogerheide, van Dijk (2008, 2009). See also the package vignette by typing
vignette("AdMit") and the files ‘AdMitJSS.txt’ and ‘AdMitRnews.txt’ in the
package's ‘/doc’ folder. Further details on the core algorithm are given in
Hoogerheide (2006), Hoogerheide, Kaashoek, van Dijk (2007) and Hoogerheide,
van Dijk (2008).
The adaptive mixture mit returned by the function AdMit is used by the
function AdMitIS to perform importance sampling, with mit as the importance
density, or by the function AdMitMH to perform independence chain
Metropolis-Hastings sampling, with mit as the candidate density.
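A hedged sketch of this workflow (see the AdMitIS and AdMitMH help pages
for the exact signatures and defaults):

## use the fitted mixture as importance / candidate density
outIS <- AdMitIS(N = 1e5, KERNEL = GelmanMeng, mit = outAdMit$mit)
outMH <- AdMitMH(N = 1e5, KERNEL = GelmanMeng, mit = outAdMit$mit)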
Please cite the package in publications. Use citation("AdMit").

Author(s)

David Ardia <david.ardia@unifr.ch> for the R port;
Lennart F. Hoogerheide and Herman K. van Dijk for the AdMit algorithm.
References

Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2008). The AdMit Package. Econometric Institute report 2008-17. http://publishing.eur.nl/ir/repub/asset/13053/EI2008-17.pdf (forthcoming in Rnews)
Ardia, D., Hoogerheide, L.F., van Dijk, H.K. (2009). Adaptive Mixture of Student-t Distributions as a Flexible Candidate Distribution for Efficient Simulation: The R Package AdMit. Journal of Statistical Software 29(3). http://www.jstatsoft.org/v29/i03/
Gelman, A., Meng, X.-L. (1991). A Note on Bivariate Distributions That Are Conditionally Normal. The American Statistician 45(2), pp.125–126.
Hoogerheide, L.F. (2006). Essays on Neural Network Sampling Methods and Instrumental Variables. PhD thesis, Tinbergen Institute, Erasmus University Rotterdam (NL). ISBN: 9051708261. (Book nr. 379 of the Tinbergen Institute Research Series.)
Hoogerheide, L.F., Kaashoek, J.F., van Dijk, H.K. (2007). On the Shape of Posterior Densities and Credible Sets in Instrumental Variable Regression Models with Reduced Rank: An Application of Flexible Sampling Methods using Neural Networks. Journal of Econometrics 139(1), pp.154–180. doi: 10.1016/j.jeconom.2006.06.009.
Hoogerheide, L.F., van Dijk, H.K. (2008). Possibly Ill-Behaved Posteriors in Econometric Models: On the Connection between Model Structures, Non-elliptical Credible Sets and Neural Network Simulation Techniques. Tinbergen Institute discussion paper 2008-036/4. http://www.tinbergen.nl/discussionpapers/08036.pdf
See Also

AdMitIS for importance sampling using an adaptive mixture of Student-t
distributions as the importance density; AdMitMH for the independence
chain Metropolis-Hastings algorithm using an adaptive mixture of Student-t
distributions as the candidate density.
Examples

## Gelman and Meng (1991) kernel function
GelmanMeng <- function(x, A = 1, B = 0, C1 = 3, C2 = 3, log = TRUE)
{
  if (is.vector(x))
    x <- matrix(x, nrow = 1)
  r <- -0.5 * (A * x[,1]^2 * x[,2]^2 + x[,1]^2 + x[,2]^2
               - 2 * B * x[,1] * x[,2] - 2 * C1 * x[,1] - 2 * C2 * x[,2])
  if (!log)
    r <- exp(r)
  as.vector(r)
}

## Run AdMit (with default values)
set.seed(1234)
outAdMit <- AdMit(GelmanMeng, mu0 = c(0.0, 0.1))
print(outAdMit)

## Run AdMit (using importance sampling to estimate
## the modes and the scale matrices)
set.seed(1234)
outAdMit <- AdMit(KERNEL = GelmanMeng, mu0 = c(0.0, 0.1),
                  control = list(IS = TRUE))
print(outAdMit)

## Simple econometric model: y_t ~ i.i.d. N(mu,sigma^2)
## Jeffreys prior p(theta) prop 1/sigma for sigma > 0
KERNEL <- function(theta, y, log = TRUE)
{
  if (is.vector(theta))
    theta <- matrix(theta, nrow = 1)
  KERNEL_sub <- function(thetai)
  {
    if (thetai[2] > 0) {
      r <- -log(thetai[2]) + sum(dnorm(y, thetai[1], thetai[2], TRUE))
    } else {
      r <- -Inf
    }
    as.numeric(r)
  }
  r <- apply(theta, 1, KERNEL_sub)
  if (!log)
    r <- exp(r)
  as.numeric(r)
}

## Generate 20 draws for mu = 2 and sigma = 0.5
set.seed(1234)
y <- rnorm(20, 2, 0.5)

## Run AdMit (with default values); pass the vector y
## of observations using the ... argument of AdMit and
## print steps of the fitting process
outAdMit <- AdMit(KERNEL = KERNEL, mu0 = c(1.0, 1.0), y = y,
                  control = list(trace = TRUE))
print(outAdMit)