ergmm {latentnetHRT}    R Documentation

Fit a Latent Space Random Graph Model

Description

ergmm is used to fit latent space and latent space cluster random network models, as described in Hoff, Raftery and Handcock (2002) and Handcock, Raftery and Tantrum (2005). ergmm produces likelihood-based inference. Approximate maximum likelihood estimators are computed, and Bayesian inference is implemented via an MCMC algorithm.

Usage

ergmm(formula, theta0=NULL, 
     burnin=1000, MCMCsamplesize=1000, interval=10,
     latent.control=list(maxit=40,penalty.sigma=c(10,0.5),MLEonly=FALSE),
     returnMCMCstats=TRUE, randseed=NULL, 
     verbose=FALSE, ...)

Arguments

formula An R formula object, of the form y ~ <term 1> + <term 2> ..., where y is a network object or a matrix that can be coerced to a network object, and <term 1>, <term 2>, etc., are each terms chosen from the list given below. To create a network object in R, use the network function. For a description of the possible terms, see terms.ergmm.
theta0 The initial parameter value used to find the MLE. The default is based on multidimensional scaling fit to the positions.
burnin The number of proposals before any MCMC sampling is done.
MCMCsamplesize The number of posterior samples to draw.
interval The number of proposal steps between sampled statistics.
latent.control Control variables for the latent space algorithm. These are used only if a latent term is included in the model. maxit sets the maximum number of iterations to use in the Quasi-Newton-Raphson algorithm to maximize the MCMC likelihood. MLEonly is a logical flag; if TRUE, only the MLE estimates are computed and the Bayesian inference based on the MCMC algorithm is skipped. penalty.sigma is the penalty on the norm of the latent distances to use in the penalized log-likelihood. The multiplier is 1/(penalty.sigma[1]^2), so that smaller values give greater penalties. The second component is the multiplier on the component effects. Values less than 1 reduce the repulsion between components. The penalty is used in the MLE of positions only and not in the MCMC log-likelihood. It can be interpreted as a surrogate for a prior distribution and expresses the belief that the latent distances are not too large on the log-odds scale.
returnMCMCstats If this is TRUE, the matrix of change statistics from the MCMC run is returned as the component sample. This matrix is an object of class mcmc and can be used directly with the coda package to assess MCMC convergence.
randseed Random number integer seed. The default is sample(10000000, size=1).
verbose If this is TRUE, more information is printed as the program runs, including (currently) some goodness of fit statistics.
... Additional arguments, to be passed to lower-level functions in the future.
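
The sampler settings and the penalty multiplier described above can be worked through with plain arithmetic. The sketch below does not call ergmm. It assumes that the total number of proposals is burnin + MCMCsamplesize * interval, which follows from the argument descriptions but is not stated explicitly, and the variable names total.proposals, sigma1 and multiplier are illustrative only.

```r
# Illustrative arithmetic only; ergmm() is not called here.
burnin <- 1000
MCMCsamplesize <- 1000
interval <- 10

# Assumption: one sample is kept every 'interval' proposals after burn-in.
total.proposals <- burnin + MCMCsamplesize * interval

# Penalty multiplier 1/(penalty.sigma[1]^2) for a few candidate values.
# Smaller values of penalty.sigma[1] give a larger multiplier.
sigma1 <- c(0.5, 1, 10)
multiplier <- 1 / sigma1^2

print(total.proposals)
print(data.frame(sigma1 = sigma1, multiplier = multiplier))
```

With the default penalty.sigma[1] of 10 the multiplier is 0.01, so the default penalty is comparatively weak.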

Value

ergmm returns an object of class ergmm that is a list. Fits including a latentcluster term will have at least the following components and fits including a latent term will have at least the components up to and including network.

coef The maximum likelihood estimate of the p-vector of coefficients for the model parameters (excluding the latent positions and cluster parameters). By default this is just the intercept, with p=1.
coef.names A p-vector of the coefficient names.
Beta The MCMCsamplesize x p matrix of coefficients for the model parameters corresponding to each of the posterior samples. By default this is the intercept only.
Z The MCMCsamplesize x k matrix of (Procrustified) posterior positions, where MCMCsamplesize is the sample size and k is the number of dimensions of the latent space.
Z.mkl The network.size(g) x k matrix of minimum Kullback-Leibler positions for each of the nodes.
Z.pmean The network.size(g) x k matrix of posterior mean positions for each of the nodes.
Z.pmode The network.size(g) x k matrix of posterior modal positions for each of the nodes.
Z.mle The network.size(g) x k matrix of MLE positions for each of the nodes.
beta.mkl The p-vector of coefficients for the model parameters based on the minimum Kullback-Leibler positions for each of the nodes.
samplesize The number of MCMC samples drawn from the posterior.
sample The MCMCsamplesize x (p+2+k) matrix of network statistics, where MCMCsamplesize is the sample size and p is the number of network covariates specified in the model via the latentcov terms (usually 0). The columns are: "mcmc.loglikelihood", the log-likelihood value; "density", the constant term in the latent model; the p covariates; and "Z 1", "Z 2", ..., "Z k", the k-dimensional position of the first node. The values are recorded for each sample drawn. This is primarily used for MCMC diagnostics to assess convergence.
iterations The number of Newton-Raphson iterations required before convergence.
interval The number of proposals between sampled statistics.
null.deviance The deviance for the null model, comparable with -2 loglikelihood. The null model will include the intercept if there is one in the model, but not the latent variables or latent clusters.
mcmc.loglikelihood The log-likelihood values corresponding to each of the posterior samples.
loglikelihood The log-likelihood for the MLE of positions (and based on the final fits to the other parameters).
mle.lik The log-likelihood for the initial MLE fit of positions.
hessian The Hessian matrix of the approximated loglikelihood function, evaluated at the maximizer. This matrix may be inverted to give an approximate covariance matrix for the MLE of the parameters.
formula The original formula entered into the ergmm function.
latent A flag to indicate that this is a fit of a latent variable model. This is always TRUE for ergmm fits and is included for consistency with the statnet package.
cluster A flag to indicate that this is a fit of a latent cluster model. This is always TRUE for ergmm fits if a latentcluster term is in the model and is included for consistency with the statnet package.
network The modeled network as a network object.
BIC A Bayesian Information Criterion approximation for the model. This is the approximation based on the fully Bayesian estimation method in Section 3.2 of Handcock, Raftery and Tantrum (2005). The formula for the approximation is given at the end of Section 4 in that paper. See the references for details.
class The vector of posterior modal classes for each node.
Ki The MCMCsamplesize x network.size(g) matrix of posterior draws of the classes, where MCMCsamplesize is the sample size and network.size(g) is the number of nodes in the network.
Ki.mle The network.size(g) vector of maximum likelihood classes for each node.
logl.lr The log-likelihood for the latent space component of the model.
logl.mbc The log-likelihood for the model-based clustering component of the model.
mu The ngroups x k x MCMCsamplesize array of posterior draws of the mean positions of the class, where MCMCsamplesize is the sample size and ngroups is the number of classes.
mu.mle The ngroups x k matrix of maximum likelihood mean positions for each class.
ngroups The number of classes or clusters.
qig The network.size(g) x ngroups matrix of posterior probabilities of class membership for each of the nodes.
Sigma The MCMCsamplesize x ngroups array of posterior draws of the variances of the positions of the class, where MCMCsamplesize is the sample size and ngroups is the number of classes.
Sigma.mle The maximum likelihood variances of the positions for each class.
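
As noted for the hessian component, inverting the Hessian gives an approximate covariance matrix for the MLE. The following is a minimal sketch on a made-up 2 x 2 matrix, not output from ergmm. It assumes the stored matrix is the Hessian of the log-likelihood (negative definite at a maximum), so the sign is flipped before inversion; if the component instead holds the Hessian of the negative log-likelihood, the minus sign should be dropped. The names H, cov.approx and se.approx are illustrative.

```r
# Hypothetical Hessian of a log-likelihood at its maximizer (made-up numbers).
H <- matrix(c(-4.0,  1.0,
               1.0, -2.0), nrow = 2, byrow = TRUE)

# Assumption: H is negative definite, so -H plays the role of the
# observed information matrix.
cov.approx <- solve(-H)

# Approximate standard errors are the square roots of the diagonal.
se.approx <- sqrt(diag(cov.approx))

print(cov.approx)
print(se.approx)
```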

Note that the function summary.ergmm returns a concise summary of the relevant parts of the ergmm object.

References

Peter D. Hoff, Adrian E. Raftery and Mark S. Handcock. Latent space approaches to social network analysis. Journal of the American Statistical Association, Dec 2002, Vol. 97, Iss. 460, pp. 1090-1098.

Mark S. Handcock, Adrian E. Raftery and Jeremy Tantrum. Model-Based Clustering for Social Networks. Working Paper Number 46, Center for Statistics and the Social Sciences, University of Washington, April 2005.

See Also

network, set.vertex.attributes, set.network.attributes, summary.ergmm

Examples

#
# See http://statnetproject.org/latentnetHRT
# for more examples
#
# For an explanation and examples of creating 'network' objects
# see the required 'network' package.
#
# Use 'data(package = "latentnetHRT")' to list the data sets in the package.
#
data(package="latentnetHRT")
#
# Using Sampson's Monk data, let's fit a
# simple latent position model
#
data(sampson)
#
# Get the group labels
#
group <- get.vertex.attribute(samplike,"group")
samp.labs <- substr(group,1,1)
#
samp.fit <- ergmm(samplike ~ latent(k=2), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Plot the fit
#
plot(samp.fit,label=samp.labs, vertex.col="group")
#
# Using Sampson's Monk data, let's fit a latent clustering model
#
## Not run: 
samp.fit <- ergmm(samplike ~ latentcluster(k=2, ngroups=3), burnin=10000,
                 MCMCsamplesize=2000, interval=30)
#
# See if we have convergence in the MCMC
mcmc.diagnostics(samp.fit)
#
# Let's look at the goodness of fit:
#
plot(samp.fit,label=samp.labs, vertex.col="group")
plot(samp.fit,pie=TRUE,label=samp.labs)
plot(samp.fit,density=c(2,2))
plot(samp.fit,contours=5,contour.color="red")
plot(samp.fit,density=TRUE,drawarrows=TRUE)
#
# Add contours
#
ergmm.add.contours(samp.fit,nlevels=8,lwd=2)
points(samp.fit$Z.mkl,pch=19,col=samp.fit$class)
#
# Try a covariate on the group
#
samegroup <- outer(group, group, "==")
diag(samegroup) <- 0
samp.fit <- ergmm(samplike ~ latentcov(samegroup) + latent(k=2))
summary(samp.fit)
## End(Not run)
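
The samegroup covariate used in the last example can be examined on its own, without fitting a model. The sketch below uses a made-up four-node group vector rather than the Sampson data, and shows that outer() marks pairs of nodes sharing a group, while setting the diagonal to 0 removes self-pairs and turns the logical matrix into a numeric one.

```r
# Made-up group labels for four nodes (not the Sampson data).
group <- c("Turks", "Turks", "Loyal", "Outcasts")

# TRUE where two nodes share a group label.
samegroup <- outer(group, group, "==")

# Drop self-pairs; assigning 0 coerces the logical matrix to numeric.
diag(samegroup) <- 0

print(samegroup)
```

Only nodes 1 and 2 share a group here, so the only nonzero entries are the two off-diagonal cells for that pair.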

[Package latentnetHRT version 0.7-18 Index]