msm {msm}R Documentation

Multi-state Markov and hidden Markov models in continuous time

Description

Fit a continuous-time Markov or hidden Markov multi-state model by maximum likelihood. Observations of the process can be made at arbitrary times, or the exact times of transition between states can be known. Covariates can be fitted to the Markov chain transition intensities or to the hidden Markov observation process.

Usage

msm ( formula, subject=NULL, data = list(), qmatrix, gen.inits = FALSE,
      ematrix=NULL, hmodel=NULL, obstype=NULL,
      covariates = NULL, covinits = NULL, constraint = NULL,
      misccovariates = NULL, misccovinits = NULL, miscconstraint = NULL,
      hcovariates = NULL, hcovinits = NULL, hconstraint = NULL,
      qconstraint=NULL, econstraint=NULL, initprobs = NULL, est.initprobs=FALSE, 
      death = FALSE, exacttimes = FALSE, censor=NULL,
      censor.states=NULL, cl = 0.95, fixedpars = NULL, center=TRUE,
      opt.method=c("optim","nlm"), hessian=TRUE, use.deriv=FALSE,
      deriv.test=FALSE, analyticp=TRUE, ... )

Arguments

formula A formula giving the vectors containing the observed states and the corresponding observation times. For example,
state ~ time
Observed states should be in the set 1, ..., n, where n is the number of states.
subject Vector of subject identification numbers for the data specified by formula. If missing, then all observations are assumed to be on the same subject. These must be sorted so that all observations on the same subject are adjacent.
data Optional data frame in which to interpret the variables supplied in formula, subject, covariates, misccovariates, hcovariates and obstype.
qmatrix Initial transition intensity matrix of the Markov chain. If an instantaneous transition is not allowed from state r to state s, then qmatrix should have (r,s) entry 0, otherwise it should be non-zero. Any diagonal entry of qmatrix is ignored, as it is constrained to be equal to minus the sum of the rest of the row. For example,

rbind( c( 0, 0.1, 0.01 ), c( 0.1, 0, 0.2 ), c( 0, 0, 0 ) )

represents a 'health - disease - death' model, with transition intensities 0.1 from health to disease, 0.01 from health to death, 0.1 from disease to health, and 0.2 from disease to death. The initial intensities given here are with any covariates set to their means in the data (or set to zero, if center = FALSE).
gen.inits If TRUE, then initial values for the transition intensities are estimated by assuming that the data represent the exact transition times of the process. The non-zero entries of the supplied qmatrix are assumed to indicate the allowed transitions of the model.
ematrix If misclassification between states is to be modelled, this should be a matrix of initial values for the misclassification probabilities. The rows represent underlying states, and the columns represent observed states. If an observation of state s is not possible when the subject occupies underlying state r, then ematrix should have (r,s) entry 0. Otherwise ematrix should have (r,s) entry corresponding to the probability of observing s conditionally on occupying true state r. The diagonal of ematrix is ignored, as rows are constrained to sum to 1. For example,

rbind( c( 0, 0.1, 0 ), c( 0.1, 0, 0.1 ), c( 0, 0.1, 0 ) )

represents a model in which misclassifications are only permitted between adjacent states.
For an alternative way of specifying misclassification models, see hmodel.
hmodel Specification of the hidden Markov model. This should be a list of return values from the constructor functions described in the hmm-dists help page. Each element of the list corresponds to the outcome model conditionally on the corresponding underlying state.
For example, consider a three-state hidden Markov model. Suppose the observations in underlying state 1 are generated from a Normal distribution with mean 100 and standard deviation 16, while observations in underlying state 2 are Normal with mean 54 and standard deviation 18. Observations in state 3, representing death, are exactly observed, and coded as 999 in the data. This model is specified as
hmodel = list(hmmNorm(mean=100, sd=16), hmmNorm(mean=54, sd=18), hmmIdent(999))
The mean and standard deviation parameters are estimated starting from these initial values. See the hmm-dists help page for details of the constructor functions for each available distribution.
A misclassification model, that is, a hidden Markov model where the outcomes are misclassified observations of the underlying states, can either be specified using a list of hmmCat objects, or by using an ematrix as in previous versions of msm.
For example,

ematrix = rbind( c( 0, 0.1, 0, 0 ), c( 0.1, 0, 0.1, 0 ), c( 0, 0.1, 0, 0), c( 0, 0, 0, 0) )

is equivalent to

hmodel = list( hmmCat(prob=c(0.9, 0.1, 0, 0)), hmmCat(prob=c(0.1, 0.8, 0.1, 0)), hmmCat(prob=c(0, 0.1, 0.9, 0)), hmmIdent())

obstype A vector specifying the observation scheme for each row of the data. This can be included in the data frame data along with the state, time, subject IDs and covariates. Its elements should be either 1, 2 or 3, meaning as follows:
describe {
1
An observation of the process at an arbitrary time (a "snapshot" of the process)
2
An exact transition time, with the state at the previous observation retained until the current observation.
3
An exact transition time, but the state at the instant before entering this state is unknown. A common example is death times in studies of chronic diseases.
} If obstype is not specified, this defaults to all 1. If obstype is a single number, all observations are assumed to be of this type.
This is a generalisation of the death and exacttimes arguments to allow different schemes per observation.
exacttimes=TRUE specifies that all observations are of obstype 2.
death = death.states specifies that all observations of death.states are of type 3. death = TRUE specifies that all observations in the final absorbing state are of type 3.
covariates Formula representing the covariates on the transition intensities via a log-linear model. For example,
~ age + sex + treatment
covinits Initial values for log-linear effects of covariates on the transition intensities. This should be a named list with each element corresponding to a covariate. A single element contains the initial values for that covariate on each transition intensity, reading across the rows in order. For a pair of effects constrained to be equal, the initial value for the first of the two effects is used.
For example, for a model with the above qmatrix and age and sex covariates, the following initialises all covariate effects to zero apart from the age effect on the 2-1 transition, and the sex effect on the 1-3 transition. covinits = list(sex=c(0, 0, 0.1, 0), age=c(0, 0.1, 0, 0))
For factor covariates, name each level by concatenating the name of the covariate with the level name, quoting if necessary. For example, for a covariate agegroup with three levels 0-15, 15-60, 60-, use something like
covinits = list("agegroup15-60"=c(0, 0.1, 0, 0), "agegroup60-"=c(0.1, 0.1, 0, 0))
If not specified or wrongly specified, initial values are assumed to be zero.
constraint A list of one vector for each named covariate. The vector indicates which covariate effects on intensities are constrained to be equal. Take, for example, a model with five transition intensities and two covariates. Specifying

constraint = list (age = c(1,1,1,2,2), treatment = c(1,2,3,4,5))

constrains the effect of age to be equal for the first three intensities, and equal for the fourth and fifth. The effect of treatment is assumed to be different for each intensity. Any vector of increasing numbers can be used as indicators. The intensity parameters are assumed to be ordered by reading across the rows of the transition matrix, starting at the first row, ignoring the diagonals.
For categorical covariates, defined using factor(covname), specify constraints as follows:

list(..., covnameVALUE1 = c(...), covnameVALUE2 = c(...), ...)

where VALUE1, VALUE2, ... are the levels of the factor. Make sure the contrasts option is set appropriately, for example, the default
options(contrasts=c(contr.treatment, contr.poly))
sets the first (baseline) level of unordered factors to zero.
To assume no covariate effect on a certain transition, set its initial value to zero and use the fixedpars argument to fix it during the optimisation.
misccovariates A formula representing the covariates on the misclassification probabilities, analogously to covariates. Only used if the model is specified using ematrix, rather than hmodel.
misccovinits Initial values for the covariates on the misclassification probabilities, defined in the same way as covinits. Only used if the model is specified using ematrix.
miscconstraint A list of one vector for each named covariate on misclassification probabilities. The vector indicates which covariate effects on misclassification probabilities are constrained to be equal, analogously to constraint. Only used if the model is specified using ematrix.
hcovariates List of formulae the same length as hmodel, defining any covariates governing the hidden Markov outcome models. The covariates operate on a suitably link-transformed linear scale, for example, log scale for a Poisson outcome model. If there are no covariates for a certain hidden state, then insert a NULL in the corresponding place in the list. For example, hcovariates = list(~acute + age, ~acute, NULL).
hcovinits Initial values for the hidden Markov model covariates. A list of the same length as hcovariates. Each element is a vector with initial values for each covariate on that state. For example, the above hcovariates can be initialised with hcovariates = list(c(-8, 0), -8, NULL). Initial values must be given for all or no covariates, if none are given these are all set to zero. The initial value given in the hmodel constructor function for the corresponding baseline parameter is interpreted as the value of that parameter with any covariates fixed to their means in the data.
hconstraint A named list. Each element is a vector of constraints on the named hidden Markov model parameter. The vector has length equal to the number of times that class of parameter appears in the whole model.
For example consider the three-state hidden Markov model described above, with normally-distributed outcomes for states 1 and 2. To constrain the outcome variance to be equal for states 1 and 2, and to also constrain the effect of acute on the outcome mean to be equal for states 1 and 2, specify
hconstraint = list(sd = c(1,1), acute=c(1,1))
qconstraint A vector of indicators specifying which baseline transition intensities are equal. For example,
qconstraint = c(1,2,3,3)
constrains the third and fourth intensities to be equal, in a model with four allowed instantaneous transitions.
econstraint A similar vector of indicators specifying which baseline misclassification probabilities are constrained to be equal. Only used if the model is specified using ematrix, rather than hmodel.
initprobs Currently only used in hidden Markov models. Vector of assumed underlying state occupancy probabilities at each individual's first observation. If these are estimated (see est.initprobs), then this defaults to equal probability for each state. Otherwise this defaults to c(1, rep(0, nstates-1)), that is, in state 1 with a probability of 1. Scaled to sum to 1 if necessary.
est.initprobs If TRUE, then the underlying state occupancy probabilities at the first observation will be estimated, starting from initial values taken from the initprobs argument. Be warned that if any of these initial values are 0 or 1, then optim will give an "non-finite value" error. To fix any of these probabilities during the estimation, e.g. at values of 0 or 1, then use an appropriate fixedpars argument. Note that the free parameters during this estimation excludes the state 1 occupancy probability, which is fixed at 1 minus the sum of the other probabilities.
death Vector of indices of the death states. A death state is an absorbing state whose time of entry is known exactly, but the individual is assumed to be in an unknown transient state ("alive") at the previous instant. This is the usual situation for times of death in chronic disease monitoring data. For example, if you specify death = c(4, 5) then states 4 and 5 are assumed to be death states.
death = TRUE indicates that the final state is a death state, and death = FALSE (the default) indicates that there is no death state. See the obstype argument.
censor A state, or vector of states, which indicates censoring. Censoring means that the observed state is known only to be one of a particular set of states. For example, censor=999 indicates that all observations of 999 in the vector of observed states denote censoring times. By default, this means that the true state could have been anything other than an absorbing state. To specify corresponding true states explicitly, use a censor.states argument.
censor.states Specifies the underlying states which censored observations can represent. If censor is a single number (the default) this can be a vector, or a list with one element. If censor is a vector with more than one element, this should be a list, with each element a vector corresponding to the equivalent element of censor. For example
censor = c(99, 999), censor.states = list(c(2,3), c(3,4))
means that observations coded 99 represent either state 2 or state 3, while observations coded 999 are really either state 3 or state 4.
exacttimes By default, the transitions of the Markov process are assumed to take place at unknown occasions in between the observation times. If exacttimes is set to TRUE, then all observation times are assumed to represent the exact and complete times of transition of the process. This is equivalent to every row of the data having obstype = 2. See the obstype argument.
cl Width of symmetric confidence intervals for maximum likelihood estimates, by default 0.95.
fixedpars Vector of indices of parameters whose values will be fixed at their initial values during the optimisation. These are given in the order: transition intensities (reading across rows of the transition matrix), covariates on intensities (ordered by intensities within covariates), hidden Markov model parameters (ordered by parameters within states), hidden Markov model covariate parameters (ordered by covariates within parameters within states), initial state occupancy probabilities (excluding the first probability, which is fixed at one minus the sum of the others).
For covariates on misclassification probabilities, this is a change from version 0.4 in the parameter ordering. Previously these were ordered by misclassification probabilities within covariates.
This can be useful for profiling likelihoods, and building complex models stage by stage. To fix all parameters, specify fixedpars = TRUE.
center If TRUE (the default) then covariates are centered at their means during the maximum likelihood estimation. This usually improves convergence.
opt.method Quoted name of the R function to perform minimisation of the minus twice log likelihood. Either "optim" or "nlm". optim is the default.
hessian If TRUE (the default) then the Hessian matrix is computed at the maximum likelihood estimates, to obtain standard errors and confidence intervals.
use.deriv If TRUE then analytic first derivatives are used in the optimisation of the likelihood, when an appropriate quasi-Newton optimisation method, such as BFGS, is being used. Note that the default for optim is a Nelder-Mead method which cannot use derivatives. However, these derivatives, if supplied, are always used to calculate the Hessian.
deriv.test If TRUE, then analytic and numeric derivatives are computed and compared at the initial values, and no optimisation is performed.
analyticp By default, the likelihood for certain simpler 3, 4 and 5 state models is calculated using an analytic expression for the transition probability (P) matrix. To revert to the original method of using the matrix exponential, specify analyticp=FALSE. See the PDF manual for a list of the models for which analytic P matrices are implemented.
... Optional arguments to the general-purpose R optimisation routines, optim or nlm. Useful options for optim include method="BFGS" for using a quasi-Newton optimisation algorithm, which can often be faster than the default Nelder-Mead. If the optimisation fails to converge, consider normalising the problem using, for example, control=list(fnscale = 2500), for example, replacing 2500 by a number of the order of magnitude of the likelihood. If 'false' convergence is reported and the standard errors cannot be calculated due to a non-positive-definite Hessian, then consider tightening the tolerance criteria for convergence. If the optimisation takes a long time, intermediate steps can be printed using the trace argument of the control list. See optim for details.

Details

For full details about the methodology behind the msm package, refer to the PDF manual ‘msm-manual.pdf’ in the ‘doc’ subdirectory of the package. This includes a tutorial in the typical use of msm.

Users upgrading from versions of msm less than 0.5 will need to change some of their model fitting syntax. In particular, initial values are now specified in the qmatrix and covinits arguments instead of inits, and qmatrix is no longer a matrix of 0/1 indicators. See the appendix to the PDF manual or the NEWS file in the top-level installation directory for a full list of changes.

For simple multi-state Markov models, the likelihood is calculated in terms of the transition intensity matrix Q. When the data consist of observations of the Markov process at arbitrary times, the exact transition times are not known. Then the likelihood is calculated using the transition probability matrix P(t) = exp(tQ), where exp is the matrix exponential. If state i is observed at time t and state j is observed at time u, then the contribution to the likelihood from this pair of observations is the i,j element of P(u - t). See, for example, Kalbfleisch and Lawless (1985), Kay (1986), or Gentleman et al. (1994).

For hidden Markov models, the likelihood for an individual with k observations is calculated directly by summing over the unknown state at each time, producing a product of k matrices. The calculation is a generalisation of the method described by Satten and Longini (1996), and also by Jackson and Sharples (2002), and Jackson et al. (2003).

There must be enough information in the data on each state to estimate each transition rate, otherwise the likelihood will be flat and the maximum will not be found. It may be appropriate to reduce the number of states in the model, or reduce the number of covariate effects, to ensure convergence. Hidden Markov models are particularly susceptible to non-identifiability, especially when combined with a complex transition matrix.

Choosing an appropriate set of initial values for the optimisation can also be important. For flat likelihoods, 'informative' initial values will often be required.

Value

A list of class msm, with components:

call The original call to msm.
Qmatrices A list of matrices. The first component, labelled logbaseline, is a matrix containing the estimated transition intensities on the log scale with any covariates fixed at their means in the data. Each remaining component is a matrix giving the linear effects of the labelled covariate on the matrix of log intensities. To extract an estimated intensity matrix on the natural scale, at an arbitrary combination of covariate values, use the function qmatrix.msm.
QmatricesSE The standard error matrices corresponding to Qmatrices.
QmatricesL,QmatricesU Corresponding lower and upper symmetric confidence limits, of width 0.95 unless specified otherwise by the cl argument.
Ematrices A list of matrices. The first component, labelled logitbaseline, is the estimated misclassification probability matrix with any covariates fixed at their means in the data. Each remaining component is a matrix giving the linear effects of the labelled covariate on the matrix of logit misclassification probabilities. To extract an estimated misclassification probability matrix on the natural scale, at an arbitrary combination of covariate values, use the function ematrix.msm.
EmatricesSE The standard error matrices corresponding to Ematrices.
EmatricesL,EmatricesU Corresponding lower and upper symmetric confidence limits, of width 0.95 unless specified otherwise by the cl argument.
sojourn A list with components:
mean = estimated mean sojourn times in the transient states, with covariates fixed at their means.
se = corresponding standard errors.
minus2loglik Minus twice the maximised log-likelihood.
estimates Vector of untransformed maximum likelihood estimates returned from optim. Transition intensities are on the log scale and misclassification probabilities are the logit scale.
estimates.t Vector of transformed maximum likelihood estimates with intensities and probabilities on their natural scales.
fixedpars Indices of estimates which were fixed during the maximum likelihood estimation.
covmat Covariance matrix corresponding to estimates.
ci Matrix of confidence intervals corresponding to estimates
opt Return value from optim or nlm, giving information about the results of the optimisation.
foundse Logical value indicating whether the Hessian was positive-definite at the supposed maximum of the likelihood. If not, the covariance matrix of the parameters is unavailable. In these cases the optimisation has probably not converged to a maximum.
data A list of constants and vectors giving the data, for use in post-processing.
qmodel A list of objects specifying the model for transition intensities, for use in post-processing.
emodel A list of objects specifying the model for misclassification.
qcmodel A list of objects specifying the model for covariates on the transition intensities.
ecmodel A list of objects specifying the model for covariates on misclassification probabilities.
hmodel A list of class "hmodel", containing objects specifying the hidden Markov model. Estimates of "baseline" location parameters are presented with any covariates fixed to their means in the data.
cmodel A list of objects specifying any model for censoring.


Printing a msm object by typing the object's name at the command line implicitly invokes print.msm. This formats and prints the important information in the model fit. This includes the fitted transition intensity matrix, matrices containing covariate effects on intensities, and mean sojourn times from a fitted msm model. When there is a hidden Markov model, the chief information in the hmodel component is also formatted and printed. This includes estimates and confidence intervals for each parameter.
To extract summary information from the fitted model, it is recommended to use the more flexible extractor functions, such as qmatrix.msm, pmatrix.msm, sojourn.msm, instead of directly reading from list components of msm objects.

Author(s)

C. H. Jackson chris.jackson@imperial.ac.uk

References

Kalbfleisch, J., Lawless, J.F., The analysis of panel data under a Markov assumption Journal of the Americal Statistical Association (1985) 80(392): 863–871.

Kay, R. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics (1986) 42: 855–865.

Gentleman, R.C., Lawless, J.F., Lindsey, J.C. and Yan, P. Multi-state Markov models for analysing incomplete disease history data with illustrations for HIV disease. Statistics in Medicine (1994) 13(3): 805–821.

Satten, G.A. and Longini, I.M. Markov chains with measurement error: estimating the 'true' course of a marker of the progression of human immunodeficiency virus disease (with discussion) Applied Statistics 45(3): 275-309 (1996)

Jackson, C.H. and Sharples, L.D. Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients Statistics in Medicine, 21(1): 113–128 (2002).

Jackson, C.H., Sharples, L.D., Thompson, S.G. and Duffy, S.W. and Couto, E. Multi-state Markov models for disease progression with classification error. The Statistician, 52(2): 193–209 (2003)

See Also

simmulti.msm, plot.msm, summary.msm, qmatrix.msm, pmatrix.msm, sojourn.msm.

Examples

### Heart transplant data
### For further details and background to this example, see
### the PDF manual in the doc directory. 
data(heart)
print(heart[1:10,])
twoway4.q <- rbind(c(-0.5, 0.25, 0, 0.25), c(0.166, -0.498, 0.166, 0.166),
c(0, 0.25, -0.5, 0.25), c(0, 0, 0, 0))
statetable.msm(state, PTNUM, data=heart)
crudeinits.msm(state ~ years, PTNUM, data=heart, qmatrix=twoway4.q)
heart.msm <- msm( state ~ years, subject=PTNUM, data = heart, 
                 qmatrix = twoway4.q, death = 4,
                 control = list ( trace = 2, REPORT = 1 )  )
heart.msm
qmatrix.msm(heart.msm)
pmatrix.msm(heart.msm, t=10)
sojourn.msm(heart.msm)

[Package msm version 0.7.1 Index]