mmglm {HiddenMarkov}                                        R Documentation

Markov Modulated GLM Object

Description

Creates a Markov modulated generalised linear model object with class "mmglm".

Usage

mmglm(x, Pi, delta, family, link, beta, glmformula = formula(y~x1),
      sigma = NA, nonstat = TRUE)

Arguments

x a dataframe containing the observed variable (i.e. the response variable in the generalised linear model) and the covariate. Currently, the response variable must be named y and the covariate x1. Alternatively, x could be specified as NULL, meaning that the data will be added later (e.g. simulated). See Details below for the binomial case.
Pi is the m times m transition probability matrix of the hidden Markov chain.
delta is the marginal probability distribution of the m hidden states at the first time point.
family character string, the GLM family, one of "gaussian", "poisson", "Gamma" or "binomial".
link character string, the link function. If family == "binomial", then one of "logit", "probit" or "cloglog"; else one of "identity", "inverse" or "log".
beta a 2 times m matrix containing parameter estimates. The first row contains the m constants in the linear predictor for each Markov state, and the second row contains the linear regression coefficient in the linear predictor for each Markov state.
glmformula the model formula; currently y~x1 is the only formula permitted.
sigma if family == "gaussian", the variance; if family == "Gamma", 1/sqrt(shape); one value for each Markov state.
nonstat logical; if TRUE (default), the homogeneous Markov chain is assumed to be non-stationary.

Details

This model assumes that the observed responses are ordered in time, together with a covariate at each point. The model is based on a simple regression model within the glm framework (see McCullagh & Nelder, 1989), but where the coefficients β_0 and β_1 in the linear predictor vary according to a hidden Markov state. The responses are assumed to be conditionally independent given the value of the Markov chain.
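
As an illustration of this structure, the following minimal sketch (not the package's own simulate method) generates data by hand from a two-state Markov modulated Poisson regression with a log link; all numerical values are arbitrary.

    set.seed(1)
    n <- 200
    Pi <- matrix(c(0.8, 0.2,
                   0.3, 0.7), byrow=TRUE, nrow=2)
    beta <- rbind(c(0.1, -0.1), c(1, 5))   # columns correspond to Markov states
    state <- numeric(n)
    state[1] <- 2
    for (t in 2:n) state[t] <- sample(1:2, 1, prob=Pi[state[t-1], ])
    x1 <- runif(n)
    eta <- beta[1, state] + beta[2, state]*x1   # linear predictor varies with the state
    y <- rpois(n, lambda=exp(eta))              # responses conditionally independent given the chain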

If family == "binomial" then the response variable y is interpreted as the number of successes. The dataframe x must also contain a variable called size giving the number of Bernoulli trials. This differs from the format used by the function glm, where y would be a two-column matrix containing the numbers of successes and failures, respectively. The format used here allows the number of Bernoulli trials to be specified on its own, so that the number of successes (or failures) can be simulated later.
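
A small sketch of this layout, using made-up numbers (the variable names y, x1 and size are those required by mmglm):

    size <- rpois(10, 20) + 1                    # numbers of Bernoulli trials
    x1 <- runif(10)
    y <- rbinom(10, size=size, prob=plogis(1 + 2*x1))
    xdf <- data.frame(y=y, x1=x1, size=size)     # format expected by mmglm
    # glm, in contrast, takes a two-column response of successes and failures:
    fit <- glm(cbind(y, size - y) ~ x1, family=binomial, data=xdf)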

When the density function of the response variable is from the exponential family (Charnes et al, 1976, Eq. 2.1), the likelihood function (Charnes et al, 1976, Eq. 2.4) can be maximised using iteratively reweighted least squares (Charnes et al, 1976, Eq. 1.1 and 1.2), the method used by the R function glm. In this Markov modulated version of the model, it is the third term of the complete data log-likelihood, as given in Harte (2006, Sec. 2.3), that needs to be maximised. This term is simply the sum of the individual log-likelihood contributions of the response variable, each weighted by the corresponding Markov state probability calculated in the E-step. It can therefore also be maximised by iteratively reweighted least squares, by passing these additional weights (the Markov state probabilities) into the glm function.
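
A conceptual sketch of that weighted maximisation (not the code used internally by BaumWelch), taking the Poisson/log case for concreteness: for each Markov state j, the E-step probabilities of being in state j are supplied to glm as prior weights. Here u is assumed to be an n times m matrix of such probabilities and dat a dataframe with the y and x1 columns described above.

    fit.state <- function(dat, uj)
        glm(y ~ x1, family=poisson(link="log"), data=dat, weights=uj)
    # coef(fit.state(dat, u[, 1])) would give the updated column of beta for state 1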

Value

A list object with class "mmglm", containing the above arguments as named components.

Examples

delta <- c(0,1)

Pi <- matrix(c(0.8, 0.2,
               0.3, 0.7),
             byrow=TRUE, nrow=2)

#--------------------------------------------------------
#     Poisson with log link function

x <- mmglm(NULL, Pi, delta, family="poisson", link="log",
           beta=rbind(c(0.1, -0.1), c(1, 5)))

x <- simulate(x, nsim=5000, seed=10)

y <- BaumWelch(x)

hist(residuals(y))
print(summary(y))
print(logLik(y))

#--------------------------------------------------------
#     Binomial with logit link function

x <- mmglm(NULL, Pi, delta, family="binomial", link="logit",
           beta=rbind(c(0.1, -0.1), c(1, 5)))

x <- simulate(x, nsim=5000, seed=10)

y <- BaumWelch(x)

hist(residuals(y))
print(summary(y))
print(logLik(y))

#--------------------------------------------------------
#     Gaussian with identity link function

x <- mmglm(NULL, Pi, delta, family="gaussian", link="identity",
           beta=rbind(c(0.1, -0.1), c(1, 5)), sigma=c(1, 2))

x <- simulate(x, nsim=5000, seed=10)

y <- BaumWelch(x)

hist(residuals(y))
print(summary(y))
print(logLik(y))

#--------------------------------------------------------
#     Gamma with log link function

x <- mmglm(NULL, Pi, delta, family="Gamma", link="log",
           beta=rbind(c(2, 1), c(-2, 1.5)), sigma=c(0.2, 0.1))

x1 <- seq(0.01, 0.99, 0.01)
plot(x1, exp(x$beta[1,2] + x$beta[2,2]*x1), type="l",
     xlim=c(0, 1), ylim=c(0, 10), col="red", lwd=3)
points(x1, exp(x$beta[1,1] + x$beta[2,1]*x1), type="l",
       col="blue", lwd=3)

x <- simulate(x, nsim=1000, seed=10)

points(x$x$x1, x$x$y)

x$beta[2,] <- c(-3, 4)
y <- BaumWelch(x, bwcontrol(posdiff=FALSE))

hist(residuals(y))
print(summary(y))
print(logLik(y))

[Package HiddenMarkov version 1.2-7]