mcsimex {simex}    R Documentation

The Misclassification SIMEX

Description

Implementation of the Misclassification SIMEX Algorithm as described by Küchenhoff, Mwalili and Lesaffre.

Usage

mcsimex(model
        , SIMEXvariable
        , mc.matrix
        , lambda = c(0.5,1,1.5,2)
        , B = 100
        , jackknife.estimation = "quad"
        , asymptotic = TRUE
        , fitting.method = "quad")

Arguments

model The naive model; the misclassified variable must be a factor.
mc.matrix If one variable is misclassified, this can be a single matrix. If more than one variable is misclassified, it must be a list of misclassification matrices whose names match the SIMEXvariable names; column and row names must match the factor levels. If a special misclassification is desired, the name of a function can be specified (see Details).
lambda vector of exponents for the misclassification matrix (without 0)
SIMEXvariable vector of names of the variables for which the MCSIMEX-method should be applied
B number of iterations for each lambda
fitting.method linear, quadratic and loglinear are implemented (first 4 letters are enough)
jackknife.estimation the extrapolation method used for jackknife variance estimation. Can be set to FALSE if jackknife variance estimation should not be performed.
asymptotic logical, indicating whether asymptotic variance estimation should be done; the option x = TRUE must be enabled in the naive model.
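
To illustrate how lambda and fitting.method work together, here is a minimal base R sketch of a quadratic extrapolation step. The estimates are made-up numbers, not output of mcsimex(), and the sketch assumes that the naive estimate serves as the lambda = 0 point; the package's internal code may differ.

```r
# Made-up coefficient estimates, one per lambda.
# lambda = 0 stands for the naive estimate (an assumption of this sketch).
lambda   <- c(0, 0.5, 1, 1.5, 2)
estimate <- c(1, 0.8125, 0.65, 0.5125, 0.4)

# Quadratic extrapolant in lambda
quad.fit <- lm(estimate ~ lambda + I(lambda^2))

# Extrapolate back to lambda = -1, the hypothetical "no misclassification" case
corrected <- unname(predict(quad.fit, newdata = data.frame(lambda = -1)))
corrected
```

The made-up values lie exactly on the curve 1 - 0.4 * lambda + 0.05 * lambda^2, so the extrapolated value can be checked by hand.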

Details

If mc.matrix is a function, its first argument must be the whole dataset used in the naive model and its second argument must be the exponent (lambda) for the misclassification. The function must return a data.frame containing the misclassified SIMEXvariable. An example can be found below.
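
The exponent acts as a matrix power on the misclassification matrix. The following base R sketch shows one way to compute such a power through an eigendecomposition. The helper mat_power() is hypothetical and not part of the simex package, and it assumes the matrix is diagonalizable with positive real eigenvalues.

```r
# Hypothetical helper (not part of simex): raise a square matrix to a real power
# via its eigendecomposition. Assumes positive real eigenvalues.
mat_power <- function(m, k) {
  e <- eigen(m)
  p <- e$vectors %*% diag(e$values^k, nrow = length(e$values)) %*% solve(e$vectors)
  dimnames(p) <- dimnames(m)
  Re(p)
}

# Same misclassification matrix as in the Examples section; columns sum to 1
Pi <- matrix(c(0.9, 0.1, 0.3, 0.7), nrow = 2,
             dimnames = list(c("0", "1"), c("0", "1")))

Pi_half <- mat_power(Pi, 0.5)  # "half" of the misclassification
Pi_two  <- mat_power(Pi, 2)    # misclassification applied twice

colSums(Pi_half)               # columns still sum to 1
all.equal(Pi_half %*% Pi_half, Pi, check.attributes = FALSE)
```

Applying Pi_half twice reproduces Pi, which is the sense in which a fractional lambda adds a fractional amount of misclassification.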

Asymptotic variance estimation is only implemented for lm and glm.

The loglinear fit has the form g(lambda, GAMMA) = exp(gamma0 + gamma1 * lambda). It is realized via the log() function. To avoid negative values, the minimum of the data plus 1 is added before fitting and subtracted again after prediction: exp(predict(...)) - min(data) - 1
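
The shift-and-log idea can be sketched in base R as follows. The estimates are made-up numbers, and the sketch uses the absolute value of the minimum plus 1 as the shift so that the shifted data are always positive; that detail is an assumption of this sketch and may not match the package code exactly.

```r
# Made-up coefficient estimates, one per lambda (some are negative on purpose)
lambda   <- c(0.5, 1, 1.5, 2)
estimate <- c(-0.80, -0.55, -0.38, -0.26)

# Shift the data so that every value is positive before taking logs.
# abs(min) + 1 is this sketch's choice of shift (an assumption).
shift <- abs(min(estimate)) + 1

# Linear model on the log scale corresponds to exp(gamma0 + gamma1 * lambda)
logl.fit <- lm(log(estimate + shift) ~ lambda)

# Predict at lambda = -1, back-transform, and undo the shift
corrected <- unname(exp(predict(logl.fit, newdata = data.frame(lambda = -1))) - shift)
corrected
```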

The 'log2' fit is fitted via the nls() function, using the fit of the loglinear extrapolant as starting values. For details see fit.logl.

Value

An object of class MCSIMEX, containing:

coefficients corrected coefficients of the MCSIMEX-model
SIMEX.estimates the MCSIMEX-estimates of the coefficients for each lambda
lambda the values of lambda
model naive model
mc.matrix the misclassification matrix
B the number of iterations
extrapolation model-object of the extrapolation step
fitting.method the fitting method used in the extrapolation step
SIMEXvariable names of the SIMEX variables
call the function call
variance.asymptotic the asymptotic variance estimates
variance.jackknife the jackknife variance estimates
extrapolation.variance the model-object of the variance extrapolation
variance.jackknife.lambda data set for the extrapolation
theta all estimated coefficients for each lambda and B

...

Author(s)

Wolfgang Lederer, wolfgang.lederer@googlemail.com

References

Küchenhoff, H., Mwalili, S. M. and Lesaffre, E. (2005) A general method for dealing with misclassification in regression: the Misclassification SIMEX. Biometrics, in press

See Also

misclass, simex, refit

Examples

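library(simex)

## simulate a binary response y from a logistic model with covariates x and z
x <- rnorm(200, 0, 1.142)
z <- rnorm(200, 0, 2)
y <- factor(rbinom(200, 1, 1 / (1 + exp(-1 * (-2 + 1.5 * x - 0.5 * z)))))

## misclassification matrix; row and column names must match the levels of y
Pi <- matrix(data = c(0.9, 0.1, 0.3, 0.7), nrow = 2, byrow = FALSE)
dimnames(Pi) <- list(levels(y), levels(y))

## ystar is the misclassified version of y
ystar <- misclass(data.frame(y), list(y = Pi), k = 1)[, 1]

## naive model on the misclassified response; x = TRUE is needed for the asymptotic variance
naive.model <- glm(ystar ~ x + z, family = binomial, x = TRUE, y = TRUE)
true.model  <- glm(y ~ x + z, family = binomial)
simex.model <- mcsimex(naive.model, mc.matrix = Pi, SIMEXvariable = "ystar")

## boxplots of the simulated estimates for each lambda, followed by the extrapolation plots
op <- par(mfrow = c(2, 3))
invisible(lapply(simex.model$theta, boxplot, notch = TRUE, outline = FALSE, names = c(0.5, 1, 1.5, 2)))
plot(simex.model)
par(op)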

## example for a function which can be supplied to the function mcsimex()
## "xm" is the variable which is to be misclassified

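my.mc <- function(datas, k) {
        xm <- datas$xm
        ## misclassification matrix used where y == "1"
        p1 <- matrix(data = c(0.75, 0.25, 0.25, 0.75), nrow = 2, byrow = FALSE)
        colnames(p1) <- levels(xm)
        rownames(p1) <- levels(xm)
        ## misclassification matrix used where y == "0"
        p0 <- matrix(data = c(0.8, 0.2, 0.2, 0.8), nrow = 2, byrow = FALSE)
        colnames(p0) <- levels(xm)
        rownames(p0) <- levels(xm)
        ## misclassify xm separately within each level of y, using exponent k
        xm[datas$y == "1"] <- misclass(data.frame(xm = xm[datas$y == "1"]), list(xm = p1), k = k)[, 1]
        xm[datas$y == "0"] <- misclass(data.frame(xm = xm[datas$y == "0"]), list(xm = p0), k = k)[, 1]
        xm <- factor(xm)
        ## return a data.frame holding the misclassified SIMEXvariable
        return(data.frame(xm))
}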


[Package simex version 1.2 Index]