mi {mi}    R Documentation

Multiple Iterative Regression Imputation

Description

Produces multiply imputed data by applying the elementary imputation functions iteratively to the variables with missing values: each variable is randomly imputed in turn, and the procedure loops through the variables until approximate convergence.

Usage

## S4 method for signature 'data.frame':
mi( object, info, n.imp = 3, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    preprocess = TRUE, continue.on.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE, 
    add.priors = prior.control(), post.run = TRUE)
    
## S4 method for signature 'mi':
mi( object, info, n.imp = 3, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    preprocess = TRUE, continue.on.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE)

Arguments

object A data frame containing the incomplete data, with missing values coded as NA, or an object of class mi from a previous run.
info An mi.info object; see mi.info.
n.imp Number of multiple imputations (m). The default is 3.
n.iter Maximum number of iterations to run while seeking convergence. The default is 30.
R.hat Threshold on the R-hat statistic used to check convergence. The default is 1.1.
max.minutes Maximum running time in minutes; iteration stops once it is exceeded. The default is 20.
seed Random seed. The default is NA.
rand.imp.method Method for random imputation; see random.imp. The default is "bootstrap".
preprocess If TRUE (the default), preprocess the data according to the variable types; see mi.preprocess.
continue.on.convergence If TRUE, mi keeps iterating after convergence until n.iter iterations have been run or max.minutes minutes have passed. The default is FALSE.
check.coef.convergence If TRUE, also check convergence of the coefficients of the imputation models (reported as coef.conv). The default is FALSE.
add.priors A list of parameters controlling how priors are added to the mi process; see prior.control for details.
post.run If TRUE (the default), run 20 additional iterations after mi finishes, but only if priors were added to the mi process. This mitigates the influence of the priors on the overall procedure.
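The R.hat threshold refers to the potential scale reduction statistic of Gelman and Rubin, which compares between-chain and within-chain variation. The sketch below computes the classic version of that statistic for one scalar summary tracked over several chains. It assumes, as the bugs array of per-iteration means and standard deviations suggests, that each of the n.imp imputations acts as one chain, and that mi treats values below R.hat as converged; the helper rhat_sketch is hypothetical and is not part of mi, whose exact computation may differ.

```r
# Hypothetical helper (not part of mi): classic Gelman-Rubin R-hat for one
# scalar summary, e.g. the mean of a variable's imputed values per iteration.
rhat_sketch <- function(x) {
  # x: numeric matrix, rows = iterations, columns = chains (imputations)
  n <- nrow(x)
  chain_means <- colMeans(x)
  B <- n * var(chain_means)              # between-chain variance
  W <- mean(apply(x, 2, var))            # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the variance
  sqrt(var_plus / W)
}

# Usage: three well-mixed chains versus three chains stuck at different levels
set.seed(1)
mixed <- matrix(rnorm(3000), ncol = 3)
stuck <- sweep(matrix(rnorm(3000), ncol = 3), 2, c(0, 5, 10), "+")
rhat_sketch(mixed)  # near 1, below the default R.hat = 1.1
rhat_sketch(stuck)  # far above 1.1, so not converged
```

When the chains agree, B is small relative to W and the ratio approaches 1; chains that have not mixed inflate B and push the statistic above the threshold.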

Details

Generates multiple imputations for incomplete data using iterative regression imputation. If the variables with missing values form a matrix Y with columns Y(1), ..., Y(K) and the fully observed predictors are X, the procedure first imputes all missing Y values using some crude approach (for example, filling in each variable by randomly selecting from the observed values of that variable). It then imputes Y(1) given Y(2), ..., Y(K) and X; imputes Y(2) given Y(1), Y(3), ..., Y(K) and X (using the newly imputed values of Y(1)); and so forth, randomly imputing each variable and looping through the variables until approximate convergence.
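The procedure described above can be sketched in base R for the simplest case. This is a minimal illustration under stated assumptions, not mi's implementation: every variable is assumed numeric, each Y(k) is modelled with lm() on all other columns, imputed values are drawn as the fitted value plus normal noise at the residual standard deviation (ignoring uncertainty in the coefficients), and a fixed number of sweeps stands in for mi's R-hat convergence check. The function name iterative_impute is hypothetical.

```r
# Hypothetical sketch of iterative regression imputation (not mi's code).
# Assumes every column of `data` is numeric.
iterative_impute <- function(data, n.iter = 10) {
  miss <- is.na(data)          # logical matrix, one column per variable
  imp <- data
  # Crude start: fill each variable by resampling its observed values
  for (v in names(data)) {
    obs <- data[[v]][!miss[, v]]
    k <- sum(miss[, v])
    imp[[v]][miss[, v]] <- obs[sample.int(length(obs), k, replace = TRUE)]
  }
  vars <- names(data)[colSums(miss) > 0]  # variables with missingness
  for (it in seq_len(n.iter)) {
    for (v in vars) {
      # Model Y(k) given all other columns, using the latest imputations
      f <- reformulate(setdiff(names(data), v), response = v)
      fit <- lm(f, data = imp[!miss[, v], ])   # fit on observed cases of v
      pred <- predict(fit, newdata = imp[miss[, v], ])
      # Random imputation: prediction plus noise on the residual scale
      imp[[v]][miss[, v]] <- pred + rnorm(length(pred), 0, sigma(fit))
    }
  }
  imp
}
```

Because the imputations are random draws, calling iterative_impute several times with different seeds yields several completed data sets, which is the sense in which the imputation is "multiple". mi itself replaces lm() with type-specific models such as mi.continuous, mi.count and mi.categorical.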

Value

An object of class mi, which stands for “multiple imputation”.
It is a list of the following 10 elements:

call The call that produced the imputation model.
data The original data frame
m The number of imputations.
mi.info The information matrix (mi.info object) of the mi run.
imp A list of length m containing the imputations.
converged Logical; indicates whether mi has converged.
coef.conv Logical; indicates whether the coefficients of the imputation models have converged. NULL if check.coef.convergence = FALSE.
bugs A BUGS array of the means and standard deviations at each iteration.
preprocess Logical; indicates whether preprocess = TRUE was used in the mi process.
mi.info.preprocessed The information matrix actually used by mi when preprocess = TRUE.
imp[[m]][[k]]@model The models used to impute the missing values.
imp[[m]][[k]]@expected A list of vectors of length n - n.mis (the number of observed values), giving the fitted values of the models.
imp[[m]][[k]]@random A list of vectors of length n.mis (the number of NAs), giving the random predicted values used to impute the missing data.

Author(s)

Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su yajima@stat.columbia.edu, M.Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu

References

Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.

Andrew Gelman and Maria Grazia Pittau. “A flexible program for missing-data imputation and model checking.” Technical report. Columbia University, New York.

Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

See Also

mi.completed, mi.data.frame, mi.continuous, mi.dichotomous, mi.count, mi.categorical, mi.polr, typecast, mi.info, mi.preprocess

Examples

# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1, 6), length.out = n)[sample(1:n, n)])
x9 <- rpois(n, 50)
x10 <- runif(n, 0.1, .99)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x9="count"))

# run the imputation
## short run for testing only; too few iterations to converge
IMP <- mi(dat, info=inf, n.iter=6, add.priors=prior.control(K=1))

# pick up where you left off
# IMP <- mi(IMP)       ## NOT RUN

## this is the suggested (default) way of running mi, NOT RUN
# IMP <- mi(dat, info=inf, add.priors=prior.control(K=1))

# convergence checking
converged(IMP)  ## expect FALSE here because n.iter is small
bugs.mi(IMP)    ## BUGS object to look at the R hat statistics

# visually check the imputation
plot(IMP)

[Package mi version 0.04-6 Index]