mi {mi}    R Documentation

Multiple Iterative Regression Imputation

Description

Produce a multiply imputed matrix by applying the elementary imputation functions iteratively to the variables with missingness in the data: each such variable is randomly imputed in turn, and the loop repeats until approximate convergence.

Usage

## S4 method for signature 'data.frame':
mi( object, info,  n.imp = 3, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    preprocess = TRUE, run.past.convergence = FALSE,
    seed = NA, check.coef.convergence = FALSE, 
    add.noise = noise.control(), post.run = TRUE)
    
## S4 method for signature 'mi':
mi(object, n.iter = 30, 
    R.hat = 1.1, max.minutes = 20, rand.imp.method = "bootstrap", 
    run.past.convergence = FALSE,  seed = NA)

Arguments

object A data frame or an mi object that contains incomplete data. mi identifies NAs as the missing data.
info The mi.info object.
n.imp The number of multiple imputations. Default is 3 chains.
n.iter The maximum number of imputation iterations. Default is 30 iterations.
R.hat The value of the R.hat statistic used as a convergence criterion. Default is 1.1.
max.minutes The maximum number of minutes the whole imputation process is allowed to run. Default is 20 minutes.
rand.imp.method The method for random imputation. Currently, mi implements only the bootstrap method.
preprocess Default is TRUE. If TRUE, mi transforms the variables that are of the nonnegative, positive-continuous, and proportion types.
run.past.convergence Default is FALSE. If set to TRUE, mi keeps running until either n.iter or max.minutes is reached, even if the imputation has already converged.
seed The random number seed.
check.coef.convergence Default is FALSE. If the value is set to be TRUE, mi will check the convergence of the coefficients of imputation models.
add.noise A list of parameters for controlling the process of adding noise to mi via noise.control.
post.run Default is TRUE. mi will run 20 more iterations after an imputation process is finished if and only if add.noise is not FALSE. This mitigates the influence of the noise on the whole imputation process.
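
As an illustration of the kind of convergence check that the R.hat argument refers to, the following is a minimal base-R sketch of the potential scale reduction statistic for a single scalar summary tracked across chains. It assumes the usual Gelman-Rubin form of the statistic; the helper name rhat and the simulated draws are hypothetical and are not part of the mi package, whose internal computation may differ in detail.

```r
# Sketch only: `rhat` is a hypothetical helper, not a function from mi.
# `draws` is a numeric matrix with one row per iteration and one column per chain.
rhat <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))   # average within-chain variance
  B <- n * var(colMeans(draws))     # between-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(2)
# Three chains of 30 iterations that all sample the same distribution
mixed <- matrix(rnorm(30 * 3), nrow = 30)
# The same chains shifted apart by 0, 5 and 10, mimicking chains that have not mixed
unmixed <- mixed + rep(c(0, 5, 10), each = 30)

rhat(mixed)     # close to 1 when the chains agree
rhat(unmixed)   # far above a threshold such as R.hat = 1.1
```

Values near 1 indicate that the chains are indistinguishable from one another; a threshold such as the default 1.1 is then compared against the statistic for each tracked summary.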

Details

Generate multiple imputations for incomplete data using iterative regression imputation. Suppose the variables with missingness form a matrix Y with columns Y(1), . . . , Y(K) and the fully observed predictors are X. The procedure first imputes all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable). It then imputes Y(1) given Y(2), . . . , Y(K) and X; imputes Y(2) given Y(1), Y(3), . . . , Y(K) and X (using the newly imputed values for Y(1)); and so forth, randomly imputing each variable and looping through until approximate convergence.
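
The loop described above can be sketched in base R as follows. This is a simplified illustration, not the mi implementation: it assumes every variable is continuous and imputes it with a linear regression plus residual noise, it ignores uncertainty in the regression coefficients, and it runs a fixed number of iterations instead of checking convergence. The function name impute_chained and its n_iter argument are hypothetical.

```r
# Sketch only: `impute_chained` is a hypothetical helper, not a function from mi.
impute_chained <- function(dat, n_iter = 10) {
  miss <- lapply(dat, is.na)                          # remember where the NAs were
  vars <- names(dat)[vapply(miss, any, logical(1))]   # variables with missingness

  # Step 1: crude start. Fill each missing value with a random draw
  # from the observed values of the same variable.
  for (v in vars) {
    obs <- dat[[v]][!miss[[v]]]
    dat[[v]][miss[[v]]] <- sample(obs, sum(miss[[v]]), replace = TRUE)
  }

  # Step 2: cycle through the variables, re-imputing each one given all others.
  for (it in seq_len(n_iter)) {
    for (v in vars) {
      # Fit on the rows where v was actually observed
      fit <- lm(reformulate(setdiff(names(dat), v), response = v),
                data = dat[!miss[[v]], , drop = FALSE])
      pred <- predict(fit, newdata = dat[miss[[v]], , drop = FALSE])
      # Random imputation: fitted value plus a draw of residual noise
      dat[[v]][miss[[v]]] <- pred + rnorm(length(pred), 0, summary(fit)$sigma)
    }
  }
  dat
}

# Small simulated example with two incomplete variables
set.seed(1)
n  <- 200
x  <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- -1 + x + 0.5 * y1 + rnorm(n)
d  <- data.frame(x, y1, y2)
d$y1[sample(n, 40)] <- NA
d$y2[sample(n, 40)] <- NA

completed <- impute_chained(d)
```

Running the function several times with different seeds would give several completed data sets, which is the idea behind the n.imp chains; mi additionally chooses an imputation model suited to each variable type.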

Value

A list of objects of class mi, which stands for “multiple imputation”.
Each object is itself a list of 10 elements.

call The imputation model.
data The original data frame.
m The number of imputations.
mi.info Information matrix of the mi.
imp A list of length m containing the imputations.
converged Binary variable to indicate if the mi has converged.
coef.conv Binary variable to indicate if the coefficients of the mi models have converged; returns NULL if check.coef.convergence = FALSE.
bugs BUGS array of the mean and sd of each iteration.
preprocess Binary variable to indicate if preprocess=TRUE in the mi process.
mi.info.preprocessed Information matrix that is actually used in the mi if preprocess=TRUE.
imp[[m]][[k]]@model The specified models used for imputing missing values.
imp[[m]][[k]]@expected A list of vectors of length n-n.mis (the number of completely observed cases), specifying the estimated values of the models.
imp[[m]][[k]]@random A list of vectors of length n.mis (the number of NAs), specifying the random predicted values for imputing missing data.

Author(s)

Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su ys463@columbia.edu, M. Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu

References

Yu-Sung Su, Andrew Gelman, Jennifer Hill, Masanao Yajima. Forthcoming. “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box”. Journal of Statistical Software.

Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.

Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

See Also

mi.completed, mi.data.frame, mi.continuous, mi.binary, mi.count, mi.categorical, mi.polr, typecast, mi.info, mi.preprocess

Examples

# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1,10),10)[sample(1:n, n)])
x9 <- runif(n, 0.1, .99)
x10 <- rpois(n, 10)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x10="count"))

# run the imputation
## this is for test only
IMP <- mi(dat, info=inf, n.iter=6, post.run=FALSE)

# no noise
# IMP <- mi(dat, info=inf, n.iter=6, add.noise=FALSE) ## NOT RUN

# pick up where you left off
# IMP <- mi(IMP)       ## NOT RUN

## this is the suggested (default) way of running mi
# IMP <- mi(dat, info=inf) ## NOT RUN

# convergence checking
converged(IMP)  ## You should get FALSE here because n.iter is small
bugs.mi(IMP)    ## BUGS object to look at the R hat statistics
plot(IMP@bugs)  ## visually check R.hat

# visually check the imputation
plot(IMP)

[Package mi version 0.08-06 Index]