mi {mi} | R Documentation |
Produces a multiply imputed matrix by applying the elementary imputation functions iteratively to the variables with missingness in the data, randomly imputing each variable in turn and looping through until approximate convergence.
## S4 method for signature 'data.frame':
mi(object, info, n.imp = 3, n.iter = 30, R.hat = 1.1, max.minutes = 20,
   rand.imp.method = "bootstrap", preprocess = TRUE,
   run.past.convergence = FALSE, seed = NA, check.coef.convergence = FALSE,
   add.noise = noise.control(), post.run = TRUE)

## S4 method for signature 'mi':
mi(object, n.iter = 30, R.hat = 1.1, max.minutes = 20,
   rand.imp.method = "bootstrap", run.past.convergence = FALSE, seed = NA)
object |
A data frame or an mi object that contains incomplete data. mi identifies NAs as the missing data. |
info |
The mi.info object. |
n.imp |
The number of multiple imputations. Default is 3 chains. |
n.iter |
The maximum number of imputation iterations. Default is 30 iterations. |
R.hat |
The value of the R.hat statistic used as a convergence criterion. Default is 1.1. |
max.minutes |
The maximum number of minutes the whole imputation process is allowed to run. Default is 20 minutes. |
rand.imp.method |
The method for random imputation. Currently, mi implements only the bootstrap method. |
preprocess |
Default is TRUE. mi will transform the variables that are of nonnegative, positive-continuous, and proportion types. |
run.past.convergence |
Default is FALSE. If set to TRUE, mi will run until either n.iter or max.minutes is reached, even if the imputation has converged. |
seed |
The random number seed. |
check.coef.convergence |
Default is FALSE. If set to TRUE, mi will check the convergence of the coefficients of the imputation models. |
add.noise |
A list of parameters for controlling the process of adding noise to mi via noise.control. |
post.run |
Default is TRUE. mi will run 20 more iterations after an imputation process is finished if and only if add.noise is not FALSE. This is to mitigate the influence of the noise on the whole imputation process. |
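The R.hat argument above is a threshold on a convergence statistic computed across the imputation chains. As a rough illustration only, the following sketch computes the standard Gelman-Rubin potential scale reduction factor for one scalar summary tracked in several chains. The function name rhat_sketch and the toy matrices are hypothetical and are not part of the mi package, and the exact computation mi performs internally may differ.

```r
# Hypothetical helper, not part of the mi package: Gelman-Rubin R-hat
# for one scalar summary tracked over iterations in several chains.
# `draws` is an (iterations x chains) numeric matrix.
rhat_sketch <- function(draws) {
  n <- nrow(draws)                      # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)             # between-chain variance
  W <- mean(apply(draws, 2, var))       # average within-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
# three chains sampling the same distribution: R-hat should be close to 1
mixed <- matrix(rnorm(300), nrow = 100, ncol = 3)
# three chains stuck at different levels: R-hat should be far above 1.1
stuck <- sweep(mixed, 2, c(-5, 0, 5), "+")

rhat_sketch(mixed)
rhat_sketch(stuck)
```

Under this reading, chains are treated as approximately converged once the statistic falls below the R.hat threshold (1.1 by default).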
Generate multiple imputations for incomplete data using iterative regression imputation. If the variables with missingness are a matrix Y with columns Y(1), . . . , Y(K) and the fully observed predictors are X, this entails first imputing all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable); and then imputing Y(1) given Y(2), . . . , Y(K) and X; imputing Y(2) given Y(1), Y(3), . . . , Y(K) and X (using the newly imputed values for Y(1)), and so forth, randomly imputing each variable and looping through until approximate convergence.
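The looping scheme described above can be sketched in a few lines of base R. This is a minimal illustration under simplifying assumptions, not the mi implementation: every variable is numeric, every conditional model is a linear regression fitted with lm(), and the noise added to each prediction reflects only the residual standard deviation. The function name impute_chained_sketch and the toy data are hypothetical.

```r
# Hypothetical illustration, not the mi implementation: chained regression
# imputation for numeric columns only, using lm() for every variable.
impute_chained_sketch <- function(dat, n.iter = 10) {
  miss <- is.na(dat)                         # remember which cells were NA
  vars <- names(dat)[colSums(miss) > 0]      # the Y(1), ..., Y(K) columns
  # crude start: draw each missing value from that variable's observed values
  for (v in vars) {
    obs <- dat[[v]][!miss[, v]]
    dat[[v]][miss[, v]] <- sample(obs, sum(miss[, v]), replace = TRUE)
  }
  for (iter in seq_len(n.iter)) {
    for (v in vars) {
      # regress Y(k) on all other columns, using rows where Y(k) was observed
      fit <- lm(reformulate(setdiff(names(dat), v), response = v),
                data = dat[!miss[, v], , drop = FALSE])
      pred <- predict(fit, newdata = dat[miss[, v], , drop = FALSE])
      # random imputation: add residual-scale noise to the prediction
      dat[[v]][miss[, v]] <- pred + rnorm(length(pred), sd = summary(fit)$sigma)
    }
  }
  dat
}

set.seed(42)
n <- 200
x <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- y1 - x + rnorm(n)
toy <- data.frame(x, y1, y2)
toy$y1[sample(n, 40)] <- NA
toy$y2[sample(n, 40)] <- NA

completed <- impute_chained_sketch(toy)
anyNA(completed)
```

Running the function several times with different seeds would give several completed data sets, which is the idea behind n.imp. The sketch always runs a fixed number of iterations and does not check convergence.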
A list of objects of class mi, which stands for “multiple imputation”.
Each object is itself a list of 10 elements.
call |
The imputation model. |
data |
The original data frame. |
m |
The number of imputations. |
mi.info |
Information matrix of the mi. |
imp |
A list of length m of imputations. |
converged |
Binary variable to indicate whether the mi has converged. |
coef.conv |
Binary variable to indicate whether the coefficients of the mi models have converged. Returns NULL if check.coef.convergence = FALSE. |
bugs |
BUGS array of the mean and sd of each iteration. |
preprocess |
Binary variable to indicate whether preprocess=TRUE was used in the mi process. |
mi.info.preprocessed |
Information matrix that is actually used in the mi if preprocess=TRUE. |
|
the specified models used for imputing missing values |
|
a list of vectors of length n-n.mis (number of complete observed data), specifying the estimated values of the models |
|
a list of vectors of length n.mis (number of NAs), specifying the random predicted values for imputing missing data |
Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su ys463@columbia.edu, M. Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu
Yu-Sung Su, Andrew Gelman, Jennifer Hill, Masanao Yajima. Forthcoming. “Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box”. Journal of Statistical Software.
Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.
Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
mi.completed, mi.data.frame, mi.continuous, mi.binary, mi.count, mi.categorical, mi.polr, typecast, mi.info, mi.preprocess
# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
x8 <- factor(rep(seq(1,10),10)[sample(1:n, n)])
x9 <- runif(n, 0.1, .99)
x10 <- rpois(n, 10)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x10="count"))

# run the imputation
## this is for test only
IMP <- mi(dat, info=inf, n.iter=6, post.run=FALSE)

# no noise
# IMP <- mi(dat, info=inf, n.iter=6, add.noise=FALSE) ## NOT RUN

# pick up where you left off
# IMP <- mi(IMP) ## NOT RUN

## this is the suggested (default) way of running mi
# IMP <- mi(dat, info=inf) ## NOT RUN

# convergence checking
converged(IMP)  ## You should get FALSE here because n.iter is small
bugs.mi(IMP)    ## BUGS object to look at the R hat statistics
plot(IMP@bugs)  ## visually check R.hat

# visually check the imputation
plot(IMP)