mi {mi} | R Documentation |
Produces a multiply imputed matrix by applying the elementary imputation functions iteratively to the variables with missingness in the data, randomly imputing each variable and looping through until approximate convergence.
## S4 method for signature 'data.frame':
mi(object, info, n.imp = 3, n.iter = 30, R.hat = 1.1, max.minutes = 20,
   rand.imp.method = "bootstrap", preprocess = TRUE,
   continue.on.convergence = FALSE, seed = NA,
   check.coef.convergence = FALSE, add.priors = prior.control(),
   post.run = TRUE)

## S4 method for signature 'mi':
mi(object, info, n.imp = 3, n.iter = 30, R.hat = 1.1, max.minutes = 20,
   rand.imp.method = "bootstrap", preprocess = TRUE,
   continue.on.convergence = FALSE, seed = NA,
   check.coef.convergence = FALSE)
object |
A data frame containing the incomplete data, with missing values coded as NA, or an mi object. |
info |
An mi.info object. |
n.imp |
Number of multiple imputations. The default is n.imp = 3. |
n.iter |
Maximum number of iterations to run while seeking convergence. The default is 30. |
R.hat |
Value of the R.hat statistic used as the threshold for the convergence check. The default is 1.1. |
max.minutes |
Maximum number of minutes to run before iteration stops. The default is 20. |
seed |
Random seed. The default is NA. |
rand.imp.method |
Method for random imputation; see random.imp. The default is "bootstrap". |
preprocess |
If TRUE (the default), preprocess the data according to the variable types;
see mi.preprocess. |
continue.on.convergence |
If set to TRUE, mi keeps running after convergence
until the maximum number of iterations is reached or the maximum number of minutes has passed. The default is FALSE. |
check.coef.convergence |
If TRUE, also check whether the coefficients of the mi models have converged. The default is FALSE. |
add.priors |
A list of parameters controlling how priors are added in
mi. See the documentation for prior.control for details. |
post.run |
If TRUE (the default), mi runs 20 more iterations after it has finished,
if and only if some priors have been added during the mi process. This mitigates the
influence of the priors on the whole procedure. |
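As a rough illustration of what a threshold such as R.hat = 1.1 means, the sketch below computes the standard potential scale reduction factor for one monitored scalar across several chains. The helper `r_hat` and the simulated `mixed` and `unmixed` matrices are hypothetical names introduced here; this is not a function exported by mi, and the exact computation mi performs internally may differ.

```r
# Sketch of the potential scale reduction factor (R.hat) for one monitored
# scalar. `draws` is an iterations x chains matrix.
# Hypothetical helper for illustration only, not part of the mi package.
r_hat <- function(draws) {
  n <- nrow(draws)
  W <- mean(apply(draws, 2, var))        # average within-chain variance
  B <- n * var(colMeans(draws))          # between-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(2)
# three chains of 30 iterations drawn from the same distribution
mixed <- matrix(rnorm(30 * 3), nrow = 30)
# the same chains shifted to different levels, i.e. chains that disagree
unmixed <- mixed + rep(c(0, 5, 10), each = 30)

r_hat(mixed)
r_hat(unmixed)
```

Values close to 1 indicate that the chains agree with one another; values well above the chosen threshold suggest that more iterations are needed.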
Generate multiple imputations for incomplete data using iterative regression imputation. If the variables with missingness are a matrix Y with columns Y(1), ..., Y(K) and the fully observed predictors are X, this entails first imputing all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable); then imputing Y(1) given Y(2), ..., Y(K) and X; imputing Y(2) given Y(1), Y(3), ..., Y(K) and X (using the newly imputed values for Y(1)); and so forth, randomly imputing each variable and looping through until approximate convergence.
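The loop described above can be sketched in base R. This is a minimal illustration under simplifying assumptions, not the mi implementation: every variable is treated as continuous and modelled with lm(), uncertainty in the regression coefficients is ignored, and a fixed number of passes replaces the convergence check. The function name `chained_impute` and the simulated data are hypothetical.

```r
# Minimal sketch of iterative regression imputation (NOT the mi internals).
# Assumes all variables are continuous and uses lm() for each conditional model.
set.seed(1)
n  <- 200
x  <- rnorm(n)                        # fully observed predictor X
y1 <- 1 + 2 * x + rnorm(n)            # Y(1)
y2 <- -1 + 0.5 * y1 + x + rnorm(n)    # Y(2)
dat <- data.frame(y1 = y1, y2 = y2, x = x)
dat$y1[sample(n, 40)] <- NA
dat$y2[sample(n, 40)] <- NA

chained_impute <- function(dat, n.iter = 10) {
  miss <- lapply(dat, is.na)          # remember where the NAs were
  vars <- names(dat)[vapply(miss, any, logical(1))]
  # Step 1: crude start, draw from the observed values of each variable
  for (v in vars) {
    obs <- dat[[v]][!miss[[v]]]
    dat[[v]][miss[[v]]] <- sample(obs, sum(miss[[v]]), replace = TRUE)
  }
  # Step 2: cycle through the variables, re-imputing each given all the others
  for (it in seq_len(n.iter)) {
    for (v in vars) {
      fit <- lm(reformulate(setdiff(names(dat), v), response = v),
                data = dat[!miss[[v]], , drop = FALSE])
      pred <- predict(fit, newdata = dat[miss[[v]], , drop = FALSE])
      # add residual noise so imputations are random draws, not fitted means
      dat[[v]][miss[[v]]] <- pred + rnorm(length(pred), 0, summary(fit)$sigma)
    }
  }
  dat
}

completed <- chained_impute(dat)
```

Running `chained_impute` several times with different seeds would give several completed data sets, which is the idea behind n.imp. The observed values are never modified; only the cells that were NA are redrawn on each pass.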
An object of class mi, which stands for “multiple imputation”.
Each such object is itself a list of 10 elements.
call |
The imputation model |
data |
The original data frame |
m |
The number of imputations. |
mi.info |
Information matrix of the mi |
imp |
A list of length m containing the imputations. |
converged |
Binary variable to indicate if the mi has converged. |
coef.conv |
Binary variable to indicate if the coefficients of the mi models have converged;
NULL if check.coef.convergence = FALSE |
bugs |
BUGS array of the mean and sd of each iteration. |
preprocess |
Binary variable to indicate if preprocess=TRUE in the mi process |
mi.info.preprocessed |
Information matrix that is actually used in the mi if preprocess=TRUE |
|
the specified models used for imputing missing values |
|
a list of vectors of length n-n.mis (number of complete observed data), specifying the estimated values of the models |
|
a list of vectors of length n.mis (number of NAs), specifying the random predicted values for imputing missing data |
Masanao Yajima yajima@stat.columbia.edu, Yu-Sung Su yajima@stat.columbia.edu, M.Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu
Kobi Abayomi, Andrew Gelman and Marc Levy. (2008). “Diagnostics for multivariate imputations”. Applied Statistics 57, Part 3: 273–291.
Andrew Gelman and Maria Grazia Pittau. “A flexible program for missing-data imputation and model checking.” Technical report. Columbia University, New York.
Andrew Gelman and Jennifer Hill. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
mi.completed, mi.data.frame, mi.continuous, mi.dichotomous, mi.count, mi.categorical, mi.polr, typecast, mi.info, mi.preprocess
# simulate fake data
set.seed(100)
n <- 100
u1 <- rbinom(n, 1, .5)
v1 <- log(rnorm(n, 5, 1))
x1 <- u1*exp(v1)
u2 <- rbinom(n, 1, .5)
v2 <- log(rnorm(n, 5, 1))
x2 <- u2*exp(v2)
x3 <- rbinom(n, 1, prob=0.45)
x4 <- ordered(rep(seq(1, 5),100)[sample(1:n, n)])
x5 <- rep(letters[1:10],10)[sample(1:n, n)]
x6 <- trunc(runif(n, 1, 10))
x7 <- rnorm(n)
# length.out = n keeps every index in range; rep(seq(1,6),10) has only 60
# elements, so indexing it with 1:n would silently produce NAs
x8 <- factor(rep(seq(1,6), length.out = n)[sample(1:n, n)])
x9 <- rpois(n, 50)
x10 <- runif(n, 0.1, .99)
y <- x1 + x2 + x7 + x9 + rnorm(n)
fakedata <- cbind.data.frame(y, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)

# randomly create missing values
dat <- mi:::.create.missing(fakedata, pct.mis=30)

# get information matrix of the data
inf <- mi.info(dat)

# update the variable type of a specific variable to mi.info
inf <- update(inf, "type", list(x9="count"))

# run the imputation
## this is for test only
IMP <- mi(dat, info=inf, n.iter=6, add.priors=prior.control(K=1))

# pick up where you left off
# IMP <- mi(IMP) ## NOT RUN

## this is the suggested (default) way of running mi, NOT RUN
# IMP <- mi(dat, info=inf, add.priors=prior.control(K=1))

# convergence checking
converged(IMP) ## You should get FALSE here because n.iter is small
bugs.mi(IMP) ## BUGS object to look at the R hat statistics

# visually check the imputation
plot(IMP)