mi {mi} | R Documentation |
Produces a multiply imputed matrix by applying the elementary functions iteratively to the variables with missing values, randomly imputing each variable and looping until approximate convergence.
mi( object, info, type = NULL, n.imp = 3, n.iter = 30, max.minutes = 20, rand.imp.method = "bootstrap", preprocess = FALSE, continue.on.convergence = FALSE, seed = NA, check.coef.convergence = FALSE)
object |
A data frame containing the incomplete data, with missing values coded as NA, or an mi object. |
info |
mi.info object. |
type |
Vector of types. When you specify a type, types for all the columns must be specified. |
n.imp |
Number of multiple imputations. The default is 3. |
n.iter |
Maximum number of iterations to run while seeking convergence. The default is 30. |
max.minutes |
Maximum minutes to stop iterating. The default is 20. |
seed |
Random seed. |
rand.imp.method |
Method for the initial random imputation. The default is "bootstrap". |
preprocess |
Preprocess the data according to the info matrix. |
continue.on.convergence |
If set to TRUE, mi keeps running after convergence until the maximum number of iterations is reached or the maximum number of minutes has passed. The default is FALSE. |
check.coef.convergence |
Whether to also check convergence of the coefficients. The default is FALSE. |
... |
options for plot.mi. |
Generate multiple imputations for incomplete data using iterative regression imputation. If the variables with missingness are a matrix Y with columns Y(1), . . . , Y(K) and the fully observed predictors are X, this entails first imputing all the missing Y values using some crude approach (for example, choosing imputed values for each variable by randomly selecting from the observed outcomes of that variable); and then imputing Y(1) given Y(2), . . . , Y(K) and X; imputing Y(2) given Y(1), Y(3), . . . , Y(K) and X (using the newly imputed values for Y(1)), and so forth, randomly imputing each variable and looping through until approximate convergence.
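The looping scheme described above can be sketched in base R. This is only an illustration under stated assumptions, not the internals of mi: the toy data, the variable names (x, y1, y2), the use of plain linear regression for every variable, and the fixed number of sweeps are all hypothetical choices made for the example.

```r
# Minimal base-R sketch of iterative regression imputation (not the mi internals).
set.seed(1)

# Toy data (hypothetical): x is fully observed; y1 and y2 have missing values.
n  <- 200
x  <- rnorm(n)
y1 <- 1 + 2 * x + rnorm(n)
y2 <- -1 + 0.5 * x + 0.8 * y1 + rnorm(n)
dat <- data.frame(x = x, y1 = y1, y2 = y2)
dat$y1[sample(n, 30)] <- NA
dat$y2[sample(n, 40)] <- NA

miss_vars <- c("y1", "y2")
is_mis <- lapply(dat[miss_vars], is.na)   # remember which cells were missing

# Step 1: crude start. Fill each NA with a random draw from the observed values.
completed <- dat
for (v in miss_vars) {
  obs <- dat[[v]][!is_mis[[v]]]
  completed[[v]][is_mis[[v]]] <- sample(obs, sum(is_mis[[v]]), replace = TRUE)
}

# Step 2: cycle through the variables, re-imputing each one given all the others.
n_iter <- 10   # fixed number of sweeps; a real run would monitor convergence
for (it in seq_len(n_iter)) {
  for (v in miss_vars) {
    predictors <- setdiff(names(completed), v)
    form <- reformulate(predictors, response = v)
    # Fit on rows where v was observed, using the current imputations of the rest.
    fit <- lm(form, data = completed[!is_mis[[v]], ])
    mu  <- predict(fit, newdata = completed[is_mis[[v]], ])
    # Random imputation: add residual noise instead of plugging in the fitted mean.
    # (Simplification: uncertainty in the regression coefficients is ignored.)
    completed[[v]][is_mis[[v]]] <- mu + rnorm(length(mu), sd = summary(fit)$sigma)
  }
}
```

Repeating the whole procedure from different random starts would give several completed data sets, which is the role played by n.imp.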
A list of objects of class mi, which stands for multiple imputation. Each object is itself a list of 8 elements.
data |
The original data frame. |
imp.dat |
A data frame with the columns to be imputed. |
obs.dat |
A data frame with the completed columns. |
m |
The number of imputations. |
nmis |
An array containing the number of missing observations per column. |
imp |
A list of length m containing the imputations. |
converged |
Binary variable to indicate if mi has converged. |
bugs |
BUGS array of the mean and sd of each iteration. |
The imp method creates a list of length m of imputations, whose names are Imputation1, Imputation2, Imputation3.
Each imp[[m]] is itself a list containing:
- imp[[m]]$Imp.Models: the specified models used for imputing the NA's in each column of dat;
- imp[[m]]$Random.predicted: a list of vectors of length n.mis (the number of NA's), giving the random predicted values used to impute the missing data. For "mixed" variables there are three vectors of random values: the values predicted using the binomial distribution (the first step of the imputation procedure); the values predicted using the normal distribution (the second step of the imputation procedure); and the product of these two vectors, which is positive where the missing value is predicted to be positive and zero otherwise. For categorical variables the random values are predicted using the multinomial distribution;
- imp[[m]]$Random.predicted: a list of vectors of length n - n.mis (the number of completely observed values), giving the estimated values of the models. For "mixed" variables there are two vectors of estimated values, one for each step of the two-step imputation procedure;
- imp[[m]]$Residual.values: a list of vectors of residuals, to be used for checking the models. For "mixed" variables there are two vectors of residuals, one for each step of the two-step imputation procedure;
- imp[[m]]$Imputed.matrix: a data frame with the missing data imputed.
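The two-step procedure for "mixed" variables described in the list above can be illustrated in base R. This is a hedged sketch and not the code used by mi: the toy data, the variable names, and the choice to model the positive part on its raw scale are assumptions made for the example.

```r
# Sketch of a two-step imputation for a "mixed" variable (zeros plus positive values).
set.seed(2)

# Toy data (hypothetical): y is zero for some rows and positive for the rest.
n <- 300
x <- rnorm(n)
positive <- rbinom(n, 1, plogis(0.5 + x))
y <- positive * (10 + 2 * x + rnorm(n))
y[sample(n, 50)] <- NA                      # make 50 values missing
dat <- data.frame(x = x, y = y)
obs <- !is.na(dat$y)
dat$pos <- as.numeric(dat$y > 0)            # 1 if positive, 0 if zero, NA if missing

# Step 1: logistic regression for "is the value positive?", then binomial draws.
fit_sign  <- glm(pos ~ x, family = binomial, data = dat[obs, ])
p_pos     <- predict(fit_sign, newdata = dat[!obs, ], type = "response")
draw_sign <- rbinom(sum(!obs), 1, p_pos)

# Step 2: linear regression on the observed positive values, then normal draws.
pos_rows <- which(dat$y > 0)                # which() drops the NA rows
fit_val  <- lm(y ~ x, data = dat[pos_rows, ])
draw_val <- rnorm(sum(!obs),
                  mean = predict(fit_val, newdata = dat[!obs, ]),
                  sd   = summary(fit_val)$sigma)

# Final imputation: product of the two draws, so it is zero wherever step 1 drew 0.
imputed <- draw_sign * draw_val
dat$y_completed <- dat$y
dat$y_completed[!obs] <- imputed
```

The three vectors draw_sign, draw_val and imputed play the roles of the three vectors of random values described for "mixed" variables.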
Masanao Yajima yajima@stat.columbia.edu, M.Grazia Pittau grazia@stat.columbia.edu, Andrew Gelman gelman@stat.columbia.edu
Andrew Gelman and M. Grazia Pittau, A flexible program for missing-data imputation and model checking, Technical report, Columbia University, New York; Andrew Gelman and Jennifer Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007.
data(CHAIN)
imp.CHAIN <- mi( CHAIN, n.imp = 3, n.iter = 6 )
is.mi( imp.CHAIN )                           ## Is this an mi object?
data.mi( imp.CHAIN )                         ## You can get the original data
mi.mt <- mi.matrix( imp.CHAIN, m = 1 )       ## The imputed matrix for the first imputation
mi.df <- mi.data.frame( imp.CHAIN, m = 1 )   ## The imputed data frame for the first imputation
##############################
# Convergence checking
##############################
converged( imp.CHAIN )   ## You should get FALSE because only 6 iterations were run
bugs.mi( imp.CHAIN )     ## BUGS object to look at the R hat statistics
# NOT RUN
#imp.CHAIN <- mi( imp.CHAIN, n.iter=5 )   ## You can pick up from where you left off