prelim.cat {cat} | R Documentation |
This function performs grouping and sorting operations on categorical datasets with missing values. The list it returns is the required input to em.cat, da.cat, imp.cat, and related functions.
prelim.cat(x, counts, levs)
x |
categorical data matrix containing missing values. The data may be
provided in either ungrouped or grouped format. In ungrouped format,
the rows of x correspond to individual observational units, so that
nrow(x) is the total sample size. In grouped format, the rows of x
correspond to distinct covariate patterns, and their frequencies are
supplied through the counts argument. In either format, the columns
correspond to variables. The categories must be coded as consecutive
positive integers beginning with 1 (1, 2, ...), and missing values are
denoted by NA.
|
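A minimal base-R sketch of putting raw data into the required coding, assuming the variables arrive as text labels in a data frame. The data frame `raw`, its columns, and the helper `check_cat_coding()` are hypothetical illustrations, not part of the cat package. Note that `factor()` orders levels alphabetically unless `levels` is supplied, so set `levels` explicitly when the category order matters.

```r
# Hypothetical raw data with text labels; NA marks a missing value
raw <- data.frame(
  smoke = c("no", "yes", NA, "no"),
  sex   = c("F", NA, "M", "M"),
  stringsAsFactors = FALSE
)

# Recode each column to consecutive integers 1, 2, ... (NA stays NA).
# factor() sorts levels alphabetically here: "no" = 1, "yes" = 2; "F" = 1, "M" = 2
x <- sapply(raw, function(col) as.integer(factor(col)))

# Hypothetical check: every observed code is a whole number >= 1
check_cat_coding <- function(x) {
  v <- x[!is.na(x)]
  all(v >= 1 & v == round(v))
}

x
check_cat_coding(x)
```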
counts |
optional vector of length nrow(x) giving the frequencies corresponding
to the covariate patterns in x. The total sample size is
sum(counts). If counts is missing, the data are assumed to be
ungrouped; this is equivalent to taking counts equal to
rep(1, nrow(x)).
|
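The grouped format is a frequency-weighted summary of the ungrouped one. This base-R sketch, on a hypothetical toy matrix `x_ung`, collapses identical rows into distinct patterns `x_grp` with frequencies `counts`. NA is treated as a value in its own right, so rows with different missingness patterns stay separate.

```r
# Hypothetical ungrouped data: one row per observational unit
x_ung <- matrix(c(1, 1, 2, NA, 1, NA,
                  2, 2, 1, 1,  2, NA), ncol = 2)

# One string key per row; paste() renders NA as "NA"
key   <- apply(x_ung, 1, paste, collapse = ",")
first <- !duplicated(key)

# Distinct covariate patterns, in order of first appearance
x_grp <- x_ung[first, , drop = FALSE]

# Frequency of each pattern, aligned with the rows of x_grp
counts <- as.vector(table(key)[key[first]])

x_grp
counts
```

Because sum(counts) equals nrow(x_ung), passing x_grp together with counts should describe the same sample as passing x_ung with counts omitted.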
levs |
optional vector of length ncol(x) indicating the number of levels
for each categorical variable. If missing, levs[j] is taken to be
max(x[,j], na.rm=TRUE).
|
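The default rule for levs can be reproduced in base R, as in this sketch on a hypothetical matrix. If the highest category of a variable never appears among the observed values, the default understates that variable's number of levels, so levs should then be supplied explicitly.

```r
# Hypothetical data: column 2 has 3 categories, but category 3 is never observed
x <- matrix(c(1, 2,  3, NA,
              1, NA, 2, 2), ncol = 2)

# Default rule from the argument description: largest observed code per column
levs_default <- apply(x, 2, max, na.rm = TRUE)
levs_default   # 3 2

# Supply the true number of levels explicitly
levs <- c(3, 3)
```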
a list of seventeen components that summarize various features of x after the data have been sorted by missingness patterns and grouped according to the observed values. Components that might be of interest to the user include:
nmis |
a vector of length ncol(x) containing the number of missing values
for each variable in x.
|
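For ungrouped data, nmis amounts to counting the NA entries in each column, as in this base-R sketch on a hypothetical matrix. The second computation assumes grouped data in which row i stands for counts[i] units; the description above does not say whether prelim.cat weights nmis this way, so treat the weighted line as an assumption.

```r
# Hypothetical ungrouped data: 4 units, 3 variables
x <- matrix(c(1, NA, 2,  NA,
              2, 1,  NA, 1,
              1, 1,  2,  2), ncol = 3)

# Number of missing values per column
nmis <- colSums(is.na(x))
nmis   # 2 1 0

# Assumption: if the rows were grouped patterns, weight each row by its count
counts <- c(5, 2, 3, 1)
nmis_weighted <- colSums(is.na(x) * counts)   # row i multiplied by counts[i]
nmis_weighted   # 3 3 0
```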
r |
matrix of response indicators showing the missing data patterns in x. Its dimension is (m, p), where m is the number of distinct missingness patterns in the rows of x and p is the number of columns in x. Observed values are indicated by 1 and missing values by 0. The row names give the number of observations in each pattern, and the columns correspond to the columns of x. |
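The construction of a matrix like r can be sketched in base R for ungrouped data, as below. The toy matrix is hypothetical, and the rows here appear in order of first appearance; prelim.cat sorts the data by missingness pattern, so its row order may differ.

```r
# Hypothetical ungrouped data: 6 units, 2 variables, NA = missing
x <- matrix(c(1, 2,  NA, 1, 2, NA,
              2, NA, 1,  1, 2, NA), ncol = 2)

# Response indicators: 1 = observed, 0 = missing
resp <- 1L * !is.na(x)

# One string key per row, e.g. "10" = first observed, second missing
key <- apply(resp, 1, paste, collapse = "")

# Keep one row per distinct pattern (in order of first appearance)
first <- !duplicated(key)
r <- resp[first, , drop = FALSE]

# Row names record how many units share each pattern
rownames(r) <- as.vector(table(key)[key[first]])
r
```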
d |
vector of length ncol(x) indicating the number of levels for each
variable. The complete-data contingency table would be an array with
these dimensions. Identical to levs if levs was supplied.
|
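To show the shape implied by d, this sketch tabulates only the complete rows of a hypothetical matrix into an array with dimensions d. It is a complete-case tabulation for illustration, not the estimate that em.cat or da.cat produce from incomplete data.

```r
# Hypothetical data: variable 1 has 2 levels, variable 2 has 3 levels
x <- matrix(c(1, 2, 2, NA, 1,
              3, 1, 3, 2,  NA), ncol = 2)
d <- c(2, 3)

# Keep rows with no missing values
cc <- x[complete.cases(x), , drop = FALSE]

# Fixing the factor levels to 1..d[j] forces the table to have dimensions d
tab <- table(factor(cc[, 1], levels = 1:d[1]),
             factor(cc[, 2], levels = 1:d[2]))
dim(tab)   # 2 3
tab
```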
ncells |
number of cells in the cross-classified contingency table, equal to
prod(d).
|
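The arithmetic behind ncells is a single product, as in this sketch with hypothetical level counts. It also shows why the table can become large quickly: the count grows multiplicatively with each added variable.

```r
# Hypothetical level counts for three variables
d <- c(2, 3, 4)

# Number of cells in the full cross-classification
ncells <- prod(d)
ncells   # 24

# An empty complete-data table with these dimensions has ncells entries
tab <- array(0, dim = d)
length(tab) == ncells   # TRUE

# Ten binary variables already give 2^10 cells
prod(rep(2, 10))   # 1024
```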
Chapters 7–8 of Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall.
em.cat, ecm.cat, da.cat, mda.cat, dabipf, imp.cat
data(crimes)
crimes
s <- prelim.cat(crimes[,1:2], crimes[,3])   # preliminary manipulations
s$nmis                                      # see number of missing observations per variable
s$r                                         # look at missing data patterns