poLCA.simdata {poLCA} | R Documentation |
Create simulated cross-classification data
Description
Uses the latent class model's assumed data-generating process to create a simulated dataset that can be used to test the properties of the poLCA latent class and latent class regression estimators.
Usage
poLCA.simdata(N = 5000, probs = NULL, nclass = 2, ndv = 4,
nresp = NULL, x = NULL, niv = 0, b = NULL,
classdist = NULL, missval = FALSE, pctmiss = NULL)
Arguments
N |
number of observations. |
probs |
a list of matrices of dimension nclass by nresp, with each matrix corresponding to one manifest variable and each row containing the class-conditional outcome probabilities (which must sum to 1). If probs is NULL (default), then the outcome probabilities are generated randomly. |
nclass |
number of latent classes. If probs is specified, then nclass is set equal to the number of rows in each matrix in that list. If classdist is specified, then nclass is set equal to the length of that vector. If b is specified, then nclass is set equal to one greater than the number of columns in b. Otherwise, the default is two. |
ndv |
number of manifest variables. If probs is specified, then ndv is set equal to the number of matrices in that list. If nresp is specified, then ndv is set equal to the length of that vector. Otherwise, the default is four. |
nresp |
number of possible outcomes for each manifest variable. If probs is specified, then nresp is set equal to the number of columns in each matrix in that list. If both probs and nresp are NULL (default), then the manifest variables are assigned a random number of outcomes between two and five. |
x |
a matrix of concomitant variables with N rows and niv columns. If x=NULL (default) but niv>0, then niv concomitant variables will be generated as mutually independent random draws from a standard normal distribution. |
niv |
number of concomitant variables (covariates). Setting niv=0 (default) creates a data set assuming no covariates. If nclass=1, then niv is automatically set equal to 0. If both x and niv are entered, then the number of columns in x overrides the value of niv. The number of rows in b, less one, also overrides niv. |
b |
when using covariates, an niv+1 by nclass-1 matrix of (multinomial) logit coefficients. If b is NULL (default), then coefficients are generated as random integers between -2 and 2. |
classdist |
a vector of mixing proportions (class population shares) of length nclass. classdist must sum to 1. Disregarded if b is specified or niv>1, because then classdist is, in part, a function of the concomitant variables. If classdist is NULL (default), then the mixing proportions are generated randomly. |
missval |
logical. If TRUE, then a fraction pctmiss of the manifest variables are randomly dropped as missing values. Default is FALSE. |
pctmiss |
percentage of values to be dropped as missing, if missval=TRUE. If pctmiss is NULL (default), then a value between 5 and 40 percent is chosen randomly. |
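To illustrate how x and b relate to the class shares, the following base R sketch turns an niv+1 by nclass-1 coefficient matrix into per-observation class probabilities through a multinomial logit. It assumes that the first class is the reference category (coefficients fixed at zero) and that the first row of b holds the intercepts; both are assumptions made for illustration, not statements about the package internals. The object names eta and prior are hypothetical.

```r
set.seed(2)
N <- 5       # observations
niv <- 2     # covariates
nclass <- 3  # latent classes

# Covariates drawn as independent standard normals, as described for x = NULL.
x <- matrix(rnorm(N * niv), nrow = N, ncol = niv)

# Coefficient matrix with niv + 1 rows and nclass - 1 columns.
b <- matrix(c(-1, 2, -3, 1, -2, 2), nrow = niv + 1, ncol = nclass - 1)

# Linear predictors; the reference class (assumed here to be class 1) gets 0.
eta <- cbind(0, cbind(1, x) %*% b)

# Multinomial logit: each row holds one observation's class probabilities.
prior <- exp(eta) / rowSums(exp(eta))
rowSums(prior)
```

Because every observation has its own row of class probabilities, a single fixed classdist vector no longer describes the data once covariates enter the model.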
Details
Note that entering probs overrides nclass, ndv, and nresp. It also overrides classdist if the length of the classdist vector is not equal to the length of the probs list. Likewise, if probs=NULL, then length(nresp) overrides ndv and length(classdist) overrides nclass. Setting niv>1 causes any user-entered value of classdist to be disregarded.
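As a rough illustration of the data-generating process for the case without covariates, the base R sketch below first draws each observation's latent class from the mixing proportions and then draws each manifest variable from the matching row of class-conditional probabilities. The helper simulate_lca is hypothetical and is not part of poLCA; it only mirrors the roles of probs, classdist, dat, and trueclass described on this page.

```r
# Hypothetical helper, not part of poLCA: draws data from a latent class model.
simulate_lca <- function(N, probs, classdist) {
  nclass <- length(classdist)
  # Step 1: draw each observation's latent class from the mixing proportions.
  trueclass <- sample.int(nclass, size = N, replace = TRUE, prob = classdist)
  # Step 2: for each manifest variable, draw an outcome from the row of
  # class-conditional probabilities that matches the observation's class.
  dat <- sapply(probs, function(p) {
    vapply(trueclass,
           function(k) sample.int(ncol(p), size = 1, prob = p[k, ]),
           integer(1))
  })
  colnames(dat) <- paste0("Y", seq_along(probs))
  list(dat = as.data.frame(dat), trueclass = trueclass)
}

set.seed(1)
probs <- list(
  matrix(c(0.9, 0.1,
           0.2, 0.8), ncol = 2, byrow = TRUE),       # 2 classes, 2 outcomes
  matrix(c(0.7, 0.2, 0.1,
           0.1, 0.2, 0.7), ncol = 3, byrow = TRUE)   # 2 classes, 3 outcomes
)
sim <- simulate_lca(N = 1000, probs = probs, classdist = c(0.4, 0.6))
head(sim$dat)
table(sim$trueclass)
```

In this sketch the number of classes, the number of manifest variables, and the number of outcomes are all read off probs and classdist, which is the same dependency the override rules above describe.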
Value
dat |
a data frame containing the simulated variables. Variable names for manifest variables are Y1, Y2, etc. Variable names for concomitant variables are X1, X2, etc. |
probs |
a list of matrices of dimension nclass by nresp containing the class-conditional response probabilities. |
nresp |
a vector containing the number of possible outcomes for each manifest variable. |
b |
coefficients on covariates, if used. |
classdist |
mixing proportions corresponding to each latent class. |
pctmiss |
percent of observations missing. |
trueclass |
N by 1 vector containing the "true" class membership for each individual. |
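When missval=TRUE, it can be useful to compare the target share of missing values with the share actually realized in the data. The base R sketch below assumes that each cell of the manifest variables is dropped independently with probability pctmiss, and that pctmiss is given as a proportion (as in pctmiss=0.2). Both points are assumptions for illustration; the package may implement the dropping differently. The objects dat and mask here are stand-ins, not output of poLCA.simdata.

```r
set.seed(4)

# Stand-in data frame of manifest variables (not produced by poLCA.simdata).
dat <- data.frame(Y1 = sample(1:3, 200, replace = TRUE),
                  Y2 = sample(1:2, 200, replace = TRUE))

pctmiss <- 0.2  # target share of missing cells, as a proportion

# Assumed mechanism: flag each cell independently with probability pctmiss.
mask <- matrix(runif(nrow(dat) * ncol(dat)) < pctmiss, nrow = nrow(dat))
dat[mask] <- NA

# Realized share of missing cells; it varies around pctmiss from run to run.
mean(is.na(dat))
```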
See Also
poLCA
Examples
##
## Create a sample data set with 3 classes and no covariates
## and run poLCA to recover the specified parameters.
##
probs <- list(matrix(c(0.6,0.1,0.3, 0.6,0.3,0.1, 0.3,0.1,0.6 ),ncol=3,byrow=TRUE), # conditional resp prob to Y1
              matrix(c(0.2,0.8, 0.7,0.3, 0.3,0.7 ),ncol=2,byrow=TRUE), # conditional resp prob to Y2
              matrix(c(0.3,0.6,0.1, 0.1,0.3,0.6, 0.3,0.6,0.1 ),ncol=3,byrow=TRUE), # conditional resp prob to Y3
              matrix(c(0.1,0.1,0.5,0.3, 0.5,0.3,0.1,0.1, 0.3,0.1,0.1,0.5),ncol=4,byrow=TRUE), # conditional resp prob to Y4
              matrix(c(0.1,0.1,0.8, 0.1,0.8,0.1, 0.8,0.1,0.1 ),ncol=3,byrow=TRUE)) # conditional resp prob to Y5
simdat <- poLCA.simdata(N=10000,probs,classdist=c(0.2,0.3,0.5))
f1 <- cbind(Y1,Y2,Y3,Y4,Y5)~1
lc1 <- poLCA(f1,simdat$dat,nclass=3)
print(table(lc1$predclass,simdat$trueclass))
##
## Create a sample dataset with 2 classes and three covariates.
## Then compare predicted class memberships when the model is
## estimated "correctly" with covariates to when it is estimated
## "incorrectly" without covariates.
##
simdat2 <- poLCA.simdata(N=5000,ndv=7,niv=3,nclass=2,b=matrix(c(1,-2,1,-1)))
f2a <- cbind(Y1,Y2,Y3,Y4,Y5,Y6,Y7)~X1+X2+X3
lc2a <- poLCA(f2a,simdat2$dat,nclass=2)
f2b <- cbind(Y1,Y2,Y3,Y4,Y5,Y6,Y7)~1
lc2b <- poLCA(f2b,simdat2$dat,nclass=2)
print(table(lc2a$predclass,lc2b$predclass))
##
## Create a sample dataset with missing values and estimate the
## latent class model including and excluding the missing values.
## Then plot the estimated class-conditional outcome response
## probabilities against each other for the two methods.
##
simdat3 <- poLCA.simdata(N=8000,niv=2,ndv=5,nclass=3,b=matrix(c(-1,2,-3,1,-2,2),3,2),missval=TRUE,pctmiss=0.2)
f3 <- cbind(Y1,Y2,Y3,Y4,Y5)~X1+X2
lc3.miss <- poLCA(f3,simdat3$dat,nclass=3,verbose=FALSE)
probs.start.new <- poLCA.reorder(lc3.miss$probs.start,order(lc3.miss$P))
lc3.miss <- poLCA(f3,simdat3$dat,nclass=3,probs.start=probs.start.new)
lc3.nomiss <- poLCA(f3,simdat3$dat,nclass=3,verbose=FALSE,na.rm=FALSE)
probs.start.new <- poLCA.reorder(lc3.nomiss$probs.start,order(lc3.nomiss$P))
lc3.nomiss <- poLCA(f3,simdat3$dat,nclass=3,na.rm=FALSE,probs.start=probs.start.new)
plot(lc3.miss$probs[[1]],lc3.nomiss$probs[[1]],xlim=c(0,1),ylim=c(0,1),
     xlab="Conditional response probabilities (missing values dropped)",
     ylab="Conditional response probabilities (missing values included)")
for (i in 2:5) { points(lc3.miss$probs[[i]],lc3.nomiss$probs[[i]]) }
abline(0,1,lty=3)
[Package poLCA version 1.1 Index]