postStratify {survey} | R Documentation |
Post-stratification adjusts the sampling and replicate weights so that
the joint distribution of a set of post-stratifying variables matches
the known population joint distribution. Use rake when the full joint
distribution is not available.
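The weight adjustment described above can be sketched in base R. This is a minimal illustration of the usual post-stratification ratio adjustment, not the package's internal code; the data frame `samp`, the counts in `pop`, and all numbers are made up for the example.

```r
# Hypothetical toy sample: one post-stratifying variable and initial weights.
samp <- data.frame(
  stype = c("E", "E", "E", "H", "M", "M"),
  w     = c(10, 10, 20, 30, 15, 25),
  stringsAsFactors = FALSE
)

# Known population counts per post-stratum (made-up numbers).
pop <- c(E = 100, H = 40, M = 60)

# Weighted sample count in each post-stratum.
w_hat <- tapply(samp$w, samp$stype, sum)

# Scale each weight by (population count) / (weighted sample count)
# for the post-stratum the observation belongs to.
samp$w_ps <- as.numeric(samp$w * pop[samp$stype] / w_hat[samp$stype])

# After the adjustment the weighted counts reproduce the population counts.
ps_totals <- tapply(samp$w_ps, samp$stype, sum)
print(ps_totals)
```

With replicate weights, the same kind of rescaling would be applied to each column of replicate weights in turn.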
postStratify(design, strata, population, partial = FALSE, ...)

## S3 method for class 'svyrep.design':
postStratify(design, strata, population, partial = FALSE, compress = NULL, ...)

## S3 method for class 'survey.design':
postStratify(design, strata, population, partial = FALSE, ...)
design |
A survey design object, with or without replicate weights |
strata |
A formula or data frame of post-stratifying variables |
population |
A table, xtabs or data.frame with population frequencies |
partial |
If TRUE, ignore population strata not present in the sample |
compress |
Attempt to compress the replicate weight matrix? When NULL,
compression is attempted only if the original weight matrix
was compressed |
... |
arguments for future expansion |
The population totals can be specified as a table with the
strata variables in the margins, or as a data frame where one column
lists frequencies and the other columns list the unique combinations
of strata variables (the format produced by as.data.frame acting on a
table object). A table must have named dimnames to indicate the
variable names.
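The two accepted shapes of the population argument can be illustrated with base R alone. In this sketch `popdata` is a hypothetical population listing invented for the example; only `table`, `as.data.frame` and `dimnames` are used.

```r
# Hypothetical population listing with two post-stratifying variables.
popdata <- data.frame(
  stype  = c("E", "E", "H", "M", "M", "E"),
  region = c("N", "S", "N", "N", "S", "N"),
  stringsAsFactors = FALSE
)

# Form 1: a table whose dimnames are named after the strata variables.
pop_tab <- table(stype = popdata$stype, region = popdata$region)
print(names(dimnames(pop_tab)))

# Form 2: a data frame with one row per combination of the strata
# variables and a frequency column, as produced by as.data.frame().
pop_df <- as.data.frame(pop_tab)
print(pop_df)

# Without the 'stype =' label the table does not carry the variable name,
# so it could not be matched to the strata formula.
unnamed <- table(popdata$stype)
print(names(dimnames(unnamed)))
```

Naming the arguments to `table` (or using `xtabs` with a formula) is the simplest way to get the named dimnames the text asks for.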
Compressing the replicate weights will take time and may even increase memory use if there is actually little redundancy in the weight matrix (in particular if the post-stratification variables have many values and cut across PSUs).
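To see why redundancy matters, here is a base R sketch of the general idea behind compressing a weight matrix: store the distinct rows once, plus an index. It is an illustration under simplifying assumptions, not the package's implementation; the matrix `repw`, the PSU labels, and the per-stratum factors are invented, and the adjustment is simplified to one factor per post-stratum.

```r
# Hypothetical replicate-weight matrix: 6 observations x 3 replicates.
# Observations in the same PSU share a row of replicate weights.
psu <- c(1, 1, 1, 2, 2, 3)
rw_by_psu <- rbind(c(0, 1.5, 1.5), c(1.5, 0, 1.5), c(1.5, 1.5, 0))
repw <- rw_by_psu[psu, ]

# Compressed form: the distinct rows plus an index back into them.
key <- apply(repw, 1, paste, collapse = "\r")
uniq_rows <- repw[!duplicated(key), , drop = FALSE]
index <- match(key, key[!duplicated(key)])

# The full matrix can be rebuilt exactly from the compressed form.
rebuilt <- uniq_rows[index, , drop = FALSE]
print(identical(rebuilt, repw))

# Post-strata that cut across PSUs: each row is rescaled by a factor that
# depends on the post-stratum (made-up factors), so rows within a PSU
# stop being identical and there is nothing left to compress.
stratum <- c("a", "b", "c", "a", "b", "c")
ps_factor <- c(a = 1.1, b = 0.9, c = 1.2)
repw_ps <- repw * ps_factor[stratum]
print(nrow(unique(repw_ps)))
```

Before the adjustment three rows describe all six observations; afterwards every row is distinct, which is the situation where compression costs time without saving memory.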
If a svydesign object is to be converted to a replication design, the
post-stratification should be performed after conversion.
The variance estimate for replication designs follows the same
procedure as Valliant (1993) described for estimating totals. I
believe the variance estimate for svydesign objects is the same
as that of Valliant (1993) applied to the estimating functions, and
thus the same as Rao et al. (2002), at least in the case where the full
joint distribution of the stratifying variables is available.
A new survey design object.
If the sampling weights are already post-stratified, there will be no
change in point estimates after postStratify, but the standard
error estimates will decrease to correctly reflect the post-stratification.
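The point-estimate half of this note can be checked with a small base R sketch: applying the ratio adjustment to weights that already match the population counts leaves them unchanged. The helper `ps_adjust` is a hypothetical name introduced here, and all numbers are made up; the sketch says nothing about the standard errors, which depend on the design object.

```r
# Hypothetical helper: rescale weights so the weighted count in each
# post-stratum equals the known population count.
ps_adjust <- function(w, stratum, pop) {
  w_hat <- tapply(w, stratum, sum)
  as.numeric(w * pop[stratum] / w_hat[stratum])
}

stratum <- c("E", "E", "H", "M", "M")
pop     <- c(E = 100, H = 40, M = 60)   # made-up population counts
y       <- c(3, 5, 2, 7, 1)             # made-up outcome values

w1 <- ps_adjust(c(10, 30, 20, 5, 15), stratum, pop)  # first adjustment
w2 <- ps_adjust(w1, stratum, pop)                    # adjusting again

# The weights, and hence the estimated total of y, do not change.
print(all.equal(w1, w2))
print(c(sum(w1 * y), sum(w2 * y)))
```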
See http://www.dcs.napier.ac.uk/peas/exemplar1.htm for an example.
Valliant R (1993) Post-stratification and conditional variance estimation. JASA 88: 89-96
Rao JNK, Yung W, Hidiroglou MA (2002) Estimating equations for the analysis of survey data using poststratification information. Sankhya 64 Series A Part 2, 364-378.
as.svrepdesign, svrepdesign, rake, compressWeights
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
rclus1 <- as.svrepdesign(dclus1)

svymean(~api00, rclus1)
svytotal(~enroll, rclus1)

# post-stratify on school type
pop.types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
# or: pop.types <- xtabs(~stype, data = apipop)
# or: pop.types <- table(stype = apipop$stype)

rclus1p <- postStratify(rclus1, ~stype, pop.types)
summary(rclus1p)
svymean(~api00, rclus1p)
svytotal(~enroll, rclus1p)

## and for svydesign objects
dclus1p <- postStratify(dclus1, ~stype, pop.types)
summary(dclus1p)
svymean(~api00, dclus1p)
svytotal(~enroll, dclus1p)