ms.nprev {twostage}    R Documentation
Weighted logistic regression using the Mean Score method
BACKGROUND
This function analyses the second-stage data from a two-stage
design, using as weights the first-stage sample sizes in each of
the strata defined by the first-stage variables. If the first-stage
sample sizes are unknown, you can still obtain estimates (but not
standard errors) by supplying estimated relative frequencies
(prevalences) of the strata. To ensure that the sample sizes or
prevalences are supplied in the correct order, it is advisable to
run the coding function first.
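The weighting described above can be sketched in base R. This is a minimal illustration, not the twostage implementation: it assumes the Mean Score estimating equations of Reilly and Pepe (1995) for a discrete (y,z) stratification, under which each second-stage subject in stratum s receives weight n1[s]/n2[s], so that the point estimates solve a weighted logistic score equation. The helper ms_estimates_sketch and the simulated data are hypothetical. Because rescaling all weights by a constant leaves the estimates unchanged, prevalences give the same estimates as sample sizes; the standard errors presumably depend on the absolute first-stage counts, which is why prev yields estimates only.

```r
# Hypothetical sketch (not the twostage code): Mean Score point estimates
# computed as a weighted logistic regression on the second-stage data.
ms_estimates_sketch <- function(x, y, stratum, first_stage) {
  # first_stage: n1 (counts) or prev (proportions), one entry per stratum,
  # in the same order as the stratum codes 1, 2, ..., S
  n2 <- tabulate(stratum, nbins = length(first_stage))
  w <- first_stage[stratum] / n2[stratum]   # weight n1[s] / n2[s] per subject
  # quasibinomial gives the same estimates as binomial without warning
  # about non-integer weights
  fit <- glm(y ~ x, family = quasibinomial(), weights = w,
             control = glm.control(epsilon = 1e-10, maxit = 50))
  coef(fit)
}

# Simulated two-stage data (hypothetical): binary surrogate z, covariate age
set.seed(1)
N <- 5000
z <- rbinom(N, 1, 0.3)
age <- rnorm(N, 50, 10)
y <- rbinom(N, 1, plogis(-4 + 0.03 * age + 0.7 * z))

# Strata ordered (y,z) = (0,0), (0,1), (1,0), (1,1)
stratum <- 1 + 2 * y + z
n1 <- tabulate(stratum, nbins = 4)

# Second stage: validate up to 25 subjects per stratum
validated <- unlist(lapply(1:4, function(s) {
  idx <- which(stratum == s)
  idx[sample.int(length(idx), min(25, length(idx)))]
}))

x2 <- cbind(age = age, z = z)[validated, ]
est_n1 <- ms_estimates_sketch(x2, y[validated], stratum[validated], n1)
est_prev <- ms_estimates_sketch(x2, y[validated], stratum[validated], n1 / sum(n1))
print(rbind(n1 = est_n1, prev = est_prev))
```

Both rows of the printed matrix should agree, illustrating that only the relative sizes of the strata matter for the point estimates.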
ms.nprev(x=x,y=y,z=z,n1="option",prev="option",factor=NULL,print.all=FALSE)
REQUIRED ARGUMENTS
x
matrix of predictor variables for the regression model
y
response variable (must be binary 0-1)
z
matrix of surrogate or auxiliary variables, which must be categorical
and one of the following:
n1
vector of the first-stage sample sizes for each (y,z) stratum; must be supplied in the correct order (see the coding function)
OR
prev
vector of the first-stage or population proportions (prevalences) for each (y,z) stratum; must be supplied in the correct order (see the coding function)
OPTIONAL ARGUMENTS
print.all
logical value; if TRUE, additional output (Wzy, varsi and Ihat) is returned. The default is FALSE.
factor
factor variables; if the columns of the matrix of predictor variables have names, supply these names, otherwise supply the column numbers. ms.nprev will fit separate coefficients for each level of the factor variables.
The response, predictor and surrogate variables
must be numeric. If z has several columns, say
(z1, z2, ..., zn), these are recoded into a single
vector new.z, with z1 varying fastest, for example:
z1  z2  z3  new.z
 0   0   0    1
 1   0   0    2
 0   1   0    3
 1   1   0    4
 0   0   1    5
 1   0   1    6
 0   1   1    7
 1   1   1    8
If some value combinations do not occur in your data, the coding adjusts accordingly: for example, if the combination (0,1,1) is absent, then (1,1,1) is coded as 7.
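The recoding can be reproduced with a short base-R sketch. The helper recode_z_sketch is hypothetical and is not the package's coding function; it follows the ordering in the table (first column varying fastest) and, for columns with more than two levels, extends it as a mixed-radix code, which is an assumption. Combinations absent from the data are skipped when numbering, so (1,1,1) becomes 7 when (0,1,1) is missing.

```r
# Hypothetical sketch of the z recoding (not the twostage coding function)
recode_z_sketch <- function(z) {
  z <- as.matrix(z)
  code <- numeric(nrow(z))
  radix <- 1
  for (j in seq_len(ncol(z))) {
    lev <- sort(unique(z[, j]))                      # observed levels of column j
    code <- code + (match(z[, j], lev) - 1) * radix
    radix <- radix * length(lev)                     # next column varies more slowly
  }
  # Renumber the observed combinations consecutively from 1
  match(code, sort(unique(code)))
}

# All eight combinations of three binary columns, z1 varying fastest
z_all <- as.matrix(expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1))
print(cbind(z_all, new.z = recode_z_sketch(z_all)))

# Drop the combination (0,1,1): (1,1,1) is then coded 7
z_missing <- z_all[!(z_all[, "z1"] == 0 & z_all[, "z2"] == 1 & z_all[, "z3"] == 1), ]
print(recode_z_sketch(z_missing))
```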
VALUE
If called with prev, ms.nprev returns only:
a list called table containing:
ylevel
the distinct values (or levels) of y
zlevel
the distinct values (or levels) of z
prev
the prevalences for each (y,z) stratum
n2
the second-stage sample sizes in each (y,z) stratum
and a list called parameters containing:
est
the Mean Score estimates of the coefficients in the logistic regression model

If called with n1, it returns:
a list called table containing:
ylevel
the distinct values (or levels) of y
zlevel
the distinct values (or levels) of z
n1
the first-stage sample size in each (y,z) stratum
n2
the second-stage sample sizes in each (y,z) stratum
and a list called parameters containing:
est
the Mean Score estimates of the coefficients in the logistic regression model
se
the standard errors of the Mean Score estimates
z
the Wald statistic for each coefficient
pvalue
the two-sided p-value (H0: coefficient = 0)

If print.all=TRUE, the following lists are also returned:
Wzy
the weight matrix used by the Mean Score algorithm for each (y,z) stratum, in the same order as n1 and prev
varsi
the variance of the score in each (y,z) stratum
Ihat
the Fisher information matrix
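The z and pvalue components can be related to est and se by the usual Wald construction: z = est/se, with a two-sided p-value of 2 P(Z > |z|) for a standard normal Z. The sketch below assumes ms.nprev uses exactly this normal-theory test and recomputes both columns from the est and se values printed in the CASS example below.

```r
# Estimates and standard errors as printed in the CASS example
est <- c("(Intercept)" = -5.06286163, age = 0.02166536, sex = 0.67381300)
se  <- c("(Intercept)" =  1.46495235, age = 0.02584049, sex = 0.21807878)

z_wald <- est / se                    # Wald statistic for each coefficient
pvalue <- 2 * pnorm(-abs(z_wald))     # two-sided p-value, H0: coeff = 0
print(cbind(est, se, z = z_wald, pvalue))
```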
REFERENCES
Reilly, M. and Pepe, M.S. 1995. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 82:299-314.
Reilly, M. 1996. Optimal sampling strategies for two-stage studies. Amer. J. Epidemiol. 143:92-100.
SEE ALSO
fixed.n, budget, precision, coding, cass1, cass2
EXAMPLES
# As an illustrative example, we use the CASS pilot data "cass1" from Reilly (1996).
data(cass1)   # load the data; see help(cass1) for details

# The first-stage sample sizes are:
#   Y  Z     n
#   0  0  6666
#   0  1  1228
#   1  0   144
#   1  1    58
# An analysis of the pilot data using Mean Score:
# supply the first-stage sample sizes in the correct order
n1 <- c(6666, 1228, 144, 58)
ms.nprev(y = cass1[,1], x = cass1[,2:3], z = cass1[,3], n1 = n1)

# gives the results:
# [1] "please run coding function to see the order in which you"
# [1] "must supply the first-stage sample sizes or prevalences"
# [1] " Type ?coding for details!"
# [1] "For calls requiring n1 or prev as input, use the following order"
#      ylevel z new.z n2
# [1,]      0 0     0 25
# [2,]      0 1     1 25
# [3,]      1 0     0 25
# [4,]      1 1     1 25
# [1] "Check sample sizes/prevalences"
# $table
#      ylevel zlevel   n1 n2
# [1,]      0      0 6666 25
# [2,]      0      1 1228 25
# [3,]      1      0  144 25
# [4,]      1      1   58 25
#
# $parameters
#                     est         se         z       pvalue
# (Intercept) -5.06286163 1.46495235 -3.455991 0.0005482743
# age          0.02166536 0.02584049  0.838427 0.4017909402
# sex          0.67381300 0.21807878  3.089769 0.0020031236

# Note that the Mean Score algorithm produces smaller standard errors of the
# estimates than the complete-case analysis, because of the additional
# information in the incomplete cases.
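If only the relative frequencies of the CASS strata had been known, the same analysis could be run with prev instead of n1; by the scale invariance of the weights this should give the same point estimates, but no standard errors. As a minimal sketch, prevalences in the required (y,z) order can be obtained by normalizing the first-stage counts; the data frame strata is an illustrative name, and the ms.nprev call is left as a comment because it needs the twostage package.

```r
# First-stage CASS sample sizes, in the order reported by the coding function:
# (y,z) = (0,0), (0,1), (1,0), (1,1)
n1 <- c(6666, 1228, 144, 58)
strata <- data.frame(y = c(0, 0, 1, 1), z = c(0, 1, 0, 1), n1 = n1)
strata$prev <- n1 / sum(n1)    # relative frequency of each (y,z) stratum
print(strata)

# The corresponding call would then be (requires the twostage package):
# ms.nprev(y = cass1[,1], x = cass1[,2:3], z = cass1[,3], prev = strata$prev)
```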