ms.nprev {meanscore} | R Documentation |
Weighted logistic regression using the Mean Score method
BACKGROUND
This algorithm analyses the second-stage data from a two-stage
design, using as weights the first-stage sample sizes in each of
the strata defined by the first-stage variables.
If the first-stage sample sizes are unknown, you can still get
estimates (but not standard errors) using estimated relative
frequencies (prevalences) of the strata. To ensure that the sample
sizes or prevalences are provided in the correct order, it is
advisable to first run the coding function.
ms.nprev(x=x,y=y,z=z,n1="option",prev="option",factor=NULL,print.all=FALSE)
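The weighting idea described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes that, for categorical (y,z) strata, the Mean Score point estimates coincide with a logistic regression on the second-stage data in which each observation is weighted by n1/n2 for its stratum. The simulated data and all variable names (in_stage2, w_stratum, d) are hypothetical. The standard errors reported by glm here are not the Mean Score standard errors.

```r
# Sketch only: weighted logistic regression with stratum weights n1/n2.
# All data below are simulated; names are illustrative assumptions.
set.seed(1)
n <- 1000
z <- rbinom(n, 1, 0.5)              # surrogate, observed for everyone
x <- rnorm(n, mean = z)             # covariate, observed only at stage two
y <- rbinom(n, 1, plogis(1 * x))    # binary 0-1 response
in_stage2 <- rbinom(n, 1, 0.5) == 1 # indicator of second-stage membership

# First- and second-stage counts in each (y,z) stratum
stratum <- interaction(y, z, drop = TRUE)
n1 <- as.numeric(table(stratum))
n2 <- as.numeric(table(stratum[in_stage2]))

# Each second-stage observation gets the weight n1/n2 of its stratum
w_stratum <- n1 / n2
d <- data.frame(y = y, x = x, w = w_stratum[as.integer(stratum)])[in_stage2, ]

# quasibinomial gives the same coefficients as binomial but does not
# warn about non-integer weights
fit <- glm(y ~ x, family = quasibinomial, weights = w, data = d)
coef(fit)
```

The weights sum to the first-stage sample size, which is one way to check that the strata were matched up correctly.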
REQUIRED ARGUMENTS
x |
matrix of predictor variables for regression model |
y |
response variable (should be binary 0-1) |
z |
matrix of any surrogate or auxiliary variables, which must be categorical |
and one of the following:
n1 |
vector of the first-stage sample sizes for each (y,z) stratum: must be provided in the correct order (see the coding function) |
OR
prev |
vector of the first-stage or population proportions (prevalences) for each (y,z) stratum: must be provided in the correct order (see the coding function) |
OPTIONAL ARGUMENTS
print.all |
logical value determining whether all output is printed. The default is FALSE. |
factor |
factor variables; if the columns of the matrix of predictor variables have names, supply these names, otherwise supply the column numbers. ms.nprev will fit separate coefficients for each level of the factor variables. |
The response, predictor and surrogate variables
have to be numeric. If you have multiple columns of
z, say (z1,z2,...,zn), these will be recoded into
a single vector new.z. For example, with three binary columns:
z1 | z2 | z3 | new.z |
0 | 0 | 0 | 1 |
1 | 0 | 0 | 2 |
0 | 1 | 0 | 3 |
1 | 1 | 0 | 4 |
0 | 0 | 1 | 5 |
1 | 0 | 1 | 6 |
0 | 1 | 1 | 7 |
1 | 1 | 1 | 8 |
If some of the value combinations do not exist in your data, the function will adjust accordingly. For example if the combination (0,1,1) is absent, then (1,1,1) will be coded as 7.
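The recoding rule above can be illustrated with a short sketch. The helper recode_z is hypothetical and is not part of the meanscore package; it only reproduces the ordering shown in the table (first column varying fastest) and numbers the combinations that actually occur in the data.

```r
# Hypothetical helper (not from the meanscore package): collapse several
# categorical z columns into one code, numbering only observed combinations.
recode_z <- function(z) {
  cols <- unname(as.list(as.data.frame(as.matrix(z))))
  # One text key per row, e.g. "0_1_1"
  key <- do.call(paste, c(cols, sep = "_"))
  # Sort with the last column as the primary key, so the first column
  # varies fastest, as in the table above
  ord <- do.call(order, rev(cols))
  match(key, unique(key[ord]))
}

z_all <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
recode_z(z_all)                    # all eight combinations present

# Drop the combination (0,1,1): (1,1,1) is then coded as 7
z_missing <- z_all[!(z_all$z1 == 0 & z_all$z2 == 1 & z_all$z3 == 1), ]
recode_z(z_missing)
```

To see the coding that ms.nprev actually uses for a given data set, run the coding function.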
If called with prev, ms.nprev will return only:
a list called "table" containing the following:
ylevel |
the distinct values (or levels) of y |
zlevel |
the distinct values (or levels) of z |
prev |
the prevalences for each (ylevel,zlevel) stratum |
n2 |
the sample sizes at the second stage in each stratum defined by (ylevel,zlevel) |
and a list called "parameters" containing:
est |
the Mean Score estimates of the coefficients in the logistic regression model |
If called with n1, it will return:
a list called "table" containing:
ylevel |
the distinct values (or levels) of y |
zlevel |
the distinct values (or levels) of z |
n1 |
the sample size at the first stage in each (ylevel,zlevel) stratum |
n2 |
the sample sizes at the second stage in each stratum defined by (ylevel,zlevel) |
and a list called "parameters" containing:
est |
the Mean Score estimates of the coefficients in the logistic regression model |
se |
the standard errors of the Mean Score estimates |
z |
Wald statistic for each coefficient |
pvalue |
2-sided p-value (H0: coeff=0) |
If print.all=TRUE, the following lists will also be returned:
Wzy |
the weight matrix used by the mean score algorithm,
for each (ylevel,zlevel) stratum: this will be in the same order
as n1 and prev |
varsi |
the variance of the score in each (ylevel,zlevel) stratum |
Ihat |
the Fisher information matrix |
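The z and pvalue entries of "parameters" are related to est and se in the usual Wald way. The sketch below recomputes them from the est and se values printed in the example output on this page. It assumes the standard definitions (z = est/se and a two-sided normal p-value) and is not taken from the package source.

```r
# Sketch: Wald statistic and two-sided p-value from est and se.
# The numbers are copied from the example output on this page.
est <- c("(Intercept)" = 0.0493998, x = 1.0188437)
se  <- c("(Intercept)" = 0.07155138, x = 0.10187094)

z <- est / se                  # Wald statistic for each coefficient
pvalue <- 2 * pnorm(-abs(z))   # two-sided p-value for H0: coeff = 0

cbind(est, se, z, pvalue)
```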
Reilly, M. and M.S. Pepe. 1995. A mean score method for missing and auxiliary covariate data in regression models. Biometrika 82:299-314.
See also: meanscore, coding, ectopic, simNA, glm.
# As an illustrative example, we use a simulated data set, simNA.
data(simNA)   # to load the data
help(simNA)   # for details

# The "complete cases" (i.e. second-stage data) can be extracted by:
complete=simNA[!is.na(simNA[,3]),]

# Running a logistic regression analysis on the complete data:
summary(glm(complete[,1]~complete[,3], family="binomial"))

# gives the following result:
#
# Call:
# glm(formula = complete[, 1] ~ complete[, 3], family = "binomial")
#
# Coefficients:
#               Estimate Std. Error z value Pr(>|z|)
# (Intercept)    0.05258    0.09879   0.532    0.595
# complete[, 3]  1.01942    0.12050   8.460   <2e-16 ***

# The first and second stage sample sizes can be viewed by running
# the "coding" function (see help(coding) for details):
coding(x=simNA[,3], y=simNA[,1], z=simNA[,2])

# which gives the following:
#
# [1] "For calls to ms.nprev, input n1 or prev in the following order!!"
#      ylevel z new.z  n1  n2
# [1,]      0 0     0 310 150
# [2,]      0 1     1 166  85
# [3,]      1 0     0 177  86
# [4,]      1 1     1 347 179

# An analysis of all first- and second-stage data using Mean Score:
# supply the first stage sample sizes in the correct order
n1=c(310,166,177,347)
ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],n1=n1)

# gives the results:
#
# [1] "please run coding function to see the order in which you"
# [1] "must supply the first-stage sample sizes or prevalences"
# [1] " Type ?coding for details!"
# [1] "For calls to ms.nprev,input n1 or prev in the following order!!"
#      ylevel z new.z  n2
# [1,]      0 0     0 150
# [2,]      0 1     1  85
# [3,]      1 0     0  86
# [4,]      1 1     1 179
# [1] "Check sample sizes/prevalences"
# $table
#      ylevel zlevel  n1  n2
# [1,]      0      0 310 150
# [2,]      0      1 166  85
# [3,]      1      0 177  86
# [4,]      1      1 347 179
#
# $parameters
#                   est         se          z    pvalue
# (Intercept) 0.0493998 0.07155138  0.6904103 0.4899362
# x           1.0188437 0.10187094 10.0013188 0.0000000

# If we supply the prevalences instead of first stage sample sizes
p1=c(310,166,177,347)/1000
ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],prev=p1)

# we get the output:
#
#      ylevel zlevel  prev  n2
# [1,]      0      0 0.310 150
# [2,]      0      1 0.166  85
# [3,]      1      0 0.177  86
# [4,]      1      1 0.347 179
#
# $parameters
#                    est
# (Intercept) 0.04939797
# x           1.01885599

# Note that the Mean Score algorithm produces smaller standard errors
# of estimates than the complete-case analysis, due to the additional
# information in the incomplete cases.
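As a cross-check on the order in which n1 or prev is supplied, the stratum counts can also be tabulated by hand. The sketch below uses a small made-up data frame (toy, hypothetical) in which x is NA for subjects outside the second stage. It assumes the stratum order seen in the coding output above, with y varying slowest and z fastest. The coding function remains the authoritative source for the order.

```r
# Sketch with made-up data: count first- and second-stage subjects per
# (y,z) stratum and derive prevalences from the first-stage counts.
toy <- data.frame(
  y = c(0, 0, 0, 1, 1, 1, 0, 1),
  z = c(0, 0, 1, 0, 1, 1, 1, 0),
  x = c(0.2, NA, 1.1, NA, 0.7, 1.9, NA, 0.4)
)
yf <- factor(toy$y)
zf <- factor(toy$z)
observed <- !is.na(toy$x)     # TRUE for second-stage (complete) cases

# table() with z as rows and y as columns, read column by column,
# lists the strata as (y0,z0), (y0,z1), (y1,z0), (y1,z1)
n1 <- as.vector(table(zf, yf))
n2 <- as.vector(table(zf[observed], yf[observed]))
prev <- n1 / sum(n1)

strata <- expand.grid(zlevel = levels(zf), ylevel = levels(yf))
data.frame(ylevel = strata$ylevel, zlevel = strata$zlevel, n1, n2, prev)
```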