ms.nprev {meanscore}    R Documentation

Logistic regression of two-stage data using second stage sample and first stage sample sizes or proportions (prevalences) as input

Description

Weighted logistic regression using the Mean Score method

BACKGROUND

This algorithm analyses the second-stage data from a two-stage design, incorporating as weights the first-stage sample sizes in each of the strata defined by the first-stage variables. If the first-stage sample sizes are unknown, you can still obtain estimates (but not standard errors) using estimated relative frequencies (prevalences) of the strata. To ensure that the sample sizes or prevalences are supplied in the correct order, it is advisable to run the coding function first.
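As a rough illustration of how first-stage sample sizes can act as weights: with categorical strata, a second-stage observation is commonly weighted by the ratio n1/n2 of its (y,z) stratum. The sketch below computes these ratios, and the corresponding prevalences, from the stratum counts shown in the Examples section. The variable names are illustrative, the weight form is an assumption, and this is not the package's internal code.

```r
# Stratum counts taken from the Examples section of this page
strata <- data.frame(
  ylevel = c(0, 0, 1, 1),
  zlevel = c(0, 1, 0, 1),
  n1 = c(310, 166, 177, 347),  # first-stage sample sizes
  n2 = c(150, 85, 86, 179)     # second-stage sample sizes
)

# Assumed weight form: first-stage count divided by second-stage count
strata$w <- strata$n1 / strata$n2

# Prevalences (relative frequencies of the strata) derived from n1
strata$prev <- strata$n1 / sum(strata$n1)

print(strata)
```

Each stratum has a weight close to 2 here because roughly half of the first-stage observations in every stratum were sampled at the second stage.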

Usage


ms.nprev(x=x,y=y,z=z,n1="option",prev="option",factor=NULL,print.all=FALSE)

Arguments

REQUIRED ARGUMENTS

x matrix of predictor variables for regression model
y response variable (should be binary 0-1)
z matrix of any surrogate or auxiliary variables, which must be categorical

and one of the following:
n1 vector of the first stage sample sizes for each (y,z) stratum: must be provided in the correct order (see coding function)
OR
prev vector of the first-stage or population proportions (prevalences) for each (y,z) stratum: must be provided in the correct order (see coding function)

OPTIONAL ARGUMENTS
print.all logical value determining whether all output is printed. The default is FALSE.
factor factor variables; if the columns of the matrix of predictor variables have names, supply these names, otherwise supply the column numbers. ms.nprev will fit separate coefficients for each level of the factor variables.

Details

The response, predictor and surrogate variables have to be numeric. If you have multiple columns of z, say (z1, z2, ..., zn), these will be recoded into a single vector new.z, for example:

z1 z2 z3 new.z
0 0 0 1
1 0 0 2
0 1 0 3
1 1 0 4
0 0 1 5
1 0 1 6
0 1 1 7
1 1 1 8

If some of the value combinations do not exist in your data, the function will adjust accordingly. For example if the combination (0,1,1) is absent, then (1,1,1) will be coded as 7.
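The recoding described above can be sketched in a few lines of base R. The helper `recode_z` below is hypothetical (it is not a function of the meanscore package) and only mimics the documented behaviour: the first column varies fastest, and codes are renumbered consecutively when some combinations are absent.

```r
# Hypothetical helper: collapse several categorical columns into one code
recode_z <- function(z) {
  z <- as.matrix(z)
  code <- rep(0, nrow(z))
  mult <- 1
  for (j in seq_len(ncol(z))) {
    f <- factor(z[, j])
    # Mixed-radix code: column j contributes (level index - 1) * mult
    code <- code + (as.integer(f) - 1L) * mult
    mult <- mult * nlevels(f)
  }
  # Renumber so that only combinations present in the data get a code
  match(code, sort(unique(code)))
}

# All eight combinations of three binary surrogates
full <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
full_codes <- recode_z(full)

# Same data with the combination (0,1,1) removed
reduced <- full[!(full$z1 == 0 & full$z2 == 1 & full$z3 == 1), ]
reduced_codes <- recode_z(reduced)

cbind(reduced, new.z = reduced_codes)
```

With the full grid the codes run from 1 to 8 in the order of the table above; with (0,1,1) removed, (1,1,1) receives code 7.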

Value

If called with prev, the function returns only:
a list called "table" containing the following:

ylevel the distinct values (or levels) of y
zlevel the distinct values (or levels) of z
prev the prevalences for each (ylevel,zlevel) stratum
n2 the sample sizes at the second stage in each stratum defined by (ylevel,zlevel)

and a list called "parameters" containing:
est the Mean score estimates of the coefficients in the logistic regression model


If called with n1, the function returns:
a list called "table" containing:
ylevel the distinct values (or levels) of y
zlevel the distinct values (or levels) of z
n1 the sample size at the first stage in each (ylevel,zlevel) stratum
n2 the sample sizes at the second stage in each stratum defined by (ylevel,zlevel)

and a list called "parameters" containing:
est the Mean score estimates of the coefficients in the logistic regression model
se the standard errors of the Mean Score estimates
z Wald statistic for each coefficient
pvalue 2-sided p-value (H0: coeff=0)
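The relationship between these four quantities can be checked by hand. The sketch below assumes the usual Wald definitions (z = est/se, two-sided p-value from the standard normal distribution) and uses the estimates and standard errors printed in the Examples section of this page.

```r
# Estimates and standard errors copied from the example output
est <- c("(Intercept)" = 0.0493998, x = 1.0188437)
se  <- c("(Intercept)" = 0.07155138, x = 0.10187094)

# Wald statistic and two-sided p-value for H0: coeff = 0
z <- est / se
pvalue <- 2 * pnorm(-abs(z))

round(cbind(est, se, z, pvalue), 4)
```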


If print.all=TRUE, the following lists will also be returned:
Wzy the weight matrix used by the mean score algorithm, for each (ylevel,zlevel) stratum: this will be in the same order as n1 and prev
varsi the variance of the score in each (ylevel,zlevel) stratum
Ihat the Fisher information matrix

References

Reilly, M. and Pepe, M.S. (1995). A mean score method for missing and auxiliary covariate data in regression models. Biometrika 82:299-314.

See Also

meanscore, coding, ectopic, simNA, glm.

Examples


## Not run: 
As an illustrative example, we use a simulated data set, simNA.
Use
## End(Not run) 

data(simNA)        #to load the data
## Not run: and
help(simNA)        #for details

## Not run: The "complete cases" (i.e. second-stage data) can be extracted by:

complete=simNA[!is.na(simNA[,3]),]

## Not run: Running a logistic regression analysis on the complete data:

summary(glm(complete[,1]~complete[,3], family="binomial"))

## Not run: gives the following result

Call:
glm(formula = complete[, 1] ~ complete[, 3], family = "binomial")

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    0.05258    0.09879   0.532    0.595    
complete[, 3]  1.01942    0.12050   8.460   <2e-16 ***
## End(Not run)

## Not run: 
The first and second stage sample sizes can be viewed by running
the "coding" function (see help(coding) for details)
## End(Not run)

coding(x=simNA[,3], y=simNA[,1], z=simNA[,2])
## Not run: which gives the following:

 [1] "For calls to ms.nprev, input n1 or prev in the following order!!"
     ylevel z new.z  n1  n2
[1,]      0 0     0 310 150
[2,]      0 1     1 166  85
[3,]      1 0     0 177  86
[4,]      1 1     1 347 179
## End(Not run)

## Not run: An analysis of all first- and second-stage data using Mean Score:

# supply the first stage sample sizes in the correct order
n1=c(310,166,177,347)
ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],n1=n1)

## Not run: gives the results:
[1] "please run coding function to see the order in which you"
[1] "must supply the first-stage sample sizes or prevalences"
[1] " Type ?coding for details!"
[1] "For calls to ms.nprev,input n1 or prev in the following order!!"
     ylevel z new.z  n2
[1,]      0 0     0 150
[2,]      0 1     1  85
[3,]      1 0     0  86
[4,]      1 1     1 179
[1] "Check sample sizes/prevalences"
$table
     ylevel zlevel  n1  n2
[1,]      0      0 310 150
[2,]      0      1 166  85
[3,]      1      0 177  86
[4,]      1      1 347 179

$parameters
                  est         se          z    pvalue
(Intercept) 0.0493998 0.07155138  0.6904103 0.4899362
x           1.0188437 0.10187094 10.0013188 0.0000000
## End(Not run)

## Not run: If we supply the prevalences instead of first stage sample sizes
p1=c(310,166,177,347)/1000
ms.nprev(x=complete[,3],z=complete[,2],y=complete[,1],prev=p1)

## Not run: we get the output:

      ylevel zlevel  prev  n2
[1,]      0      0 0.310 150
[2,]      0      1 0.166  85
[3,]      1      0 0.177  86
[4,]      1      1 0.347 179

$parameters
                   est
(Intercept) 0.04939797
x           1.01885599
## End(Not run)

## Not run: 
Note that the Mean Score algorithm produces smaller 
standard errors of estimates than the complete-case
analysis, due to the additional information in the
incomplete cases.
## End(Not run)
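The size of the reduction can be read off the two sets of output above. The following sketch simply copies the printed standard errors into vectors and compares them; the object names are illustrative.

```r
# Standard errors from the complete-case glm fit shown above
se_complete <- c("(Intercept)" = 0.09879, x = 0.12050)

# Standard errors from the ms.nprev call with n1 shown above
se_meanscore <- c("(Intercept)" = 0.07155138, x = 0.10187094)

# Ratio below 1 means the Mean Score standard error is smaller
se_ratio <- se_meanscore / se_complete
round(se_ratio, 3)
```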
