coding {twostage} | R Documentation |
Recodes a matrix of categorical variables into a vector that takes
a unique value for each combination.
BACKGROUND
From the matrix Z of first-stage covariates, this function creates a vector which takes a unique value for each combination as follows:
z1 | z2 | z3 | new.z |
0 | 0 | 0 | 1 |
1 | 0 | 0 | 2 |
0 | 1 | 0 | 3 |
1 | 1 | 0 | 4 |
0 | 0 | 1 | 5 |
1 | 0 | 1 | 6 |
0 | 1 | 1 | 7 |
1 | 1 | 1 | 8 |
If some of the combinations do not exist, the function will adjust
accordingly: for example if the combination (0,1,1) is absent above,
then (1,1,1) will be coded as 7.
The values of this new vector are reported as new.z in the printed output (see the description of the value below).
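The coding scheme in the table above can be sketched in a few lines of base R. This is only an illustration of the ordering rule (first column varies fastest, absent combinations are skipped), not the package's own implementation; the helper name recode_combinations is hypothetical and does not exist in twostage.

```r
# Hypothetical helper (not part of twostage): assigns consecutive codes to the
# observed combinations of the columns of z, with the first column varying fastest.
recode_combinations <- function(z) {
  z <- as.matrix(z)
  # One text key per row, e.g. "0|1|1"
  key <- apply(z, 1, paste, collapse = "|")
  # Keep one row per observed combination
  combos <- z[!duplicated(key), , drop = FALSE]
  # Sort by the last column first, so that the first column varies fastest
  cols <- lapply(rev(seq_len(ncol(combos))), function(j) combos[, j])
  combos <- combos[do.call(order, cols), , drop = FALSE]
  sorted_keys <- apply(combos, 1, paste, collapse = "|")
  # The code of a row is the position of its combination in the sorted list
  match(key, sorted_keys)
}

# The eight combinations from the table, in the same row order
z <- cbind(z1 = c(0, 1, 0, 1, 0, 1, 0, 1),
           z2 = c(0, 0, 1, 1, 0, 0, 1, 1),
           z3 = c(0, 0, 0, 0, 1, 1, 1, 1))

all_codes    <- recode_combinations(z)        # codes 1 to 8
absent_codes <- recode_combinations(z[-7, ])  # (0,1,1) removed; (1,1,1) gets code 7
```

Sorting on the reversed column list is what reproduces the order shown in the table; sorting on the columns in their natural order would make the last column vary fastest instead.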
This function should be run on second stage data prior to using the ms.nprev function, as it illustrates the order in which the call to ms.nprev expects the first-stage sample sizes to be provided.
coding(x=x,y=y,z=z,return=FALSE)
REQUIRED ARGUMENTS
x |
matrix of predictor variables for regression model |
y |
response variable (should be binary 0-1) |
z |
matrix of any surrogate or auxiliary variables which must be categorical |
OPTIONAL ARGUMENTS
return |
logical value; if TRUE (T), the original surrogate or auxiliary variables and the recoded auxiliary variables will be returned. The default is FALSE. |
This function does not return any values unless return=T.
If used with only second stage (i.e. complete) data, it will print the following:
ylevel |
the distinct values (or levels) of response variable |
z1 ... zi |
the distinct values of first stage variables z1 ... zi |
new.z |
recoded first stage variables. Each value represents a unique combination of first stage variable values. |
n2 |
second stage sample sizes in each (ylevel, new.z) stratum. |
If used with combined first and second stage data (i.e. with NA for missing values), in addition to the above items, the function will also print the following:
n1 |
first-stage sample sizes in each (ylevel, new.z) stratum. |
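To make the meaning of n1 and n2 concrete, the following base R sketch tabulates stratum sizes on a small made-up data set. It assumes that observations seen only at the first stage are marked by NA in the predictor x; the data and variable names are invented for illustration, and this is not a description of how coding computes its output.

```r
# Toy data, invented for illustration only
y     <- c(0, 0, 0, 1, 1, 1, 1, 0)                # binary response
new.z <- c(1, 1, 2, 1, 2, 2, 2, 2)                # recoded first-stage variable
x     <- c(0.5, NA, 1.2, NA, 0.3, 0.8, NA, NA)    # NA = first stage only (assumption)

# Factors keep every stratum in the tables, even when a count is zero
ylevel  <- factor(y)
stratum <- factor(new.z)
second_stage <- !is.na(x)

# n1: all observations per (ylevel, new.z) stratum
n1 <- table(ylevel = ylevel, new.z = stratum)
# n2: only the observations with complete second-stage data
n2 <- table(ylevel = ylevel[second_stage], new.z = stratum[second_stage])

print(n1)
print(n2)
```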
Reilly,M. 1996. Optimal sampling strategies for two-stage studies. Amer. J. Epidemiol. 143:92-100
ms.nprev, fixed.n, budget, precision, cass1, cass2
## Not run: 
The CASS2 data set in Reilly (1996) has 2 categorical first-stage
variables in columns 2 (sex) and 3 (categorical weight). The predictor
variables are column 2 (sex) and columns 4-9 and the response variable
is in column 1 (mort). See help(cass2) for further details.

The commands
## End(Not run)
data(cass2)
coding(x=cass2[,c(2,4:9)], y=cass2[,1], z=cass2[,2:3])

## Not run: 
give the following coding scheme and first-stage and second-stage
sample sizes (n1 and n2 respectively)

[1] "For calls requiring n1 or prev as input, use the following order"
      ylevel sex wtcat new.z n2
 [1,]      0   0     1     1 10
 [2,]      0   1     1     2 10
 [3,]      0   0     2     3 10
 [4,]      0   1     2     4 10
 [5,]      0   0     3     5 10
 [6,]      0   1     3     6 10
 [7,]      1   0     1     1  8
 [8,]      1   1     1     2 10
 [9,]      1   0     2     3 10
[10,]      1   1     2     4 10
[11,]      1   0     3     5 10
[12,]      1   1     3     6 10
## End(Not run)