coding {twostage}                                        R Documentation

combines two or more surrogate/auxiliary variables into a vector

Description

Recodes a matrix of categorical variables into a vector that takes a unique value for each combination of values.

BACKGROUND

From the matrix Z of first-stage covariates, this function creates a vector which takes a unique value for each combination as follows:

z1  z2  z3  new.z
 0   0   0      1
 1   0   0      2
 0   1   0      3
 1   1   0      4
 0   0   1      5
 1   0   1      6
 0   1   1      7
 1   1   1      8

If some of the combinations do not exist, the function will adjust accordingly: for example if the combination (0,1,1) is absent above, then (1,1,1) will be coded as 7.
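The scheme described above can be sketched in a few lines of R. This is only an illustration of the idea, not the code used inside coding: the helper name recode_combinations is hypothetical, and it assumes that codes are assigned to the observed combinations in sorted order, with the first column varying fastest.

```r
# Hypothetical helper, not part of the twostage package: assign consecutive
# codes to the observed combinations of categorical columns, with the first
# column varying fastest (the ordering assumed from the table above).
recode_combinations <- function(z) {
  z <- as.data.frame(z)
  # Sort the distinct rows so that the last column is the primary sort key.
  combos <- unique(z)
  ord <- do.call(order, unname(rev(as.list(combos))))
  combos <- combos[ord, , drop = FALSE]
  # Build one string key per row, then look each row up among the sorted
  # combinations; the position found is the new code.
  key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))
  match(key(z), key(combos))
}

# All eight combinations of three binary variables.
z <- expand.grid(z1 = 0:1, z2 = 0:1, z3 = 0:1)
full <- recode_combinations(z)

# Drop the combination (0,1,1); (1,1,1) then moves from code 8 to code 7.
keep <- !(z$z1 == 0 & z$z2 == 1 & z$z3 == 1)
reduced <- recode_combinations(z[keep, ])
```

Here full reproduces the new.z column of the table, and the last element of reduced shows the adjusted code for (1,1,1) when (0,1,1) is absent.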

The recoded values are reported as new.z in the printed output (see Value below).

This function should be run on the second-stage data before calling the ms.nprev function, because its output shows the order in which ms.nprev expects the first-stage sample sizes to be provided.

Usage

coding(x=x,y=y,z=z,return=FALSE)

Arguments

REQUIRED ARGUMENTS

x matrix of predictor variables for regression model
y response variable (should be binary 0-1)
z matrix of any surrogate or auxiliary variables which must be categorical

OPTIONAL ARGUMENTS
return logical value; if TRUE, the original surrogate or auxiliary variables and the recoded auxiliary variable are returned. The default is FALSE.

Value

The function returns a value only if return=TRUE.

If used with only second stage (i.e. complete) data, it will print the following:

ylevel the distinct values (or levels) of response variable
z1 ... zi the distinct values of first stage variables z1 ... zi
new.z recoded first stage variables. Each value represents a unique combination of first stage variable values.
n2 second stage sample sizes in each (ylevel,new.z) stratum.

If used with combined first and second stage data (i.e. with NA for missing values), in addition to the above items, the function will also print the following:
n1 first-stage sample sizes in each (ylevel,new.z) stratum.
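
The stratum counts n1 and n2 described above can be illustrated with base R. This is a sketch under stated assumptions, not the package's own code: the data are invented toy values, and a subject is assumed to belong to the second stage when the predictor x is not missing.

```r
# Toy data, invented for illustration: six first-stage subjects, of whom
# four also have the second-stage predictor x observed.
y     <- factor(c(0, 0, 0, 1, 1, 1))
new.z <- factor(c(1, 1, 2, 1, 2, 2))
x     <- c(2.3, NA, 1.1, 0.7, NA, 1.9)

# Assumption: a row is in the second stage when x is not missing.
stage2 <- !is.na(x)

# Count subjects in each (ylevel, new.z) stratum at each stage. Using
# factors keeps strata with zero second-stage subjects in the table.
n1 <- as.data.frame(table(ylevel = y, new.z = new.z), responseName = "n1")
n2 <- as.data.frame(table(ylevel = y[stage2], new.z = new.z[stage2]),
                    responseName = "n2")

# One row per stratum, ordered with new.z varying fastest within ylevel.
strata <- merge(n1, n2, by = c("ylevel", "new.z"))
strata <- strata[order(strata$ylevel, strata$new.z), ]
```

The rows of strata follow the same layout as the printed output of coding, with ylevel changing slowest and new.z changing fastest.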

References

Reilly, M. 1996. Optimal sampling strategies for two-stage studies. Amer. J. Epidemiol. 143:92-100.

See Also

ms.nprev, fixed.n, budget, precision, cass1, cass2

Examples


## The CASS2 data set in Reilly (1996) has 2 categorical first-stage
## variables in columns 2 (sex) and 3 (categorical weight). The predictor
## variables are column 2 (sex) and columns 4-9, and the response variable
## is in column 1 (mort). See help(cass2) for further details.
##
## The commands
data(cass2)
coding(x=cass2[,c(2,4:9)],y=cass2[,1], z=cass2[,2:3])

## give the following coding scheme and second-stage sample sizes (n2):
##
## [1] "For calls requiring n1 or prev as input, use the following order"
##       ylevel sex wtcat new.z n2
##  [1,]      0   0     1     1 10
##  [2,]      0   1     1     2 10
##  [3,]      0   0     2     3 10
##  [4,]      0   1     2     4 10
##  [5,]      0   0     3     5 10
##  [6,]      0   1     3     6 10
##  [7,]      1   0     1     1  8
##  [8,]      1   1     1     2 10
##  [9,]      1   0     2     3 10
## [10,]      1   1     2     4 10
## [11,]      1   0     3     5 10
## [12,]      1   1     3     6 10
