Keep data {epicalc} | R Documentation |
Keep only subset of variables or records in the default data frame '.data'
keepData (dataFrame = .data, sample=NULL, exclude=NULL, subset, select, drop = FALSE, ...)
dataFrame |
a data frame |
sample |
an integer indicating the size of random sample or a value < 1 indicating fraction of records to be extracted |
exclude |
an expression, indicating columns to remove from '.data'. |
subset |
a logical expression indicating elements or rows to keep: missing values are taken as false. |
select |
an expression indicating columns to select from a data frame. |
drop |
passed on to [ indexing operator. |
... |
further arguments to be passed to or from other methods. |
'keepData' is the Epicalc version of 'subset.data.frame' which is a standard R function. It reduces the data frame to the specified subset and updates the search path accordingly.
Using 'keepData' will retain descriptions of the data, and the remaining variables, ready to be used by other Epicalc functions that can exploit them such as 'des', 'codebook', 'summ', 'tab1' etc ...
Since this command only affects the specified data frame (usually '.data'), any new variables created as free vectors will not be changed. The difference in length of variables may occur from the 'subset' argument. To avoid this, 'pack' or 'label.var' should be used to incoporate any relevant free vectors into the default data frame, '.data' so that all variables can be subsetted simultaneously, thus reducing the complications of the difference in variable lengths.
Virasakdi Chongsuvivatwong <cvirasak@medicine.psu.ac.th>
'des', 'subset', 'sample'
## Record sampling data(ANCdata) use(ANCdata) des() keepData(sample=500) des() # Note reduction of sample size to 500 use(ANCdata) keepData(sample=.1) # Select 10 des() ## Selecting specific record numbers data(Compaq) use(Compaq) keepData(subset = 1:nrow(.data) <= 50) #First 50 records summ() use(Compaq) every.seventh <- is.element(1:nrow(.data), seq(1, nrow(.data), 7)) keepData(subset = every.seventh) head(.data) ## Selecting records under certain conditions data(Familydata) use(Familydata) des() .data bmi <- wt/(ht/100)^2 label.var(bmi, "Body mass index (kg/m2)") keepData(subset = ht > 120) .data # Which record is missing? ## Reduction of variables ## Removal of consecutive variables use(Familydata) keepData(select = -(age:ht)) # Variables from 'age' to 'ht' removed des() ## A better alternative would be: use(Familydata) keepData(exclude = age:ht) des() keepData(select = -c(1,3)) # Further removal of the first and # the third variables des() codebook() ## Targeting only a certain variables data(Oswego) use(Oswego) des() keepData(select = c(age, sex, ill, cakes:fruitsalad)) des() keepData(select = c(1,2,5:7)) # Retain all variables except the third #the the fourth des() # Note the repetition of '(subset)' ## Wildcard use(Oswego) des() keepData(select = "c*") # The wildcard must be enclosed in quotes des() use(Oswego) des() keepData(exclude = "on*") # Variables having names starting with "on" removed keepData(exclude = "???") # Variables having names with 3 characters removed des() # Which are missing?