Data for cleaning {epicalc} | R Documentation |
Dataset for practicing cleaning, labelling and recoding
Description
The data come from clients of a family planning clinic.
For all variables except ID, the codes 9, 99, 99.9, 888 and 999 represent missing values.
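As a minimal sketch of how such codes might be turned into NA in base R, assuming a small made-up data frame (the name `toy` and its values are hypothetical, not rows from Planning):

```r
# Hypothetical toy data; values are illustrative, not taken from Planning
toy <- data.frame(
  id  = c(1, 2, 9),
  age = c(25, 99, 31),
  wt  = c(52.5, 99.9, 999)
)

# Codes that the description lists as missing values
missing_codes <- c(9, 99, 99.9, 888, 999)

# Recode every column except 'id'
vars <- setdiff(names(toy), "id")
toy[vars] <- lapply(toy[vars], function(x) {
  x[x %in% missing_codes] <- NA
  x
})

toy
```

A blanket rule like this can also erase genuine values (a diastolic pressure of 99, for instance), so checking each variable's plausible range before recoding may be safer.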
Usage
data(Planning)
Format
A data frame with 251 observations on the following 11 variables.
ID
- a numeric vector: ID code
AGE
- a numeric vector
RELIG
- a numeric vector: Religion
PED
- a numeric vector: Patient's education level
| 1 | = none |
| 2 | = primary school |
| 3 | = secondary school |
| 4 | = high school |
| 5 | = vocational school |
| 6 | = university |
| 7 | = other |
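Value codes like these can be attached as labels with `factor()`. The following is a sketch only, using a hypothetical vector `ped` rather than the real column, and assuming 9 is the missing-value code in play:

```r
# Hypothetical coded values; 9 stands in for a missing-value code
ped <- c(1, 3, 6, 9, 2)

# Treat the missing-value code as NA before labelling
ped[ped == 9] <- NA

# Map codes 1..7 to the labels listed in the table above
ped_f <- factor(
  ped,
  levels = 1:7,
  labels = c("none", "primary school", "secondary school", "high school",
             "vocational school", "university", "other")
)

table(ped_f, useNA = "ifany")
```

Keeping all seven levels, even those with no observations, means later tabulations show empty categories explicitly.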
INCOME
- a numeric vector: Monthly income in Thai Baht
| 1 | = nil |
| 2 | = < 1,000 |
| 3 | = 1,000-4,999 |
| 4 | = 5,000-9,999 |
| 5 | = 10,000 or more |
AM
- a numeric vector: Age at marriage
REASON
- a numeric vector: Reason for family planning
| 1 | = birth spacing |
| 2 | = enough children |
| 3 | = other |
BPS
- a numeric vector: systolic blood pressure
BPD
- a numeric vector: diastolic blood pressure
WT
- a numeric vector: weight (kg)
HT
- a numeric vector: height (cm)
Examples
data(Planning)
des(Planning)
# Change variable names to lowercase
names(Planning) <- tolower(names(Planning))
use(Planning)
des()
# Check for duplication of 'id'
any(duplicated(id))
duplicated(id)
id[duplicated(id)] # 215
# Which one(s) are missing?
setdiff(min(id):max(id), id) # 216
# Correct the wrong one
id[duplicated(id)] <- 216
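The same duplicate check can also be sketched in base R, working directly on a data frame column. The data frame `toy` below is hypothetical and only mimics the situation above (one duplicated ID and one gap in the sequence):

```r
# Hypothetical toy data with one duplicated id (3) and one missing id (4)
toy <- data.frame(id = c(1, 2, 3, 3, 5), age = c(22, 30, 27, 41, 35))

# Flag second and later occurrences of each id
dup <- duplicated(toy$id)
toy$id[dup]

# Ids that are absent from the min..max sequence
gap <- setdiff(min(toy$id):max(toy$id), toy$id)

# Only unambiguous when exactly one duplicate and one gap exist
if (sum(dup) == 1 && length(gap) == 1) {
  toy$id[dup] <- gap
}

anyDuplicated(toy$id)  # 0 when no duplicates remain
```

The guard on the assignment is a design choice: with several duplicates or several gaps there is no obvious pairing, and the original records would need to be consulted instead.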
[Package epicalc version 2.8.1.1 Index]