Income {arules}    R Documentation
Description
The Income data set originates from an example in the book ‘The Elements of Statistical Learning’ (see Section Source). It is an extract from a 1987 marketing survey by Impact Resources, Inc. (Columbus, OH). It consists of 8993 instances with 14 demographic attributes. These were obtained from the original data set of 9409 instances by removing the observations whose annual income is missing. The attributes are a mixture of nominal and ordinal variables, and several of the ordinal ones are binned numeric quantities such as income and age. Many values are missing, which is characteristic of data mining applications.
The Income_transactions data set contains the same data, already prepared and coerced to class transactions.
Usage
data("Income")
data("Income_transactions")
Format
Income is a data frame with 8993 observations on the following 14 variables.
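income: an ordered factor with levels [0,10) < [10,15) < [15,20) < [20,25) < [25,30) < [30,40) < [40,50) < [50,75) < 75+
sex: a factor with levels male, female
marital status: a factor with levels married, cohabitation, divorced, widowed, single
age: an ordered factor with levels 14-17 < 18-24 < 25-34 < 35-44 < 45-54 < 55-64 < 65+
education: an ordered factor with levels grade <9 < grades 9-11 < high school graduate < college (1-3 years) < college graduate < graduate study
occupation: a factor with levels professional/managerial, sales, laborer, clerical/service, homemaker, student, military, retired, unemployed
years in bay area: an ordered factor with levels <1 < 1-3 < 4-6 < 7-10 < >10
dual incomes: a factor with levels not married, yes, no
number in household: an ordered factor with levels 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9+
number of children: an ordered factor with levels 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9+
householder status: a factor with levels own, rent, live with parents/family
type of home: a factor with levels house, condominium, apartment, mobile Home, other
ethnic classification: a factor with levels american indian, asian, black, east indian, hispanic, pacific islander, white, other
language in home: a factor with levels english, spanish, other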
Details
To create Income_transactions, the original data frame in Income is prepared in a way similar to that described in ‘The Elements of Statistical Learning’. We cut each ordinal variable (age, education, income, years in bay area, number in household, and number of children) into two levels, splitting it at a fixed level roughly at its median (see Section Examples).
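The code in Section Examples hard-codes one threshold per variable. The sketch below is a hedged alternative that computes the cut point from the data instead. It defines a hypothetical base-R helper, `split_at_median()`, which is not part of arules. The helper takes the median of the level positions, ignoring missing values, and assigns every level at or below that median to the first label. The toy `age` vector is made up for illustration and only reuses the age levels listed under Format.

```r
# Hypothetical helper (not part of arules): split an ordered factor into two
# levels at the median level position, ignoring missing values.
split_at_median <- function(f, labels = c("low", "high")) {
  stopifnot(is.ordered(f), length(labels) == 2)
  codes <- as.integer(f)                    # level positions 1..k; NA stays NA
  cut_point <- median(codes, na.rm = TRUE)  # may fall between two positions
  # Positions above the median get code 2, the rest code 1 (NA stays NA).
  factor((codes > cut_point) + 1L, levels = 1:2, labels = labels)
}

# Toy data (made up for illustration) using the age levels from Format.
age_levels <- c("14-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+")
age <- factor(c("14-17", "18-24", "25-34", "35-44", "45-54", NA),
              levels = age_levels, ordered = TRUE)
age2 <- split_at_median(age, labels = c("young", "old"))
print(age2)
```

The median here depends on the sample. On the full Income data, this sketch may therefore not reproduce exactly the fixed splits used in Section Examples.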
Source
Impact Resources, Inc., Columbus, OH (1987).
Obtained from the web site of the book: Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag. (http://www-stat.stanford.edu/~tibs/ElemStatLearn/; called ‘Marketing’)
Examples
library("arules")

data("Income")
Income[1:3, ]

### preparing the data set
Income[["income"]] <- factor((as.numeric(Income[["income"]]) > 6) + 1,
  levels = 1:2, labels = c("$0-$40,000", "$40,000+"))
Income[["age"]] <- factor((as.numeric(Income[["age"]]) > 3) + 1,
  levels = 1:2, labels = c("14-34", "35+"))
Income[["education"]] <- factor((as.numeric(Income[["education"]]) > 4) + 1,
  levels = 1:2, labels = c("no college graduate", "college graduate"))
Income[["years in bay area"]] <- factor(
  (as.numeric(Income[["years in bay area"]]) > 4) + 1,
  levels = 1:2, labels = c("1-9", "10+"))
Income[["number in household"]] <- factor(
  (as.numeric(Income[["number in household"]]) > 1) + 1,
  levels = 1:2, labels = c("1", "2+"))
Income[["number of children"]] <- factor(
  (as.numeric(Income[["number of children"]]) > 1) + 1,
  levels = 1:2, labels = c("0", "1+"))

## creating transactions
Income_transactions <- as(Income, "transactions")
Income_transactions
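The splitting idiom in Section Examples, `factor((as.numeric(x) > k) + 1, levels = 1:2, labels = ...)`, relies on `as.numeric()` returning each value's level position within the ordered factor. The minimal base-R sketch below uses only the income levels listed under Format and needs no arules. It shows which income brackets land in each bin for the threshold `k = 6`. Positions 1 to 6 (up to [30,40)) become "$0-$40,000", and positions 7 to 9 become "$40,000+".

```r
# Income brackets as listed under Format, built as an ordered factor.
income_levels <- c("[0,10)", "[10,15)", "[15,20)", "[20,25)", "[25,30)",
                   "[30,40)", "[40,50)", "[50,75)", "75+")
income <- factor(income_levels, levels = income_levels, ordered = TRUE)

# Same idiom as in the example: level positions 1-6 map to code 1,
# positions 7-9 map to code 2, and the codes are then relabeled.
income2 <- factor((as.numeric(income) > 6) + 1, levels = 1:2,
                  labels = c("$0-$40,000", "$40,000+"))

# Cross-tabulate the original brackets against the two new bins.
print(table(income, income2))
```

The same reading applies to the other variables. For example, `> 3` on age keeps 14-17, 18-24 and 25-34 in the first bin.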