Income {arules} | R Documentation
The Income data set originates from an example in the book ‘The Elements of Statistical Learning’ (see the source given below). It is an extract from the survey used in that example and consists of 8993 instances with 14 demographic attributes, obtained from the original data set of 9409 instances by removing the observations in which annual income is missing. The data set mixes categorical and continuous variables and contains a lot of missing data, which is characteristic of data mining applications.
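The filtering step mentioned above (dropping observations whose income is missing while keeping other missing values) can be sketched in base R. This is only an illustration on a made-up toy data frame; the name `survey` and its contents are hypothetical and are not part of the package.

```r
# Hypothetical toy survey: 'income' is missing for two respondents.
survey <- data.frame(
  income = c("$0-$40,000", NA, "$40,000+", "$0-$40,000", NA),
  sex    = c("male", "female", NA, "female", "male"),
  stringsAsFactors = TRUE
)

# Keep only the rows where income was recorded.
# Missing values in other columns are left untouched.
complete_income <- survey[!is.na(survey$income), ]

nrow(survey)           # number of rows before filtering
nrow(complete_income)  # number of rows after filtering
```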
data("Income")
data("Income_orig")
A data frame with 8993 observations on the following 14 variables. Each variable is a factor with the levels listed.

income: $0-$40,000, $40,000+
sex: male, female
marital status: married, cohabitation, divorced, widowed, single
age: 14-34, 35+
education: no college graduate, college graduate
occupation: professional/managerial, sales, laborer, clerical/service, homemaker, student, military, retired, unemployed
years in bay area: 1-9, 10+
dual incomes: not married, yes, no
number in household: 1, 2+
number of children: 0, 1+
householder status: own, rent, live with parents/family
type of home: house, condominium, apartment, mobile Home, other
ethnic classification: american indian, asian, black, east indian, hispanic, pacific islander, white, other
language in home: english, spanish, other
The original data frame is available as data set Income_original. For the Income data set we preprocessed the data as described in ‘The Elements of Statistical Learning’ by cutting each ordinal variable (age, education, income, years in bay area, number in household, and number of children) at its median into two values.
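As a rough illustration of the median cut described above, the following base R sketch splits a made-up ordered factor into two groups. The toy vector `age`, the labels "low" and "high", and the choice to put values equal to the median into the lower group are all assumptions made for this example; the exact cut points used for the Income data set may differ.

```r
# Hypothetical ordinal variable (not the real survey data).
age <- factor(
  c("14-17", "18-24", "25-34", "35-44", "45-54", "25-34", "55-64", "18-24"),
  levels = c("14-17", "18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  ordered = TRUE
)

# Work on the integer level codes so that the median is well defined.
codes <- as.integer(age)
med <- median(codes)

# Assumption: codes at or below the median go into the lower group.
age_binary <- factor(
  ifelse(codes <= med, "low", "high"),
  levels = c("low", "high")
)

table(age_binary)
```

Cutting on the level codes rather than on the labels keeps the ordering of the original categories, so the two resulting values still reflect "lower" and "higher" positions on the ordinal scale.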
Impact Resources, Inc., Columbus, OH (1987).
Obtained from the web site of the book: Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag. (http://www-stat.stanford.edu/~tibs/ElemStatLearn/; called ‘Marketing’)
library(arules)
data("Income")
# coerce the data frame to transactions and summarize the result
Income_transactions <- as(Income, "transactions")
summary(Income_transactions)