Income {arules}    R Documentation

Income Data Set

Description

The Income data set originates from an example in the book ‘The Elements of Statistical Learning’ (see the Source section) and is an extract from the survey data used there. It consists of 8993 instances (obtained from the original data set of 9409 instances by removing the observations with missing annual income) with 14 demographic attributes. The data set is a good mixture of categorical and continuous variables with a lot of missing data, which is characteristic of data mining applications.

Usage

data("Income")
data("Income_orig")

Format

A data frame with 8993 observations on the following 14 variables.

income
a factor with levels $0-$40,000, $40,000+
sex
a factor with levels male, female
marital status
a factor with levels married, cohabitation, divorced, widowed, single
age
a factor with levels 14-34, 35+
education
a factor with levels no college graduate, college graduate
occupation
a factor with levels professional/managerial, sales, laborer, clerical/service, homemaker, student, military, retired, unemployed
years in bay area
a factor with levels 1-9, 10+
dual incomes
a factor with levels not married, yes, no
number in household
a factor with levels 1, 2+
number of children
a factor with levels 0, 1+
householder status
a factor with levels own, rent, live with parents/family
type of home
a factor with levels house, condominium, apartment, mobile Home, other
ethnic classification
a factor with levels american indian, asian, black, east indian, hispanic, pacific islander, white, other
language in home
a factor with levels english, spanish, other

Details

The original data frame is available as data set Income_orig. For the Income data set we preprocessed the data as described in ‘The Elements of Statistical Learning’ by cutting each ordinal variable (age, education, income, years in bay area, number in household, and number of children) at its median into two values.
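
The median cut described above can be sketched in base R as follows. This is only an illustration under stated assumptions: the vector age_code and the labels "low" and "high" are hypothetical, and assigning values equal to the median to the lower group is a choice made here, not necessarily the rule used to build Income.

```r
# Hypothetical ordinal variable coded 1..7 (illustrative values,
# not taken from the Income data set).
age_code <- c(1, 2, 2, 3, 4, 5, 6, 7, 3, 2)

# The median of the codes serves as the single cut point.
med <- median(age_code)

# Split into two groups: values <= median and values > median.
# (Assumption: ties at the median fall into the lower group.)
age_binary <- cut(age_code,
                  breaks = c(-Inf, med, Inf),
                  labels = c("low", "high"))

# Count how many observations fall on each side of the cut.
table(age_binary)
```

Because cut() uses right-closed intervals by default, observations exactly at the median land in the first level; passing right = FALSE would move them to the second.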

Source

Impact Resources, Inc., Columbus, OH (1987).

Obtained from the web site of the book: Hastie, T., Tibshirani, R. & Friedman, J. (2001). The Elements of Statistical Learning. Springer-Verlag. (http://www-stat.stanford.edu/~tibs/ElemStatLearn/; called ‘Marketing’)

Examples

data("Income")

Income_transactions <- as(Income, "transactions")
summary(Income_transactions)
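
As a possible next step, the transactions can be passed to apriori() to mine association rules. The following is a minimal sketch, not part of the original example; the support and confidence thresholds are arbitrary values chosen for illustration.

```r
library("arules")

data("Income")

# Coerce the data set into transactions, as in the example above.
Income_transactions <- as(Income, "transactions")

# Mine association rules. The thresholds below are assumptions
# picked for illustration, not recommended settings.
rules <- apriori(Income_transactions,
                 parameter = list(support = 0.1, confidence = 0.8))

# Overview of the number of rules found and their quality measures.
summary(rules)
```

Lowering the support threshold generally yields more rules at the cost of longer run time.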

[Package arules version 0.1-0 Index]