Adult {arules} | R Documentation |
The Adult data set contains the questionnaire data of the “Adult” database (originally called the “Census Income” Database), formatted as a data.frame and prepared for use with arules. The Adult_transactions data set contains the same data already coerced to transactions.
data("Adult")
data("Adult_transactions")
Adult contains a data.frame with 48842 observations on the following 14 variables:
age: a factor with levels middle-aged, old, senior, young
workclass: a factor with levels Federal-gov, Local-gov, Never-worked, Private, Self-emp-inc, Self-emp-not-inc, State-gov, Without-pay
education: a factor with levels 10th, 11th, 12th, 1st-4th, 5th-6th, 7th-8th, 9th, Assoc-acdm, Assoc-voc, Bachelors, Doctorate, HS-grad, Masters, Preschool, Prof-school, Some-college
education-num: a factor with levels 1, 10, 11, 12, 13, 14, 15, 16, 2, 3, 4, 5, 6, 7, 8, 9
marital-status: a factor with levels Divorced, Married-AF-spouse, Married-civ-spouse, Married-spouse-absent, Never-married, Separated, Widowed
occupation: a factor with levels Adm-clerical, Armed-Forces, Craft-repair, Exec-managerial, Farming-fishing, Handlers-cleaners, Machine-op-inspct, Other-service, Priv-house-serv, Prof-specialty, Protective-serv, Sales, Tech-support, Transport-moving
relationship: a factor with levels Husband, Not-in-family, Other-relative, Own-child, Unmarried, Wife
race: a factor with levels Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, White
sex: a factor with levels Female, Male
capital-gain: a factor with levels high, medium, none, small
capital-loss: a factor with levels medium, none, small
hours-per-week: a factor with levels full-time, half-time, overtime, too-many
native-country: a factor with levels Cambodia, Canada, China, Columbia, Cuba, Dominican-Republic, Ecuador, El-Salvador, England, France, Germany, Greece, Guatemala, Haiti, Holand-Netherlands, Honduras, Hong, Hungary, India, Iran, Ireland, Italy, Jamaica, Japan, Laos, Mexico, Nicaragua, Outlying-US(Guam-USVI-etc), Peru, Philippines, Poland, Portugal, Puerto-Rico, Scotland, South, Taiwan, Thailand, Trinadad&Tobago, United-States, Vietnam, Yugoslavia
salary: a factor with levels small, large
The “Adult” database was extracted from the census bureau database found at http://www.census.gov/ftp/pub/DES/www/welcome.html in 1994 by Ronny Kohavi and Barry Becker, Data Mining and Visualization, Silicon Graphics. It was originally used to predict whether income exceeds $50K/yr based on census data.
To prepare the data set for association mining, we removed the continuous attribute fnlwgt (final weight) and added the attribute salary with levels ‘small’ and ‘large’ (>$50K/yr). The original data contained 5 more continuous attributes (age, education-num, capital-gain, capital-loss and hours-per-week), which we coded using discrete values.
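As an illustration of coding a continuous attribute with discrete values, the following base R sketch bins a numeric age vector into the four age levels listed above. The ages, the cut points, and the ordering of the labels are assumptions made for this example; the page does not state the boundaries actually used.

```r
# Hypothetical ages for illustration; these are not rows from Adult
age <- c(19, 37, 52, 70)

# Assumed cut points (not taken from this page): intervals are
# (0, 25], (25, 45], (45, 65], (65, Inf]
age_coded <- cut(
  age,
  breaks = c(0, 25, 45, 65, Inf),
  labels = c("young", "middle-aged", "senior", "old")
)

# Count how many values fall into each level
table(age_coded)
```

cut() returns a factor, which is the form in which the discretized attributes appear in the data.frame.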
http://www.ics.uci.edu/~mlearn/MLRepository.html
Blake, C.L. & Merz, C.J. (1998). UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science.
The data set was first cited in Kohavi, R. (1996). Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.
data("Adult")
dim(Adult)
Adult[1:2, 1:4]

data("Adult_transactions")
Adult_transactions
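
The relationship between a data.frame of factors and its transactions form can be sketched on a small toy example. This assumes the arules package is installed; the toy data frame and its values are hypothetical and are not rows from Adult. Whether Adult_transactions was produced with exactly this call is not stated here.

```r
library(arules)

# Hypothetical toy data; every column must be a factor (or logical)
toy <- data.frame(
  sex    = factor(c("Male", "Female", "Male")),
  salary = factor(c("small", "large", "small"))
)

# Coerce to transactions: each variable/level pair becomes one item,
# labelled "variable=level"
trans <- as(toy, "transactions")

# Item labels created by the coercion
itemLabels(trans)

# Number of transactions (one per row of the data frame)
length(trans)
```

Each row of the data frame becomes one transaction containing one item per variable.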