Adult {arules}    R Documentation

Adult Data Set

Description

The Adult data set contains the questionnaire data of the “Adult” database (originally called the “Census Income” Database) formatted as a data.frame prepared for use with arules. The Adult_transactions data set contains the data already coerced to transactions.

Usage

data("Adult")
data("Adult_transactions")

Format

Adult contains a data.frame with 48842 observations on the following 14 variables:

age
a factor with levels middle-aged, old, senior, young
workclass
a factor with levels Federal-gov, Local-gov, Never-worked, Private, Self-emp-inc, Self-emp-not-inc, State-gov, Without-pay
education
a factor with levels 10th, 11th, 12th, 1st-4th, 5th-6th, 7th-8th, 9th, Assoc-acdm, Assoc-voc, Bachelors, Doctorate, HS-grad, Masters, Preschool, Prof-school, Some-college
education-num
a factor with levels 1, 10, 11, 12, 13, 14, 15, 16, 2, 3, 4, 5, 6, 7, 8, 9
marital-status
a factor with levels Divorced, Married-AF-spouse, Married-civ-spouse, Married-spouse-absent, Never-married, Separated, Widowed
occupation
a factor with levels Adm-clerical, Armed-Forces, Craft-repair, Exec-managerial, Farming-fishing, Handlers-cleaners, Machine-op-inspct, Other-service, Priv-house-serv, Prof-specialty, Protective-serv, Sales, Tech-support, Transport-moving
relationship
a factor with levels Husband, Not-in-family, Other-relative, Own-child, Unmarried, Wife
race
a factor with levels Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, White
sex
a factor with levels Female, Male
capital-gain
a factor with levels high, medium, none, small
capital-loss
a factor with levels medium, none, small
hours-per-week
a factor with levels full-time, half-time, overtime, too-many
native-country
a factor with levels Cambodia, Canada, China, Columbia, Cuba, Dominican-Republic, Ecuador, El-Salvador, England, France, Germany, Greece, Guatemala, Haiti, Holand-Netherlands, Honduras, Hong, Hungary, India, Iran, Ireland, Italy, Jamaica, Japan, Laos, Mexico, Nicaragua, Outlying-US(Guam-USVI-etc), Peru, Philippines, Poland, Portugal, Puerto-Rico, Scotland, South, Taiwan, Thailand, Trinadad&Tobago, United-States, Vietnam, Yugoslavia
salary
a factor with levels small, large

Details

The “Adult” database was extracted from the Census Bureau database found at http://www.census.gov/ftp/pub/DES/www/welcome.html in 1994 by Ronny Kohavi and Barry Becker, Data Mining and Visualization, Silicon Graphics. It was originally used to predict whether income exceeds $50K/yr based on census data.

To prepare the data set for association mining, we removed the continuous attribute fnlwgt (final weight) and added the attribute salary with levels ‘small’ and ‘large’ (> $50K/yr). The original data contained five more continuous attributes (age, education-num, capital-gain, capital-loss and hours-per-week), which we recoded as discrete values.
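
The recoding of a continuous attribute into discrete values could look roughly like the following base R sketch. The raw ages and the cut points are hypothetical and chosen only for illustration; this page does not state the boundaries actually used to build the data set.

```r
# Hypothetical raw ages (not taken from the Adult data).
age_raw <- c(19, 38, 52, 70)

# Assumed cut points, for illustration only; the real boundaries
# are not given in this documentation.
age_breaks <- c(0, 25, 45, 65, Inf)
age_labels <- c("young", "middle-aged", "senior", "old")

# cut() maps each number to the interval it falls in and returns a factor.
# Intervals are closed on the right by default, e.g. (25, 45].
age <- cut(age_raw, breaks = age_breaks, labels = age_labels)

# Count how many observations fall into each level.
table(age)
```

Note that cut() orders the levels by interval, whereas the levels listed under Format appear in alphabetical order.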

Source

http://www.ics.uci.edu/~mlearn/MLRepository.html

References

Blake, C.L. & Merz, C.J. (1998). UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science.

The data set was first cited in Kohavi, R. (1996). Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining.

Examples

data("Adult")
dim(Adult)
Adult[1:2, 1:4]

data("Adult_transactions")
Adult_transactions
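
As a minimal sketch of the coercion mentioned in the Description, a data.frame of factors can be turned into transactions with as(). The toy data.frame below is hypothetical and only mimics two of the Adult variables; it assumes the arules package is installed.

```r
library(arules)

# Hypothetical toy data.frame with two factor columns.
toy <- data.frame(
  age = factor(c("young", "old", "young")),
  sex = factor(c("Male", "Female", "Female"))
)

# Each row becomes one transaction; each variable=level pair becomes an item.
toy_trans <- as(toy, "transactions")

length(toy_trans)      # number of transactions
itemLabels(toy_trans)  # item names of the form "variable=level"
```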

[Package arules version 0.1-0 Index]