spambase {nutshell} | R Documentation
The Spambase data set was created by Mark Hopkins, Erik Reeber, George Forman, and Jaap Suermondt at Hewlett-Packard Labs. It includes 4601 observations corresponding to email messages, 1813 of which are spam. From the original email messages, 58 different attributes were computed.
data(spambase)
A data frame with 4601 observations on the following 58 variables.
word_freq_make
word_freq_address
word_freq_all
word_freq_3d
word_freq_our
word_freq_over
word_freq_remove
word_freq_internet
word_freq_order
word_freq_mail
word_freq_receive
word_freq_will
word_freq_people
word_freq_report
word_freq_addresses
word_freq_free
word_freq_business
word_freq_email
word_freq_you
word_freq_credit
word_freq_your
word_freq_font
word_freq_000
word_freq_money
word_freq_hp
word_freq_hpl
word_freq_george
word_freq_650
word_freq_lab
word_freq_labs
word_freq_telnet
word_freq_857
word_freq_data
word_freq_415
word_freq_85
word_freq_technology
word_freq_1999
word_freq_parts
word_freq_pm
word_freq_direct
word_freq_cs
word_freq_meeting
word_freq_original
word_freq_project
word_freq_re
word_freq_edu
word_freq_table
word_freq_conference
char_freq_semicolon
char_freq_left_paren
char_freq_left_bracket
char_freq_exclamation
char_freq_dollar
char_freq_pound
capital_run_length_average
capital_run_length_longest
capital_run_length_total
is_spam (levels: 0, 1)
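The variable names above fall into three prefix groups plus the class label, which makes it easy to select columns by pattern. The following is a minimal sketch, assuming the nutshell package is installed; the helper names (word_cols, char_cols, capital_cols) are illustrative and not part of the package.

```r
library(nutshell)
data(spambase)

# Group the predictor columns by name prefix
word_cols    <- grep("^word_freq_", names(spambase), value = TRUE)
char_cols    <- grep("^char_freq_", names(spambase), value = TRUE)
capital_cols <- grep("^capital_run_length_", names(spambase), value = TRUE)

# Per the variable listing: 48 word, 6 character, and 3 capital-run columns
c(words = length(word_cols),
  chars = length(char_cols),
  capitals = length(capital_cols))

# Compare the capital-run statistics between the two classes
aggregate(spambase[, capital_cols],
          by = list(is_spam = spambase$is_spam),
          FUN = mean)
```

Selecting by prefix rather than by position keeps the code readable and avoids hard-coding column indices.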
This data is used as an example in the book "R in a Nutshell," from O'Reilly Media.
This data set is from the UCI Machine Learning Repository. You can find more information about this data set, including the citation policy, at http://archive.ics.uci.edu/ml/datasets/Spambase
data(spambase)
table(spambase$is_spam)
# fit a linear discriminant analysis model to the data
library(MASS)
spam.lda <- lda(formula=is_spam~., data=spambase)
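As a follow-up, one way to check how well a discriminant model separates the two classes is to cross-tabulate its predictions against the true labels. This is only a sketch: it assumes the nutshell and MASS packages are installed, refits its own model with MASS::lda under the illustrative name fit, and scores the same rows it was trained on, so the resulting accuracy is optimistic.

```r
library(nutshell)
library(MASS)
data(spambase)

# Fit a linear discriminant model on all 57 predictors
# (lda may warn about collinear variables on this data)
fit <- lda(is_spam ~ ., data = spambase)

# Predict on the training rows; $class holds the predicted label
predicted <- predict(fit, newdata = spambase)$class

# Confusion table: rows are true labels, columns are predictions
confusion <- table(actual = spambase$is_spam, predicted = predicted)
confusion

# Share of messages classified correctly (in-sample)
sum(diag(confusion)) / sum(confusion)
```

For an honest error estimate, hold out a subset of rows for testing rather than predicting on the training data.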