wpbc {mboost}    R Documentation

Wisconsin Prognostic Breast Cancer Data

Description
Each record represents follow-up data for one breast cancer case. These are consecutive patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer and no evidence of distant metastases at the time of diagnosis.
data("wpbc")
Format

A data frame with 198 observations on the following 34 variables.
status: a factor with levels N (nonrecur) and R (recur).
time: recurrence time (for status == "R") or disease-free time (for status == "N").
mean_radius
mean_texture
mean_perimeter
mean_area
mean_smoothness
mean_compactness
mean_concavity
mean_concavepoints
mean_symmetry
mean_fractaldim
SE_radius
SE_texture
SE_perimeter
SE_area
SE_smoothness
SE_compactness
SE_concavity
SE_concavepoints
SE_symmetry
SE_fractaldim
worst_radius
worst_texture
worst_perimeter
worst_area
worst_smoothness
worst_compactness
worst_concavity
worst_concavepoints
worst_symmetry
worst_fractaldim
tsize: diameter of the excised tumor in centimeters.
pnodes: number of positive axillary lymph nodes observed at time of surgery.
Details

The first 30 features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
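These 30 features follow a consistent naming scheme: the mean, standard error (SE), and worst value of each of ten nucleus characteristics. A minimal sketch for inspecting this layout after loading the data (the grouping by prefix is purely illustrative):

data("wpbc", package = "mboost")
## outcome and clinical variables
str(wpbc[, c("status", "time", "tsize", "pnodes")])
## the 30 image features: mean, SE and worst value of ten nucleus characteristics
grep("^(mean|SE|worst)_", colnames(wpbc), value = TRUE)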
There are two possible learning problems: predicting status or predicting the time to recur.
1) Predicting field 2, outcome: R = recurrent, N = non-recurrent.
   - The dataset should first be filtered to reflect a particular endpoint; e.g., recurrences before 24 months = positive, non-recurrence beyond 24 months = negative.
   - An accuracy of 86.3% was estimated on a previous version of this data.
2) Predicting Time To Recur (field 3 in recurrent records).
   - Estimated mean error 13.9 months using Recurrence Surface Approximation.
(A short R sketch for setting up both learning problems follows this list.)
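A minimal sketch of how the two problems can be set up in R. The 24-month cutoff follows the description above; the object and column names (keep, wpbc24, recur24, wpbc_rec) and the handling of cases at exactly 24 months are illustrative choices, not part of the package:

data("wpbc", package = "mboost")

## Problem 1: binary endpoint at 24 months; only recurrences before 24 months
## (positive) and non-recurrences followed beyond 24 months (negative) are kept
keep <- (wpbc$status == "R" & wpbc$time < 24) |
        (wpbc$status == "N" & wpbc$time >= 24)
wpbc24 <- wpbc[keep, ]
wpbc24$recur24 <- factor(ifelse(wpbc24$status == "R", "positive", "negative"))

## Problem 2: time to recur, i.e. the time variable of the recurrent records
wpbc_rec <- subset(wpbc, status == "R")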
Source

The data are originally available from the UCI machine learning repository, see http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/.
References

W. Nick Street, Olvi L. Mangasarian and William H. Wolberg (1995). An inductive learning approach to prognostic prediction. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 522–530, San Francisco, Morgan Kaufmann.

Peter Bühlmann and Torsten Hothorn (2007). Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
data("wpbc", package = "mboost") ### fit logistic regression model with 100 boosting iterations coef(glmboost(status ~ ., data = wpbc[,colnames(wpbc) != "time"], family = Binomial()))