zprostate {bestglm}    R Documentation
Prostate cancer data with 8 inputs and one output, used to illustrate prediction and regression in the textbook of Hastie, Tibshirani and Friedman (2009).
data(zprostate)
A data frame with 97 observations on 10 variables: 8 inputs, 1 output (lpsa) and a logical indicator (train) marking the training rows. All input variables have been standardized.
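lcavol: log cancer volume
lweight: log prostate weight
age: patient age
lbph: log of the amount of benign prostatic hyperplasia
svi: seminal vesicle invasion
lcp: log of capsular penetration
gleason: Gleason score
pgg45: percent of Gleason scores 4 or 5
lpsa: log PSA, the response
train: TRUE for training rows, FALSE for test rows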
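The statement that the inputs are standardized can be checked by looking at column means and standard deviations. The following is a minimal sketch, not part of the package: the helper name standardization_summary is hypothetical, and the demonstration uses made-up data so that it runs without bestglm installed. It also assumes that "standardized" means mean 0 and standard deviation 1 over all rows, which this page does not spell out.

```r
# Hypothetical helper (not part of bestglm): report the mean and
# standard deviation of every column of a numeric data frame or matrix.
standardization_summary <- function(X) {
  X <- as.matrix(X)
  data.frame(
    variable = colnames(X),
    mean = colMeans(X),
    sd = apply(X, 2, sd),
    row.names = NULL
  )
}

# Self-contained demonstration on made-up data (NOT the prostate data).
set.seed(1)
raw <- data.frame(a = rnorm(50, mean = 10, sd = 3), b = runif(50))
z <- as.data.frame(scale(raw))  # center each column, then divide by its sd
chk <- standardization_summary(z)
print(chk)

# With bestglm installed, the same check can be applied to the 8 inputs:
# data(zprostate, package = "bestglm")
# standardization_summary(zprostate[, 1:8])
```

Means near 0 and standard deviations near 1 would be consistent with standardization over the full data set. Different values could indicate scaling over the training rows only.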
A study of 97 men with prostate cancer examined the correlation between PSA (prostate specific antigen) and a number of clinical measurements: lcavol, lweight, age, lbph, svi, lcp, gleason and pgg45.
http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/
Hastie, Tibshirani & Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Ed. Springer.
# Example 1. Prostate data. Table 3.3 HTF.
library(bestglm)
data(zprostate)

# Split on the logical train indicator (column 10) and drop that column
trainQ <- zprostate[, 10]
train <- zprostate[trainQ, -10]
test <- zprostate[!trainQ, -10]

# Full model: least squares on all inputs
ans <- lm(lpsa ~ ., data = train)
sig <- summary(ans)$sigma
yHat <- predict(ans, newdata = test)
yTest <- zprostate$lpsa[!trainQ]
TE <- mean((yTest - yHat)^2)

# Best subset chosen by BICq
ansSub <- bestglm(train, IC = "BICq")$BestModel
sigSub <- summary(ansSub)$sigma
yHatSub <- predict(ansSub, newdata = test)
TESub <- mean((yTest - yHatSub)^2)

# Test error and residual standard deviation, side by side
m <- matrix(c(TE, sig, TESub, sigSub), ncol = 2)
dimnames(m) <- list(c("TestErr", "Sd"), c("LS", "Best"))
m
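The test error in the example is the mean squared prediction error on the rows where train is FALSE. The sketch below isolates that pattern on made-up data, so it runs with base R alone. The helper test_mse and the toy data frame are hypothetical illustrations, not part of bestglm or of the prostate data.

```r
# Hypothetical helper: mean squared prediction error of a fitted model
# on held-out rows; `response` is the name of the response column.
test_mse <- function(fit, newdata, response) {
  mean((newdata[[response]] - predict(fit, newdata = newdata))^2)
}

# Made-up data with a logical split column, mimicking the zprostate layout.
set.seed(42)
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 + rnorm(n, sd = 0.5)
toy$train <- rep(c(TRUE, TRUE, FALSE), length.out = n)

# Fit on the training rows only, then score on the held-out rows.
is_train <- toy$train
fit <- lm(y ~ x1 + x2, data = toy[is_train, c("x1", "x2", "y")])
mse <- test_mse(fit, toy[!is_train, ], "y")
print(mse)
```

Because predict() looks up columns by name, the same helper could be applied to the two fits in the example, for instance test_mse(ans, test, "lpsa").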