predict_bayes {predbayescor}    R Documentation

Classification rule based on Bayesian naive Bayes models with feature selection bias corrected

Description

predict_bayes predicts a binary response from high-dimensional binary features modeled by Bayesian naive Bayes models. Real-valued features are also accepted; they are converted to binary by thresholding at medians estimated from the data. A smaller subset of features can be selected based on their correlations with the response, and the bias introduced by this selection procedure can be corrected. cv.bayes is a shortcut function for cross-validation with predict_bayes.
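The two preprocessing steps described above, thresholding at medians and keeping the k features most correlated with the response, can be sketched in base R. This is only an illustration and not the package's internal code: the helper names binarize_by_median and select_top_k are hypothetical, and estimating the medians from the training features alone and ranking by absolute sample correlation are assumptions.

```r
# Hypothetical helpers for illustration; not part of predbayescor.

# Threshold each feature column at its median.
# Assumption: medians are estimated from the training features only.
binarize_by_median <- function(train_x, test_x) {
  med <- apply(train_x, 2, median)
  list(
    train = 1 * sweep(train_x, 2, med, ">"),
    test  = 1 * sweep(test_x, 2, med, ">")
  )
}

# Indices of the k feature columns with the largest absolute
# sample correlation with the response y.
select_top_k <- function(x, y, k) {
  r <- abs(as.vector(suppressWarnings(cor(x, y))))
  r[is.na(r)] <- 0  # constant columns have undefined correlation
  order(r, decreasing = TRUE)[seq_len(k)]
}

set.seed(1)
y <- rep(c(0, 1), each = 20)
x <- matrix(rnorm(40 * 6), nrow = 40)
x[, 3] <- x[, 3] + 10 * y  # make feature 3 strongly informative

b <- binarize_by_median(x, x)
sel <- select_top_k(b$train, y, k = 2)
```

Selecting features with the same responses that are later used for fitting is what creates the selection bias that the correct argument is meant to address.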

Usage


predict_bayes(
              test,train,is.binary.features=FALSE,k,
              subset.sel=1:nrow(train),
              theta0=0,no.theta=20,
              alpha.shape=0.5,alpha.rate=5,no.alpha=5,
              correct=TRUE,no.theta.adj=20)

cv.bayes(
          data,is.binary.features=FALSE,no.folds=10,k,
          theta0=0,no.theta=20,
          alpha.shape=0.5,alpha.rate=5,no.alpha=5,
          correct=TRUE,no.theta.adj=20)

Arguments

test test data, a matrix, i.e. the data for which we want to predict the responses. Each row is a case. The first column is the binary response, which may be NA where it is missing.
train training data, in the same format as test
data data used in cross-validation, in the same format as test
no.folds the number of blocks into which the data is divided in cross-validation
is.binary.features an indicator of whether the features are binary
k the number of features retained
subset.sel the indices of the training cases used to select features
theta0 the prior of "theta" is uniform over (theta0,1-theta0)
no.theta the parameter of Simpson's rule used to evaluate the integral with respect to "theta". The integrand is evaluated at 2*(no.theta)+1 points.
alpha.shape the shape parameter of the inverse Gamma distribution, which is the prior distribution of "alpha"
alpha.rate the rate parameter of the inverse Gamma distribution, as above
no.alpha the number of "alpha" values used in the midpoint rule, which approximates the integral with respect to "alpha"
correct an indicator of whether the bias correction is applied
no.theta.adj the parameter of Simpson's rule used to evaluate the integral with respect to "theta" when calculating the adjustment factor
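To make the roles of no.theta and no.alpha concrete, the following is a minimal sketch of the two quadrature rules named above. It is not the package's implementation: simpson and alpha_midpoints are hypothetical helpers, the integrand g is purely illustrative, and placing the "alpha" midpoints on the probability scale of the inverse Gamma prior (with alpha.rate treated as the rate of the corresponding Gamma distribution) is an assumption.

```r
# Hypothetical helpers for illustration; not part of predbayescor.

# Composite Simpson's rule on [a, b] with 2 * n + 1 equally spaced points.
simpson <- function(f, a, b, n) {
  x <- seq(a, b, length.out = 2 * n + 1)
  h <- (b - a) / (2 * n)
  # Weights 1, 4, 2, 4, ..., 2, 4, 1
  w <- c(1, rep(c(4, 2), length.out = 2 * n - 1), 1)
  h / 3 * sum(w * f(x))
}

# Midpoint rule on the probability scale (an assumption): take the prior
# quantiles at (j - 0.5) / n, each carrying weight 1 / n.
alpha_midpoints <- function(shape, rate, n) {
  u <- (seq_len(n) - 0.5) / n
  # If X ~ Gamma(shape, rate), then 1 / X is inverse Gamma,
  # so the inverse Gamma quantile at u is 1 / qgamma(1 - u).
  1 / qgamma(1 - u, shape = shape, rate = rate)
}

theta0 <- 0
no.theta <- 20
g <- function(theta) theta^3 * (1 - theta)^2  # illustrative integrand only

approx_val <- simpson(g, theta0, 1 - theta0, no.theta)
exact_val <- beta(4, 3)  # closed form of the integral of g over (0, 1)

alphas <- alpha_midpoints(shape = 0.5, rate = 5, n = 5)
```

Larger values of no.theta and no.alpha give finer grids and therefore more integrand evaluations per prediction.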

Value

prediction a matrix showing the detailed prediction results: the 1st column is the true responses, the 2nd is the predicted responses, the 3rd is the predictive probabilities of class 1, and the 4th is an indicator of whether the prediction is wrong.
amlp the average minus log probability
error.rate the proportion of wrong predictions
mse the average squared error of the predictive probabilities
summary.pred a tabular display of the predictive probabilities and the actual fraction of class 1
alpha.prior.adj.post a matrix showing detailed information about the "alpha" values: the 1st column is the values of "alpha", the 2nd is the adjustment factor, i.e. the probability that a feature is discarded by the cutoff used in the feature selection, the 3rd is the log of the 2nd column times the number of discarded features, and the 4th is the posterior probabilities
features.selected the features selected using the correlation criterion
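As a rough guide to how the summary values relate to the prediction matrix, the sketch below recomputes them from a small hand-made matrix with the column layout described above. The definitions are assumptions, since this page does not give formulas: amlp is taken as the mean of minus the log probability assigned to the true class, mse as the mean squared difference between the response and the class 1 probability, and the calibration table uses ten equal-width probability bins. summarize_prediction is a hypothetical helper, not a package function.

```r
# Hypothetical helper for illustration; not part of predbayescor.
summarize_prediction <- function(prediction) {
  y  <- prediction[, 1]  # true responses (0/1)
  p1 <- prediction[, 3]  # predictive probabilities of class 1
  # Probability assigned to the class that actually occurred.
  p_true <- ifelse(y == 1, p1, 1 - p1)
  # Assumed binning: ten equal-width bins on [0, 1].
  bins <- cut(p1, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
  list(
    amlp = mean(-log(p_true)),
    error.rate = mean(prediction[, 4]),
    mse = mean((y - p1)^2),
    # Fraction of class 1 among cases in each bin (NA for empty bins).
    summary.pred = tapply(y, bins, mean)
  )
}

# Toy matrix in the documented layout: true, predicted, P(class 1), wrong.
y  <- c(1, 0, 1, 0)
p1 <- c(0.93, 0.18, 0.42, 0.71)
yhat <- as.numeric(p1 > 0.5)  # assumed decision rule: predict 1 if P > 0.5
prediction <- cbind(y, yhat, p1, as.numeric(yhat != y))

s <- summarize_prediction(prediction)
```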

References

http://math.usask.ca/~longhai/doc/naivebayes/naivebayes.abstract.html

See Also

gendata.bayes

Examples


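# generate a dataset
d <- gendata.bayes(100,100,500,500,1000,400)

# do prediction with correction applied
# (subset.sel is left at its default, 1:nrow(train))
pred.d.cor <- predict_bayes(
    test=d$test, train=d$train, is.binary.features=TRUE, k=10,
    theta0=0, no.theta=20, alpha.shape=0.5, alpha.rate=5, no.alpha=20,
    correct=TRUE, no.theta.adj=40)

# do prediction without correction applied
pred.d.uncor <- predict_bayes(
    test=d$test, train=d$train, is.binary.features=TRUE, k=10,
    theta0=0, no.theta=20, alpha.shape=0.5, alpha.rate=5, no.alpha=20,
    correct=FALSE, no.theta.adj=40)

# do 5-fold cross-validation on the training data with correction applied
cv.dtr.cor <- cv.bayes(
    data=d$train, is.binary.features=TRUE, no.folds=5, k=10,
    theta0=0, no.theta=20, alpha.shape=0.5, alpha.rate=5, no.alpha=20,
    correct=TRUE, no.theta.adj=40)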


[Package predbayescor version 1.1-4 Index]