train_pred_gau {gausspred}    R Documentation

Training with Markov chain sampling, predicting for test cases, and evaluating performance with cross-validation

Description

training_gau trains the Gaussian classification models with Markov chain Monte Carlo.

predict_gau uses the posterior samples returned by training_gau to predict the response values of test cases.

crossvalid_gau uses cross-validation to evaluate prediction performance. The results can be used to assess the prediction methods and to tune settings used in prediction, such as those of the prior distributions and of the Markov chain sampling.
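
The cross-validation idea can be sketched in base R as follows. This is only an illustration under assumed conventions: the names n, fold_id, test_ix and train_ix are hypothetical, and the way crossvalid_gau actually partitions the cases is not documented here and may differ.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)
n <- 20        # hypothetical number of cases
no_fold <- 5   # number of subsets, as in the no_fold argument

# Assign each case to one of no_fold subsets of (nearly) equal size
fold_id <- sample(rep(seq_len(no_fold), length.out = n))

for (f in seq_len(no_fold)) {
  test_ix  <- which(fold_id == f)   # cases held out in this fold
  train_ix <- which(fold_id != f)   # cases used for training
  # Here one would select features and train on train_ix,
  # then predict the cases in test_ix.
}

table(fold_id)
```

Every case is held out exactly once, so the predictions collected over all folds cover the whole data set.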

Usage

training_gau (
    ## arguments specifying data sets
    G,features,response,
    ## arguments specifying priors
    prior_y=rep(1,G),
    p_tau_nu =  c(alpha=2,w=0.5),
    p_tau_mu =  c(alpha=2,w=0.5),
    p_tau_x  =  c(alpha=2,w=0.5),
    ## arguments specifying Gibbs sampling
    nos_super = 100, nos_trans = 5, ini_taus=rep(1,3),
    ## arguments specifying correcting for bias 
    cor=0, p=ncol(features),cutoff=1, min_qf=exp(-10),
    nos_lambda=1000, stepsize_log_tau=-2, no_steps=10 
    )
predict_gau (features, out_tr, pred_range, thin)
crossvalid_gau ( 
    ## arguments specifying data sets and crossvalidation
    no_fold, k, G, features,response, lmd_trf,
    ## arguments specifying priors
    prior_y=rep(1,G),
    p_tau_nu =  c(alpha=2,w=0.5),
    p_tau_mu =  c(alpha=2,w=0.5),
    p_tau_x  =  c(alpha=2,w=0.5),
    ## arguments specifying Gibbs sampling
    nos_super = 100, nos_trans = 5, ini_taus=rep(1,3),
    ## arguments specifying correcting for bias 
    cor=0, min_qf=exp(-10),nos_lambda=1000, 
    stepsize_log_tau=-2, no_steps=10 ,
    ## arguments specifying prediction
    pred_range, thin =1
    )

Arguments

Arguments of training_gau and crossvalid_gau:
G the number of groups, i.e., the number of possible values of the response.
features the features, with the rows for the cases.
response the response values.
prior_y a vector of length 'G', specifying the Dirichlet prior distribution for probabilities of the response.
p_tau_nu, p_tau_mu, p_tau_x vectors of 2 numbers, specifying the Gamma distribution used as the prior for the inverse of the variance of the distribution of nu, μ, and the features, respectively; the first number is the shape, the second is the rate.
nos_super, nos_trans the sampler runs nos_super super Markov chain transitions, each consisting of nos_trans Markov chain iterations. Only the last state of each super transition is saved, which avoids storing the Markov chain state at every iteration.
ini_taus a vector of length 3, specifying initial values for tau^nu, tau^μ, tau^x.
cor either 0 or 1, indicating whether bias correction is to be applied.
p the total number of features before selection. This number must be supplied by the user; it cannot be inferred from the other arguments.
cutoff the cutoff of the F-statistic used to select features. This number must be supplied by the user; it cannot be inferred from the other arguments.
min_qf the minimum value of "f" used to truncate the infinite summation in calculating the correction factor. See the paper for details.
nos_lambda the number of random numbers drawn for Lambda in approximating the correction factor. See the paper for details.
stepsize_log_tau the step size of the Gaussian proposal used in sampling log tau_mu when bias correction is applied.
no_steps the number of iterations of Metropolis sampling for log tau_mu.
Arguments of predict_gau (pred_range and thin are also used by crossvalid_gau):
out_tr output of Markov chain sampling returned by training_gau
pred_range the range of super Markov chain transitions used to predict the response values of test cases.
thin only one of every thin samples, chosen evenly, is used in predicting.
Arguments only of crossvalid_gau:
no_fold the number of subsets of the data in making cross-validation assessment.
k the number of features selected.
lmd_trf the value of lambda used to estimate the covariance matrix, which is used to transform the data with the Cholesky decomposition. The larger this number, the closer the estimated covariance matrix is to diagonal.
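
As a rough illustration of how nos_super, nos_trans, pred_range and thin interact, the base-R sketch below counts the iterations that are run and lists which saved samples would be used in prediction. It assumes that pred_range=c(a,b) selects saved super transitions a to b and that thinning keeps every thin-th of them starting from a; the exact selection rule inside predict_gau may differ. The names total_iters and used are hypothetical.

```r
# Illustrative sketch only; not the package's internal code.
nos_super <- 400   # number of saved states (one per super transition)
nos_trans <- 5     # Markov chain iterations within each super transition
total_iters <- nos_super * nos_trans   # iterations actually run

pred_range <- c(50, 400)   # skip the first 49 saved states as burn-in
thin <- 10

# Assumed rule: keep every thin-th saved state within pred_range
used <- seq(pred_range[1], pred_range[2], by = thin)

total_iters
length(used)
```

Under these assumptions, 2000 iterations are run, 400 states are stored, and 36 of them enter the prediction.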

Value

The function training_gau returns the following values:

mu an array of three dimensions, storing Markov chain samples of μ, with the first dimension for features, the second dimension for groups, and the third dimension for Markov chain super transitions.
nu a matrix, storing Markov chain samples of nu, with rows for features and columns for Markov chain iterations.
tau_x a vector, storing Markov chain samples of tau^x.
tau_mu a vector, storing Markov chain samples of tau^μ.
tau_nu a vector, storing Markov chain samples of tau^nu.
freq_y the posterior mean of the probabilities for response.


Both predict_gau and crossvalid_gau return a matrix of the predictive probabilities, with rows for cases, columns for different groups (different values of response).
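
A matrix of predictive probabilities of this shape can be summarized by hand, as in the base-R sketch below. The values in probs and y are made up for illustration, and the response is assumed to be coded as integers 1 to G. The error rate and average minus log probability are computed directly here; this is a guess at the kind of quantity reported by comp_loss and comp_amlp in the Examples, not their actual code.

```r
# Illustrative sketch only; toy values, not output of the package.
# Predictive probabilities: 3 cases (rows), G = 3 groups (columns)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.6, 0.3,
                  0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
y <- c(1, 2, 1)   # hypothetical true responses, coded 1..G

# Most probable group for each case
pred <- apply(probs, 1, which.max)

# Error rate under 0-1 loss
err <- mean(pred != y)

# Average minus log probability assigned to the true group
amlp <- -mean(log(probs[cbind(seq_along(y), y)]))
```

The matrix index cbind(seq_along(y), y) picks, for each row, the probability in the column of the true group.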

Examples


##### this is a full demonstration of using this package ######
###############################################################

## parameter setting
n <- 200+400
p <- 400
G <- 6
p_tau_x  <- c(4,1)
p_tau_mu <- c(1.5,0.01)
p_tau_nu <- c(1.5,0.01)
tau_mu <- 100 

## generate a data set
data <- gen_bayesgau (n,p,G,tau_nu=100,tau_mu,p_tau_x )

## specifying cases as training set
ix_tr <- 1:200 

## ordering features by F-statistic
i_sel <- order_features(data$X[ix_tr,],data$y[ix_tr])
vars <- i_sel$vars[1:10]
cutoff <- i_sel$fstat[10]

## training model with bias-corrected method
out_tr_cor <- training_gau(
    G = G, 
    features = data$X[ix_tr,vars,drop=FALSE], 
    response = data$y[ix_tr], 
    prior_y = rep(1,G),
    p_tau_nu = p_tau_nu, p_tau_mu = p_tau_mu, p_tau_x = p_tau_x,
    nos_super = 400, nos_trans = 1, ini_taus=rep(1,3),
    ## information on correcting for bias 
    cor=1, p=p,cutoff=cutoff, min_qf=exp(-10),nos_lambda=100, 
    stepsize_log_tau=0.5, no_steps=5 
    )

## make prediction
out_pred_cor <- predict_gau( 
    data$X[-(ix_tr),vars,drop=FALSE], out_tr_cor, 
    pred_range=c(50,400), thin = 1) 

## define 0-1 loss function
Mlosser <- matrix (1,G,G)
diag(Mlosser) <- 0

## randomly generate a loss function
Mloss <- matrix(exp(rnorm(G^2,0,2)),G,G)
diag(Mloss) <- 0

## evaluate prediction with test cases 

## calculating average minus log probabilities
amlp_cor <- comp_amlp(out_pred_cor,data$y[-ix_tr])

## calculating error rate
er_cor <- comp_loss(out_pred_cor,data$y[-ix_tr],Mlosser)

## calculating average loss from the randomly generated loss function
l_cor <- comp_loss(out_pred_cor,data$y[-ix_tr],Mloss)


[Package gausspred version 1.0-0 Index]