gendata {BPHO} R Documentation

Functions for generating data sets

Description

gen_hmm generates sequences using hidden Markov models. gen_bin_ho generates general discrete data from logistic models that include high-order interactions; the response is binary. text_to_3number converts an English text file into a sequence of the codes 1 (special symbols such as spaces and other non-letter symbols), 2 (vowels), and 3 (consonants). text_to_number converts an English text file into a sequence of the codes 1-27: 1-26 for the letters a-z, and 27 for all other symbols. gen_X generates only a matrix of predictor values.
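The two text codings described above can be sketched in base R. This is only an illustration, not the package's code: encode_27 and encode_3 are hypothetical helper names, and folding upper case to lower case and treating only a, e, i, o, u as vowels are assumptions.

```r
# Hypothetical helper, not part of BPHO: letters a-z -> 1-26, everything else -> 27.
encode_27 <- function(txt) {
  chars <- strsplit(tolower(txt), "")[[1]]   # assumption: case is ignored
  codes <- match(chars, letters)             # a-z -> 1-26, other symbols -> NA
  codes[is.na(codes)] <- 27L                 # all other symbols -> 27
  codes
}

# Hypothetical helper, not part of BPHO: 1 = other symbol, 2 = vowel, 3 = consonant.
encode_3 <- function(txt) {
  chars <- strsplit(tolower(txt), "")[[1]]
  vowels <- c("a", "e", "i", "o", "u")       # assumption: only these count as vowels
  ifelse(!(chars %in% letters), 1L,
         ifelse(chars %in% vowels, 2L, 3L))
}

codes27 <- encode_27("Hi, Bob")
codes3  <- encode_3("Hi, Bob")
```

Here the comma and the space both fall into the "other symbols" class under either coding.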

Usage

gen_hmm(n,p,no_h,no_o,prob_h_stay, prob_o_stay)
gen_bin_ho(n,p,order,alpha,sigmas,nos_features,beta0)
text_to_number(p,file)
text_to_3number(p,file)
gen_X(n,p,K)

Arguments

n number of cases.
p number of features, or length of sequence.
K number of possibilities for each feature.
no_h number of states of hidden Markov chain.
no_o number of states of output in hidden Markov model.
prob_h_stay probability that the hidden Markov chain stays in its previous state at each step; the chain moves to one of the other states with smaller probabilities that add up to 1-prob_h_stay.
prob_o_stay probability that the output of the hidden Markov model equals ("hidden state" mod no_o)+1; with probability 1-prob_o_stay the output is one of the other output states, each equally likely.
order the order of interactions considered in simulating data from general classification models.
alpha alpha=2 indicates that Gaussian distributions are used to generate the "beta"s, and alpha=1 indicates that Cauchy distributions are used.
sigmas hyperparameters in generating "beta"s, a vector of length order.
nos_features number of states for each feature, i.e., the number of possibilities for each feature. A vector of length p.
beta0 intercept of linear function in generating classification data.
file name of the file containing the English text, a character string.
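The roles of prob_h_stay and prob_o_stay can be illustrated with a minimal base-R sketch of one simulated chain. It is not the implementation used by gen_hmm: sim_hmm_chain is a hypothetical name, and the uniform initial state and the uniform choice among the other hidden states are assumptions, since the text above only says that the move probabilities add up to 1-prob_h_stay.

```r
# Sketch only; sim_hmm_chain is a hypothetical name, not a BPHO function.
# Requires no_h >= 2 and no_o >= 2 so that "other states" exist.
sim_hmm_chain <- function(p, no_h, no_o, prob_h_stay, prob_o_stay) {
  # Pick uniformly among the states 1..k other than `current`.
  pick_other <- function(current, k) {
    others <- setdiff(seq_len(k), current)
    others[sample.int(length(others), 1)]
  }
  hidden <- numeric(p)
  output <- numeric(p)
  for (i in seq_len(p)) {
    if (i == 1) {
      hidden[i] <- sample.int(no_h, 1)              # assumption: uniform start
    } else if (runif(1) < prob_h_stay) {
      hidden[i] <- hidden[i - 1]                    # stay in the previous state
    } else {
      hidden[i] <- pick_other(hidden[i - 1], no_h)  # assumption: uniform move
    }
    favoured <- hidden[i] %% no_o + 1               # ("hidden state" mod no_o) + 1
    if (runif(1) < prob_o_stay) {
      output[i] <- favoured
    } else {
      output[i] <- pick_other(favoured, no_o)       # equally likely other states
    }
  }
  list(hidden = hidden, output = output)
}

set.seed(1)
chain <- sim_hmm_chain(p = 10, no_h = 8, no_o = 2,
                       prob_h_stay = 0.8, prob_o_stay = 0.8)
```

With both probabilities set to 1, the hidden chain never leaves its starting state and every output equals ("hidden state" mod no_o)+1, which is a convenient sanity check on the definitions.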

Value

X values of predictors, a matrix. Each row is a case. For sequences, the data for each case (a row) are placed in reverse time order. For example, the sequence "x1,x2,x3" is represented by the row of X: x3,x2,x1. The values of the predictors in X are coded by 1,2,3,...,nos_features. The function gen_X generates only this matrix.
y values of the response, a vector, coded by 1,2,...
betas a matrix of two columns holding the values of the "betas" used in generating classification data. The first column is the absolute identity of each beta, and the second column is its value. The total number of "betas" is saved in no_betas.
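The reverse-time layout of the rows of X can be shown with a small base-R example. The values are made up for illustration and the variable names are hypothetical.

```r
# A sequence in time order: x1 = 2, x2 = 1, x3 = 3 (illustrative values).
seq_in_time <- c(2, 1, 3)

# The corresponding row of X stores it as x3, x2, x1.
X_demo <- matrix(rev(seq_in_time), nrow = 1)

# Reversing the columns recovers time order for every row at once.
X_time_order <- X_demo[, ncol(X_demo):1, drop = FALSE]
```

Under this layout the first column of X always holds the most recent symbol of each sequence.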

See Also

comp_train_pred

Examples

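# 100 sequences of length 10; 8 hidden states, 2 output states
data_hmm <- gen_hmm(100,10,8,2,0.8,0.8)
# 100 cases, 3 features with 3 states each; order 2, alpha=1 (Cauchy)
data_bin_ho <- gen_bin_ho(100,3,2,1,c(5,2),c(3,3,3),0)
# 100 cases, 5 features, 3 possibilities for each feature
X <- gen_X(100,5,3)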

[Package BPHO version 1.2-5 Index]