gendata {BPHO} | R Documentation |
Functions for generating data sets
Description
gen_hmm generates sequences using hidden Markov models.
gen_bin_ho generates general discrete data using logistic models, with high-order interactions considered; the response is binary.
text_to_3number converts an English text file into a sequence of 1 (spaces and other special symbols), 2 (vowel), 3 (consonant).
text_to_number converts an English text file into a sequence of 1-27: 1-26 for the letters a-z, and 27 for all other symbols.
gen_X generates only the predictor matrix X.
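The two text codings can be sketched in base R as follows. The helpers encode_27 and encode_3 are hypothetical names, not BPHO functions, and they work on a character string rather than a file. The sketch assumes that upper-case letters are folded to lower case and that "y" counts as a consonant; the package may handle either differently.

```r
# Hypothetical helpers, not part of BPHO: they only illustrate the two codings.
encode_27 <- function(text) {
  chars <- strsplit(tolower(text), "")[[1]]
  codes <- match(chars, letters)   # a-z -> 1..26, anything else -> NA
  codes[is.na(codes)] <- 27L       # all other symbols -> 27
  codes
}

encode_3 <- function(text) {
  chars <- strsplit(tolower(text), "")[[1]]
  codes <- rep(1L, length(chars))                      # default: special symbol
  codes[chars %in% letters] <- 3L                      # letters start as consonants
  codes[chars %in% c("a", "e", "i", "o", "u")] <- 2L   # vowels (assumed set)
  codes
}

encode_27("Hi, a")  # 8 9 27 27 1
encode_3("Hi, a")   # 3 2 1 1 2
```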
Usage
gen_hmm(n,p,no_h,no_o,prob_h_stay, prob_o_stay)
gen_bin_ho(n,p,order,alpha,sigmas,nos_features,beta0)
text_to_number(p,file)
text_to_3number(p,file)
gen_X(n,p,K)
Arguments
n |
number of cases. |
p |
number of features, or length of sequence. |
K |
number of possibilities for each feature. |
no_h |
number of states of hidden Markov chain. |
no_o |
number of states of output in hidden Markov model. |
prob_h_stay |
When simulating the hidden Markov chain, the chain stays in its previous state with probability prob_h_stay and moves to one of the other states with smaller probabilities that add up to 1-prob_h_stay. |
prob_o_stay |
When simulating the output state of the hidden Markov model, the "output" equals ("hidden state" mod no_o)+1 with probability prob_o_stay, and is otherwise equally likely to be any of the other output states. |
order |
the order of interactions considered in simulating data from general classification models. |
alpha |
alpha=2 indicates that Gaussian distributions are used to generate the "beta"s, and alpha=1 indicates that Cauchy distributions are used. |
sigmas |
hyperparameters used in generating the "beta"s, a vector of length order. |
nos_features |
number of states for each feature, i.e., the number of possibilities for each feature. A vector of length p. |
beta0 |
intercept of the linear function used in generating classification data. |
file |
name of the file containing the text, a character string. |
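The roles of prob_h_stay and prob_o_stay can be illustrated with a minimal base R sketch. This is not the BPHO implementation: sim_hmm_seq is a hypothetical function, the initial hidden state is assumed to be drawn uniformly, and the probability 1-prob_h_stay is assumed to be spread evenly over the other hidden states, which the documentation does not specify. The sketch returns one sequence in time order.

```r
# Hypothetical sketch, not the BPHO source: simulate one sequence of length p (p >= 1).
sim_hmm_seq <- function(p, no_h, no_o, prob_h_stay, prob_o_stay) {
  # Draw uniformly from `states` excluding `current`; fall back to `current`
  # when there is no other state to move to.
  pick_other <- function(current, states) {
    others <- setdiff(states, current)
    if (length(others) == 0L) return(current)
    others[sample.int(length(others), 1)]
  }
  hidden <- integer(p)
  output <- integer(p)
  hidden[1] <- sample.int(no_h, 1)            # assumed: uniform initial state
  for (t in seq_len(p)) {
    if (t > 1) {
      stay <- runif(1) < prob_h_stay
      hidden[t] <- if (stay) hidden[t - 1] else pick_other(hidden[t - 1], seq_len(no_h))
    }
    default_out <- (hidden[t] %% no_o) + 1    # the ("hidden state" mod no_o)+1 rule
    keep <- runif(1) < prob_o_stay
    output[t] <- if (keep) default_out else pick_other(default_out, seq_len(no_o))
  }
  list(hidden = hidden, output = output)
}

set.seed(1)
s <- sim_hmm_seq(p = 10, no_h = 8, no_o = 2, prob_h_stay = 0.8, prob_o_stay = 0.8)
s$output
```

With both probabilities set to 1, the hidden state never changes and every output equals ("hidden state" mod no_o)+1, which is a quick way to check the two rules.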
Value
X |
values of predictors, a matrix. Each row is a case. For sequences, the data for each case (a row) is placed in reverse time order. For example, the sequence "x1,x2,x3" is represented by the row x3,x2,x1 of X. The values of the predictors in X are coded by 1,2,3,...,nos_features. The function gen_X generates only this matrix. |
y |
values of the response, a vector, coded by 1,2,... |
betas |
a matrix of two columns saving the values of the "beta"s used in generating classification data. The first column is the absolute identity of the beta, and the second column is its value. The total number of "beta"s is saved in no_betas. |
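The reverse time ordering of the rows of X can be illustrated with a small base R sketch. The matrix seqs below is made-up data, not package output; it holds three sequences in time order, and reversing its columns gives the layout described for X.

```r
# Three made-up sequences of length 3, one per row, in time order (x1, x2, x3).
seqs <- rbind(c(1, 2, 3),
              c(2, 2, 1),
              c(3, 1, 1))

# Reverse the columns so that each row reads x3, x2, x1, as described for X.
X <- seqs[, ncol(seqs):1, drop = FALSE]
X[1, ]  # 3 2 1
```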
See Also
comp_train_pred
Examples
data_hmm <- gen_hmm(100,10,8,2,0.8,0.8)
data_bin_ho <- gen_bin_ho(100,3,2,1,c(5,2),c(3,3,3),0)
X <- gen_X(100,5,3)
[Package BPHO version 1.2-5 Index]