expData {mseq} | R Documentation |
The original data file contains only a single sequence. For each retained position, this function extracts the surrounding sequence. The resulting data frame can be used directly by the Poisson linear model and MART.
expData(oriData, llen, rlen)
oriData | the original data frame, as read directly from the file
llen | the number of nucleotides before the first nucleotide of a read that are treated as surrounding sequence
rlen | the number of nucleotides at and after the first nucleotide of a read (the first nucleotide included) that are treated as surrounding sequence
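To make the roles of llen and rlen concrete, the sketch below lists the positions that fall inside the surrounding sequence of one read. The helper name window_positions is hypothetical and is not part of mseq; it only restates the argument definitions, assuming 1-based positions along the sequence.

```r
# Hypothetical helper (not part of mseq): positions covered by the
# surrounding sequence of a read whose first nucleotide is at `start`.
window_positions <- function(start, llen, rlen) {
  # llen positions before the read start, followed by rlen positions
  # that begin at the read start itself
  seq(from = start - llen, length.out = llen + rlen)
}

window_positions(10, llen = 2, rlen = 3)
# positions 8, 9, 10, 11, 12
```

The window therefore always has llen + rlen positions, and the read start is the first of the rlen positions.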
Note that the format of oriData is not checked here, so please generate the original data file carefully. Refer to Readme_format.txt in the folder data_top100 for details of the format.
A data frame containing the counts and surrounding sequences, which can be used directly by the Poisson linear model and MART. The number of rows equals the number of 0s in oriData$tag. The number of columns is llen + rlen + 2, with names index, count, pMllen, ..., pM1, p0, p1, ..., p(rlen-1), where p means position and M means minus. For example, with llen = 2 and rlen = 3 the columns are index, count, pM2, pM1, p0, p1, p2.
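The shape of the returned data frame can be illustrated with a small stand-alone sketch. This is not the mseq source: expand_sketch is a hypothetical re-implementation written only from the description on this page, and the input column names index, seq and count are assumptions (only tag is named here; see Readme_format.txt for the real format).

```r
# Toy input; the column names index, seq and count are assumed for
# illustration. tag == 0 marks the retained positions.
toy <- data.frame(
  index = 1:8,
  seq   = c("A", "C", "G", "T", "A", "C", "G", "T"),
  tag   = c(1, 1, 0, 0, 0, 0, 1, 1),
  count = c(0, 0, 5, 0, 2, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical sketch of the expansion, not the actual expData code.
expand_sketch <- function(oriData, llen, rlen) {
  keep <- which(oriData$tag == 0)            # retained positions
  offsets <- seq_len(llen + rlen) - llen - 1 # -llen, ..., rlen - 1
  # one column of nucleotides per offset from the read start
  win <- sapply(offsets, function(k) as.character(oriData$seq[keep + k]))
  win <- matrix(win, nrow = length(keep))
  colnames(win) <- ifelse(offsets < 0,
                          paste0("pM", -offsets),
                          paste0("p", offsets))
  data.frame(index = oriData$index[keep], count = oriData$count[keep],
             win, stringsAsFactors = FALSE)
}

res <- expand_sketch(toy, llen = 2, rlen = 3)
dim(res)    # 4 rows (four 0s in tag), 2 + 3 + 2 = 7 columns
names(res)  # index, count, pM2, pM1, p0, p1, p2
```

The sketch does no bounds checking, so every retained position needs at least llen positions before it and rlen - 1 positions after it in the input.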
# read and expand the data
data(g1_part)
# for real data, please use read.csv, like g1 <- read.csv("g1.csv")
data <- expData(g1_part, 2, 3)
# In real datasets, the surrounding sequences should be set longer.