s2n {seqinr} | R Documentation
By default, if no levels argument is provided, this function simply codes your DNA sequence as integer values following the lexical order (a < c < g < t), that is, 0 for "a", 1 for "c", 2 for "g", 3 for "t", and NA for ambiguous bases such as "n".
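The default mapping can be reproduced in base R without seqinr, as a minimal sketch: `match()` returns each character's 1-based position in the lexically ordered alphabet (NA when absent), and subtracting 1 gives the 0-based code. This only illustrates the mapping; it is not seqinr's implementation, and the variable names are hypothetical.

```r
# Hypothetical example sequence, including an ambiguous base "n"
dna <- c("a", "c", "g", "t", "n", "g")
alphabet <- c("a", "c", "g", "t")   # lexical order a < c < g < t

# match() gives 1-based positions (NA when not found); subtract 1 for 0-based codes
codes <- match(dna, alphabet) - 1L
codes   # 0 1 2 3 NA 2
```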
s2n(seq, levels, base4 = TRUE, ...)
seq | a vector of characters |
levels | allowed character values; by default "a", "c", "g" and "t" |
base4 | if TRUE, the numerical encoding starts at 0; if FALSE, at 1 |
... | further arguments passed to factor |
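To see how the levels, base4 and ... arguments interact, here is a hypothetical base-R function, `s2n_sketch()`, written for illustration only and not taken from seqinr's source. It assumes the coding is obtained by building a factor with the given levels (which the "further arguments to factor" entry suggests) and taking its integer codes, shifted down by one when base4 is TRUE.

```r
# Hypothetical re-implementation for illustration only; not seqinr's source.
s2n_sketch <- function(seq, levels = c("a", "c", "g", "t"), base4 = TRUE, ...) {
  if (!is.character(seq)) stop("seq must be a character vector")
  # Characters absent from 'levels' become NA in the factor
  f <- factor(seq, levels = levels, ...)
  codes <- as.integer(f)            # 1-based level indices, NA for unknowns
  if (base4) codes - 1L else codes  # 0-based when base4 is TRUE
}

# Coding an RNA sequence by supplying the RNA alphabet as levels
rna <- c("a", "u", "g", "c")
s2n_sketch(rna, levels = c("a", "c", "g", "u"))                 # returns 0 3 2 1
s2n_sketch(rna, levels = c("a", "c", "g", "u"), base4 = FALSE)  # returns 1 4 3 2
```

Because unknown characters simply fall outside the factor levels, they come out as NA in both modes.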
a vector of integers
The idea of starting the numbering at 0 by default is that it enforces a kind of isomorphism between the paste operator on DNA characters and the + operator on their integer coding. This way, you can work either in the character set or in the integer set, depending on which is more convenient for your purpose, and switch from one set to the other as you like.
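One way to read the isomorphism claim, offered as an interpretation rather than the author's stated definition: with 0-based codes, pasting two bases x and y corresponds to the base-4 sum 4 * code(x) + code(y), which is the 0-based rank of the dinucleotide among the 16 dinucleotides in lexical order. The base-R sketch below checks this for all 16 pairs without seqinr; `code()` is a hypothetical helper standing in for the default coding.

```r
alphabet <- c("a", "c", "g", "t")
# Hypothetical helper: 0-based code of each base, mirroring the default coding
code <- function(x) match(x, alphabet) - 1L

# All 16 dinucleotides in lexical order: "aa", "ac", ..., "tt"
dinucs <- sort(as.vector(outer(alphabet, alphabet, paste0)))

# Every (first, second) base pair
pairs <- expand.grid(first = alphabet, second = alphabet, stringsAsFactors = FALSE)

# Character side: paste the two bases; integer side: base-4 arithmetic on their codes
pasted <- paste0(pairs$first, pairs$second)
index  <- 4L * code(pairs$first) + code(pairs$second)

# The integer code picks out exactly the pasted word (R vectors are 1-based, hence + 1)
all(dinucs[index + 1L] == pasted)   # TRUE
```

With a 1-based coding the same positional arithmetic would need extra offsets, which is presumably why 0 is the default.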
J.R. Lobry
citation("seqinr")
library(seqinr)

# Example of default behaviour
urndna <- c("a", "c", "g", "t")
seq <- sample(urndna, 100, replace = TRUE) ; seq
s2n(seq)

# How to deal with RNA: supply the RNA alphabet through levels
urnrna <- c("a", "c", "g", "u")
seq <- sample(urnrna, 100, replace = TRUE) ; seq
s2n(seq, levels = urnrna)

# What happens with unknown characters
urnmess <- c(urndna, "n")
seq <- sample(urnmess, 100, replace = TRUE) ; seq
s2n(seq)

# How to change the encoding for unknown characters
tmp <- s2n(seq) ; tmp[is.na(tmp)] <- -1 ; tmp
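Switching back from the integer set to the character set can be sketched in base R by indexing the alphabet with the codes. This assumes the default 0-based coding over "a", "c", "g", "t"; the `codes` vector is a hypothetical stand-in for the output of s2n().

```r
alphabet <- c("a", "c", "g", "t")
codes <- c(0L, 1L, 2L, 3L, NA, 2L)   # hypothetical output of s2n(), with one unknown base

# Indexing is 1-based in R, so shift the 0-based codes by one;
# NA codes index to NA, so unknown bases stay unknown
decoded <- alphabet[codes + 1L]
decoded   # "a" "c" "g" "t" NA "g"
```

Note that if unknown codes were first replaced by -1, as in the last example above, they would have to be set back to NA before decoding, since alphabet[0] returns an empty vector rather than NA.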