uco {seqinr} | R Documentation |
uco calculates some codon usage indices: the codon counts eff, the relative frequencies freq, or the Relative Synonymous Codon Usage rscu.
uco(seq, frame = 0, index = c("eff", "freq", "rscu"), as.data.frame = FALSE)
seq |
a coding sequence as a vector of chars |
frame |
an integer (0, 1, 2) giving the frame of the coding sequence |
index |
codon usage index choice, partial matching is allowed.
eff for codon counts,
freq for codon relative frequencies,
and rscu for the RSCU index |
as.data.frame |
logical. If TRUE, all indices are returned in a data frame. |
Codons with ambiguous bases are ignored.
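To make the roles of frame and the ambiguous-base rule concrete, here is a minimal base-R sketch. The helper count_codons is hypothetical and is not the seqinr implementation; it only assumes that frame means "skip this many leading characters" and that any triplet containing a character other than a, c, g or t is dropped.

```r
# Hypothetical helper, NOT part of seqinr: count codons in a given frame,
# ignoring every codon that contains a character other than a, c, g, t.
count_codons <- function(seq, frame = 0) {
  stopifnot(frame %in% 0:2)
  seq <- tolower(seq)
  # Skip the first `frame` characters so that reading starts in frame.
  if (frame > 0) seq <- seq[-seq_len(frame)]
  # Keep only complete triplets.
  n <- length(seq) %/% 3
  codons <- vapply(
    seq_len(n),
    function(i) paste(seq[(3 * i - 2):(3 * i)], collapse = ""),
    character(1)
  )
  # Drop codons with ambiguous bases.
  codons <- codons[grepl("^[acgt]{3}$", codons)]
  # Tabulate against all 64 possible codons so that unused codons show as 0.
  bases <- c("a", "c", "g", "t")
  all_codons <- sort(as.vector(outer(outer(bases, bases, paste0), bases, paste0)))
  table(factor(codons, levels = all_codons))
}

x <- strsplit("aaannnttt", "")[[1]]
res0 <- count_codons(x)             # "aaa" and "ttt" are counted, "nnn" is skipped
res1 <- count_codons(x, frame = 1)  # "aan" and "nnt" both contain n: nothing counted
res0[res0 > 0]
sum(res1)
```

Dividing such a table by its total (for example res0 / sum(res0)) gives relative frequencies in the sense of the freq index, under the same assumptions.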
RSCU is a simple measure of non-uniform usage of synonymous codons in a coding sequence
(Sharp et al. 1986).
RSCU values are the number of times a particular codon is observed, relative to the number
of times that the codon would be observed for a uniform synonymous codon usage (i.e. all the
codons for a given amino-acid have the same probability).
In the absence of any codon usage bias, the RSCU values would be 1.00 (this is the case
for the sequence cds in the example below). A codon that is used
less frequently than expected will have an RSCU value of less than 1.00, and vice versa for a codon
that is used more frequently than expected.
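The definition above can be worked through by hand in base R. This is only an illustrative sketch: the counts and the names counts, aa and rscu are made up for the example, and only two amino acids are shown. The expected count under uniform usage is taken to be the mean count over the synonymous codons of each amino acid.

```r
# Hypothetical codon counts for two amino acids (illustrative values only).
counts <- c(gca = 0, gcc = 2, gcg = 0, gct = 6, aaa = 3, aag = 1)
aa     <- c("Ala", "Ala", "Ala", "Ala", "Lys", "Lys")

# Expected count under uniform synonymous usage: the mean count within
# each amino acid. ave() returns that group mean for every codon.
expected <- ave(counts, aa)

# RSCU = observed count / expected count.
rscu <- counts / expected
rscu

# Within each amino acid the RSCU values sum to the number of synonymous codons.
tapply(rscu, aa, sum)
```

Here gct is used three times as often as expected (RSCU of 3), gcc exactly as expected (RSCU of 1), and the two unused alanine codons have an RSCU of 0.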
Do not use correspondence analysis on RSCU tables, as this is a source of artifacts (Perriere and Thioulouse 2002). Within-amino-acid correspondence analysis is a simple way to study synonymous codon usage (Charif et al. 2005).
If as.data.frame is TRUE, uco returns a data frame with five columns:
aa |
a vector containing the name of the amino acid |
codon |
a vector containing the corresponding codon |
eff |
a numeric vector of codon counts |
freq |
a numeric vector of codon relative frequencies |
rscu |
a numeric vector of RSCU values |
Otherwise, uco returns one of the following, depending on index:
eff |
a table of codon counts |
freq |
a table of codon relative frequencies |
rscu |
a vector of relative synonymous codon usage values |
D. Charif, J.R. Lobry
citation("seqinr")
Sharp, P.M., Tuohy, T.M.F., Mosurski, K.R. (1986) Codon usage in yeast: cluster
analysis clearly differentiates highly and lowly expressed genes.
Nucl. Acids. Res., 14:5125-5143.
Perriere, G., Thioulouse, J. (2002) Use and misuse of correspondence analysis in
codon usage studies. Nucl. Acids. Res., 30:4548-4555.
Charif, D., Thioulouse, J., Lobry, J.R., Perriere, G. (2005) Online Synonymous Codon Usage Analyses with the ade4 and seqinR packages. Bioinformatics, 21:545-547. http://pbil.univ-lyon1.fr/members/lobry/repro/bioinfo04/.
## Show all possible codons:
words()
## Make a coding sequence from this:
(cds <- s2c(paste(words(), collapse = "")))
## Get codon counts:
uco(cds, index = "eff")
## Get codon relative frequencies:
uco(cds, index = "freq")
## Get RSCU values:
uco(cds, index = "rscu")
## Show what happens with ambiguous bases:
uco(s2c("aaannnttt"))
## Use a real coding sequence:
rcds <- read.fasta(File = system.file("sequences/malM.fasta", package = "seqinr"))[[1]]
uco(rcds, index = "freq")
uco(rcds, index = "eff")
uco(rcds, index = "rscu")
uco(rcds, as.data.frame = TRUE)