read.alignment {seqinr} | R Documentation |
Read aligned sequence files in mase, clustal, phylip, fasta or msf format
Description
Read a file in mase, clustal, phylip, fasta or msf format.
These formats are used to store nucleotide or protein multiple alignments.
Usage
read.alignment(File, format)
Arguments
File |
the name of the file which the aligned sequences are to be read from.
If it does not contain an absolute or relative path, the file name is relative
to the current working directory, getwd(). |
format |
a character string specifying the format of the file: "mase",
"clustal", "phylip", "fasta" or "msf" |
Details
"mase": The mase format is used to store nucleotide or protein
multiple alignments. The file must begin with a header of at least
one line, although the content of the header may be empty. Header
lines must begin with ;;. The body of the file has the following
structure. Each entry must begin with one or more commentary lines,
which start with the character ; and may be empty. After the
commentary lines, the name of the sequence is written on a separate
line. Finally, the sequence itself is written on the following lines.
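The mase layout described above can be sketched as a small file built by hand. This is only an illustration: the sequence names, comments and residues are made up, and the file name is passed positionally so the call does not depend on the spelling of the first argument. The read step runs only if seqinr is installed.

```r
# Minimal mase file built by hand; names and sequences are made up.
mase_lines <- c(
  ";; header line (its content may be empty)",
  ";comment for the first sequence",
  "seq1",
  "ACGT-ACGT",
  ";",            # commentary line with empty content
  "seq2",
  "ACGTTACGT"
)
mase_path <- tempfile(fileext = ".mase")
writeLines(mase_lines, mase_path)

# Read the file back only when seqinr is available.
if (requireNamespace("seqinr", quietly = TRUE)) {
  ali <- seqinr::read.alignment(mase_path, format = "mase")
  print(ali$nam)
}
```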
"clustal": The CLUSTAL format (*.aln) is the output format of the
ClustalW multiple alignment tool. It can be described as follows.
The word CLUSTAL is on the first line of the file. The alignment
is displayed in blocks of a fixed length, each line in the block
corresponding to one sequence. Each line of each block starts with
the sequence name (maximum of 10 characters), followed by at least
one space character. The sequence is then displayed in upper or
lower case, with '-' denoting gaps. The residue number may be displayed
at the end of the first line of each block.
"msf": MSF is the multiple sequence alignment format of the
GCG sequence analysis package. A file is made of the following parts,
in order. A file type line (optional, all uppercase):
!!NA_MULTIPLE_ALIGNMENT 1.0 for nucleic acid sequences or
!!AA_MULTIPLE_ALIGNMENT 1.0 for amino acid sequences. Do not edit or
delete the file type line if it is present. A description line
(optional), which contains informative text describing what is in the
file. You can add this information to the top of the MSF file using a
text editor. A dividing line (required), which contains the number of
bases or residues in the sequence, when the file was created, and,
importantly, two dots (..) which act as a divider between the
descriptive information and the following sequence information. The
Name/Weight information for each sequence. A separating line
(required), which must include two slashes (//) to divide the
name/weight information from the sequence alignment. Finally, the
multiple sequence alignment itself.
"phylip": PHYLIP is a tree construction program. The format
is as follows: the number of sequences and their length (in characters)
are on the first line of the file. The alignment is displayed in an
interleaved or sequential format. The sequence names are limited
to 10 characters and may contain blanks.
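As a rough sketch of the phylip layout, the lines of a small file can be assembled in base R. The names and sequences below are hypothetical, and padding each name to exactly 10 characters is an assumption about the usual convention, which the text above only describes as a 10-character limit. Each sequence fits on one line here, so the interleaved and sequential layouts coincide.

```r
# Hypothetical names and aligned sequences of equal length.
nam  <- c("seq1", "seq two")          # names may contain blanks
seqs <- c("ACGTACGT", "ACG-ACGT")
stopifnot(length(unique(nchar(seqs))) == 1, all(nchar(nam) <= 10))

# First line: number of sequences, then alignment length in characters.
header <- paste(length(seqs), nchar(seqs[1]))

# Pad each name to 10 characters, then append its sequence.
body <- paste0(sprintf("%-10s", nam), seqs)

phylip_lines <- c(header, body)
writeLines(phylip_lines)
```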
"fasta": A sequence in fasta format begins with a single-line
description (distinguished by a greater-than (>) symbol), followed
by sequence data on the following lines.
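A minimal sketch of an aligned fasta file follows, with made-up names and residues. The description lines are located with base R, and the file is read back only if seqinr is installed. The file name is passed positionally, which is an assumption made to avoid depending on the name of the first argument.

```r
# Two made-up aligned sequences, each introduced by a ">" description line.
fasta_lines <- c(
  ">seq1 first made-up sequence",
  "ACGT-ACGT",
  ">seq2 second made-up sequence",
  "ACGTTACGT"
)
fasta_path <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, fasta_path)

# Description lines are the ones starting with ">".
is_header <- startsWith(fasta_lines, ">")
print(sum(is_header))

# Read the file back only when seqinr is available.
if (requireNamespace("seqinr", quietly = TRUE)) {
  ali <- seqinr::read.alignment(fasta_path, format = "fasta")
  print(ali$nb)
}
```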
Value
It returns an object of class alignment
which is a list with the following components:
nb |
the number of aligned sequences |
nam |
a vector of strings containing the names of the aligned sequences |
seq |
a vector of strings containing the aligned sequences |
com |
a vector of strings containing the commentaries for each sequence, or NA if there are no comments |
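To make the shape of the returned value concrete, here is a hand-built list that mirrors the components listed above. It is a mock for illustration only, not output produced by read.alignment, and the names and sequences are invented.

```r
# Hand-built list mirroring the documented components (illustrative mock).
ali <- structure(
  list(
    nb  = 2L,                          # number of aligned sequences
    nam = c("seq1", "seq2"),           # sequence names
    seq = c("acgt-acgt", "acgttacgt"), # aligned sequences as strings
    com = NA                           # no comments
  ),
  class = "alignment"
)

# Consistency checks one might run on an object of this shape.
stopifnot(length(ali$nam) == ali$nb, length(ali$seq) == ali$nb)
aligned_width <- unique(nchar(ali$seq))  # aligned sequences share one length

# Look sequences up by name.
names_to_seq <- setNames(ali$seq, ali$nam)
print(names_to_seq[["seq2"]])
```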
Author(s)
D. Charif, J.R. Lobry
References
citation("seqinr")
Examples
mase <- read.alignment(File = system.file("sequences/test.mase", package = "seqinr"), format = "mase")
clustal <- read.alignment(File = system.file("sequences/test.aln", package = "seqinr"), format = "clustal")
phylip <- read.alignment(File = system.file("sequences/test.phylip", package = "seqinr"), format = "phylip")
msf <- read.alignment(File = system.file("sequences/test.msf", package = "seqinr"), format = "msf")
fasta <- read.alignment(File = system.file("sequences/Anouk.fasta", package = "seqinr"), format = "fasta")
[Package seqinr version 1.0-6 Index]