get.ncbi {seqinr} | R Documentation |
Try to connect to the NCBI FTP site to get a list of complete bacterial genomes.
get.ncbi(repository = "ftp://ftp.ncbi.nih.gov/genomes/Bacteria/")
repository |
Where to look for data. The default value is the location of the complete bacterial genome sequences in the NCBI FTP repository. |
Returns a data frame which contains the following columns:
species |
The species name as given by the corresponding folder name in the repository (e.g. Yersinia_pestis_KIM). |
accession |
The accession number as given by the common prefix of file names in the repository (e.g. NC_004088). |
size.bp |
The size of the sequence in bp (e.g. 4600755). |
type |
A factor with two levels (plasmid or chromosome) tentatively deduced from the description of the sequence. |
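As a rough illustration of how the returned data frame might be used, the sketch below builds a small stand-in with the four columns described above and then subsets and summarises it. The object `bacteria` here is hand-made, not real output of `get.ncbi()`: only the first row reuses the example values given above, the other rows are invented, and storing `species` and `accession` as character vectors is an assumption.

```r
# Hand-made stand-in for the result of get.ncbi(). Only the first row reuses
# the example values from this page; the other rows are invented.
bacteria <- data.frame(
  species   = c("Yersinia_pestis_KIM", "Example_species_A", "Example_species_A"),
  accession = c("NC_004088", "XX_000001", "XX_000002"),
  size.bp   = c(4600755, 3000000, 50000),
  type      = factor(c("chromosome", "chromosome", "plasmid"),
                     levels = c("plasmid", "chromosome")),
  stringsAsFactors = FALSE  # assumption: names kept as character vectors
)

# Keep only the sequences tentatively classified as chromosomes.
chromosomes <- bacteria[bacteria$type == "chromosome", ]

# Total sequence length per species, summed over all of its sequences.
total.bp <- tapply(bacteria$size.bp, bacteria$species, sum)
```

The same two operations should apply unchanged to a real result, provided the column names match those listed above.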
This function is highly dependent on NCBI FTP site conventions, over which we have no control. The FTP connection apparently does not work when there is a proxy; this problem is circumvented here in a rather crude way.
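Because the call depends on a remote site and on the local network setup, it may be prudent to guard it so that a failed connection does not stop a longer script. The following is a minimal sketch using base R's `tryCatch()`; the helper `try.fetch` is hypothetical and not part of seqinr, and it assumes that a failed connection is signalled as an R error.

```r
# Hypothetical helper (not part of seqinr): call a fetch function and, if it
# signals an error, issue a warning and return NULL instead of stopping.
try.fetch <- function(fetch, ...) {
  tryCatch(
    fetch(...),
    error = function(e) {
      warning("could not retrieve genome list: ", conditionMessage(e))
      NULL
    }
  )
}

# Possible usage, assuming seqinr is attached and the FTP site is reachable:
# bacteria <- try.fetch(get.ncbi)
# if (!is.null(bacteria)) summary(bacteria)
```

Returning `NULL` keeps the failure explicit, so the calling code can decide whether to retry, fall back to a cached copy, or skip the step.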
J.R. Lobry
For an overview of seqinR's functionality, please consult this vignette:
Charif, D., Lobry, J.R. (2005) SeqinR: a contributed package to the R project for statistical
computing devoted to biological sequences retrieval and analysis. Springer Verlag, Biological and Medical Physics/Biomedical Series, in preparation.
## Not run: bacteria <- get.ncbi()
## Not run: summary(bacteria)