read.structure {adegenet} | R Documentation |
The function read.structure reads STRUCTURE data files (.str or .stru)
and converts them into a genind object. By default, the function is
interactive and asks a few questions about the data content. The
optional questions can be disabled by setting the 'ask' argument to
FALSE. In either case, the user must know, before importing the data,
the number of genotypes, the number of markers, and whether genotypes
are coded on a single row or on two rows.
read.structure(file, n.ind=NULL, n.loc=NULL, onerowperind=NULL, col.lab=NULL, col.pop=NULL, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, missing=NA, ask=TRUE, quiet=FALSE)
file |
a character string giving the path to the file to convert, with the appropriate extension. |
n.ind |
an integer giving the number of genotypes (or 'individuals') in the dataset |
n.loc |
an integer giving the number of markers in the dataset |
onerowperind |
a STRUCTURE coding option: are genotypes coded on a single row (TRUE), or on two rows (FALSE, default) |
col.lab |
an integer giving the index of the column containing labels of genotypes. '0' if absent. |
col.pop |
an integer giving the index of the column containing the population to which each genotype belongs. '0' if absent. |
col.others |
a vector of integers giving the indexes of the columns containing other information to be read. This information will be available in @other of the created object. |
row.marknames |
an integer giving the index of the row containing the names of the markers. '0' if absent. |
NA.char |
the character string coding missing data. "-9" by default. Note that, in any case, series of zeros (like "000") are interpreted as NA too. |
pop |
an optional factor giving the population of each individual. |
ask |
a logical specifying whether the function should ask for optional information about the dataset (TRUE, default), or try to be as quiet as possible (FALSE). |
missing |
can be NA, 0 or "mean". See details section. |
quiet |
a logical stating whether the conversion message should be suppressed (TRUE) or printed (FALSE, default). |
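The rule described for NA.char can be illustrated with a small base-R sketch. It is not the package's internal code: the vector raw_alleles and the variable names are hypothetical, and the sketch only mimics the documented behaviour (the NA.char string and any series of zeros are treated as missing).

```r
# Hypothetical allele codes as they might appear in a STRUCTURE file.
raw_alleles <- c("135", "-9", "000", "141", "100")

# The missing-data code, mirroring the default of the NA.char argument.
na_char <- "-9"

# An entry is missing if it equals na_char or consists only of zeros.
is_missing <- raw_alleles == na_char | grepl("^0+$", raw_alleles)

# Replace missing entries by NA; other codes (including "100") are kept.
clean_alleles <- ifelse(is_missing, NA_character_, raw_alleles)
print(clean_alleles)
```

Note that "100" is kept, because only codes made up entirely of zeros match the pattern.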
There are three treatments for missing values:
- NA: missing values are kept as NA.
- 0: allelic frequencies are set to 0 for all alleles of the locus
concerned. Recommended for a PCA on compositional data.
- "mean": missing values are replaced by the mean frequency of the
corresponding allele, computed on the whole set of
individuals. Recommended for a centred PCA.
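The three treatments above can be sketched on a toy table of allele frequencies. This is a minimal base-R illustration under stated assumptions, not the code used by adegenet: the matrix freq and the helper replace_missing() are hypothetical and only reproduce the behaviour described for the 'missing' argument.

```r
# Toy allele frequencies: rows are individuals, columns are the alleles of
# one hypothetical locus "L1". Individual ind3 is untyped at this locus.
freq <- matrix(c(0.5, 0.5, 0.0,
                 1.0, 0.0, 0.0,
                 NA,  NA,  NA,
                 0.0, 0.5, 0.5),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4),
                               c("L1.135", "L1.141", "L1.147")))

# Hypothetical helper (not part of adegenet) applying one treatment.
replace_missing <- function(x, missing = NA) {
  # NA: leave missing values untouched.
  if (is.na(missing)) return(x)
  if (missing == "mean") {
    # "mean": use the mean frequency of each allele over typed individuals.
    col_means <- colMeans(x, na.rm = TRUE)
    for (j in seq_len(ncol(x))) {
      x[is.na(x[, j]), j] <- col_means[j]
    }
  } else if (missing == 0) {
    # 0: set all missing frequencies to zero.
    x[is.na(x)] <- 0
  }
  x
}

print(replace_missing(freq, NA))      # ind3 stays NA
print(replace_missing(freq, 0))       # ind3 becomes a row of zeros
print(replace_missing(freq, "mean"))  # ind3 receives the column means
```

With "mean", the replaced row still sums to 1 within the locus, whereas with 0 it sums to 0; this difference is why the two options suit different analyses.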
an object of the class genind
Thibaut Jombart jombart@biomserv.univ-lyon1.fr
Pritchard, J.; Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155: 945-959
import2genind, df2genind, read.fstat, read.genetix, read.genepop
obj <- read.structure(system.file("files/nancycats.str",package="adegenet"),
  n.ind=237, n.loc=9, col.lab=1, col.pop=2, ask=FALSE)
obj