read.structure {adegenet}    R Documentation

Reading data from STRUCTURE

Description

The function read.structure reads STRUCTURE data files (.str or .stru) and converts them into a genind object. By default, the function is interactive and asks a few questions about the data content. The optional questions can be disabled by setting the 'ask' argument to FALSE. However, one must know the number of genotypes, the number of markers, and whether genotypes are coded on a single row or on two rows before importing the data.

Usage

read.structure(file, n.ind=NULL, n.loc=NULL,  onerowperind=NULL, col.lab=NULL, col.pop=NULL, col.others=NULL, row.marknames=NULL, NA.char="-9", pop=NULL, missing=NA, ask=TRUE, quiet=FALSE)

Arguments

file a character string giving the path to the file to convert, with the appropriate extension.
n.ind an integer giving the number of genotypes (or 'individuals') in the dataset
n.loc an integer giving the number of markers in the dataset
onerowperind a STRUCTURE coding option: are genotypes coded on a single row (TRUE), or on two rows (FALSE, default)
col.lab an integer giving the index of the column containing labels of genotypes. '0' if absent.
col.pop an integer giving the index of the column containing population to which genotypes belong. '0' if absent.
col.others a vector of integers giving the indices of the columns containing other information to be read. This information will be available in @other of the created object.
row.marknames an integer giving the index of the row containing the names of the markers. '0' if absent.
NA.char the character string coding missing data. "-9" by default. Note that in any case, series of zeros (like "000") are interpreted as NA too.
pop an optional factor giving the population of each individual.
ask a logical specifying if the function should ask for optional information about the dataset (TRUE, default), or try to be as quiet as possible (FALSE).
missing can be NA, 0 or "mean". See details section.
quiet logical stating whether the conversion message should be suppressed (TRUE) or printed (FALSE, default).
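
As a rough illustration of how the arguments above map onto a file, the sketch below writes a tiny two-rows-per-genotype file and reads it without any interactive question. The file content (labels, population codes, allele codes) is invented for this example, and the layout is an assumption about a typical STRUCTURE file rather than a specification. The call to read.structure is only attempted if adegenet is installed.

```r
# Hypothetical toy dataset: 3 genotypes, 2 markers, two rows per genotype.
# Row 1 holds the marker names; column 1 holds labels, column 2 populations.
toy_lines <- c(
  "loc1 loc2",
  "ind1 1 101 201",
  "ind1 1 102 201",
  "ind2 1 101 -9",
  "ind2 1 101 -9",
  "ind3 2 102 202",
  "ind3 2 102 203"
)

# read.structure expects a .str or .stru extension.
toy_file <- tempfile(fileext = ".str")
writeLines(toy_lines, toy_file)

if (requireNamespace("adegenet", quietly = TRUE)) {
  # Every piece of information is supplied up front, so ask=FALSE
  # leaves nothing to be asked interactively.
  toy_obj <- adegenet::read.structure(toy_file,
    n.ind = 3, n.loc = 2, onerowperind = FALSE,
    col.lab = 1, col.pop = 2, row.marknames = 1,
    NA.char = "-9", ask = FALSE)
  print(toy_obj)
}
```

Here n.ind counts genotypes rather than rows, so a file with two rows per genotype has 2 * n.ind data rows.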

Details

There are 3 treatments for missing values:
- NA: kept as NA.

- 0: allelic frequencies are set to 0 on all alleles of the concerned locus. Recommended for a PCA on compositional data.

- "mean": missing values are replaced by the mean frequency of the corresponding allele, computed on the whole set of individuals. Recommended for a centred PCA.
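
The three treatments can be sketched in base R on a small table of allele frequencies. This is only an illustration of the idea, not the code used by adegenet: the matrix freq and the helper replace_missing are hypothetical names introduced here.

```r
# Toy allele frequencies for one locus with three alleles (invented data).
# Rows are individuals, columns are alleles; ind3 is missing at this locus.
freq <- matrix(c(0.5, 0.5, 0.0,
                 1.0, 0.0, 0.0,
                 NA,  NA,  NA,
                 0.0, 0.5, 0.5),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("ind", 1:4),
                               c("L1.a", "L1.b", "L1.c")))

# Hypothetical helper mimicking the three options of 'missing'.
replace_missing <- function(x, missing = NA) {
  if (is.na(missing)) {
    return(x)                              # NA: missing values are kept
  }
  if (is.numeric(missing) && missing == 0) {
    x[is.na(x)] <- 0                       # 0: frequencies set to 0
    return(x)
  }
  if (identical(missing, "mean")) {
    # "mean": each NA takes the mean frequency of its allele (column),
    # computed over the individuals that are not missing.
    col_means <- colMeans(x, na.rm = TRUE)
    idx <- which(is.na(x), arr.ind = TRUE)
    x[idx] <- col_means[idx[, "col"]]
    return(x)
  }
  stop("'missing' must be NA, 0 or \"mean\"")
}

freq_na   <- replace_missing(freq, NA)
freq_zero <- replace_missing(freq, 0)
freq_mean <- replace_missing(freq, "mean")
```

With "mean", the replaced row still sums to 1 at this locus because the allele means are themselves averages of rows that sum to 1.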

Value

an object of the class genind

Author(s)

Thibaut Jombart jombart@biomserv.univ-lyon1.fr

References

Pritchard, J.; Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155: 945-959

See Also

import2genind, df2genind, read.fstat, read.genetix, read.genepop

Examples

obj <- read.structure(system.file("files/nancycats.str",package="adegenet"),
  n.ind=237, n.loc=9, col.lab=1, col.pop=2, ask=FALSE)

obj

[Package adegenet version 1.2-2 Index]