A new starting point {eqtl} | R Documentation |
A brief introduction to the R/eqtl package, with a walk-through of a typical analysis.
To use R/eqtl, load both packages by typing library(qtl)
and library(eqtl)
. You may want to include this in a .Rprofile
file.
Use the help.start
function to start the HTML version of the R help.
Type library(help=qtl)
to get a list of the functions in R/qtl.
Type library(help=eqtl)
to get a list of the functions in R/eqtl.
Here we briefly describe the use of R/eqtl and R/qtl to analyze an experimental cross. R/eqtl is an add-on package to Karl Broman's R/qtl. It requires the 'qtl' package and uses some of its functions. This tutorial therefore assumes prior knowledge of R/qtl: read the R/qtl documentation and tutorial before performing any analysis with the 'eqtl' add-on.
A difficult first step in the use of most data-analysis software is importing the data in an adequate format. This step is described in detail in the R/qtl tutorial. With R/eqtl you need to import some extra data in addition to the data needed for R/qtl. We won't discuss data import at this point; it is described in the next chapter, 'First step'.
We consider the example data seed10
, an experiment on gene expression in Arabidopsis thaliana. Use the data
function to load the data.
data(seed10)
seed10
data have class cross
and riself
. They describe an experiment on a RIL (recombinant inbred line) population obtained by single seed descent. The function summary.cross
gives summary information on the data and checks the data for internal consistency. Many utility functions are available in 'qtl' and are described at length in Karl Broman's tutorial.
To project our results onto the physical map, we also need to load the physical positions of the genetic markers and the genomic physical coordinates of the probes used to estimate the expression traits described in seed10
. For information, BSpgmap
and ATH.coord
are simple data frames with specific column names.
data(BSpgmap)
names(BSpgmap)
data(ATH.coord)
names(ATH.coord)
Before running the QTL analysis, intermediate calculations need to be performed. The function calc.genoprob
is used to compute the conditional genotype probabilities at each pseudo-marker. sim.geno
simulates sequences of genotypes from their joint distribution, given the observed marker data. See the 'qtl' manual for details. These steps have already been performed on seed10, so you may not need to run them again. Here, pseudo-markers have been defined every 0.5 centimorgan ( step=0.5
).
seed10 <- calc.genoprob(seed10, step=0.5, off.end=0, error.prob=0,
map.function='kosambi', stepwidth='fixed')
seed10 <- sim.geno(seed10, step=0.5, off.end=0, error.prob=0,
map.function='kosambi', stepwidth='fixed')
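As a side note on map.function='kosambi' and step=0.5, the following plain-R sketch shows the Kosambi map function, r = 0.5 * tanh(2d) with d in Morgans, together with a grid of positions every 0.5 cM. It does not call R/qtl; the helper name kosambi_rf and the 10 cM chromosome length are made up for illustration.

```r
# Kosambi map function: genetic distance in cM -> recombination fraction.
# (kosambi_rf is a hypothetical helper name, not an R/qtl function.)
kosambi_rf <- function(d_cM) {
  0.5 * tanh(2 * d_cM / 100)
}

# Hypothetical 10 cM chromosome with one position every 0.5 cM,
# roughly what step=0.5 with stepwidth='fixed' asks for.
grid_cM <- seq(0, 10, by = 0.5)

# Recombination fraction between two adjacent grid positions.
rf_step <- kosambi_rf(0.5)
```

For short distances the recombination fraction is close to the distance itself (0.5 cM gives just under 0.005), and it approaches 0.5 for unlinked positions.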
Use the scanone
function to perform interval mapping.
BaySha.em <- scanone(seed10,method='em',pheno.col=1:nphe(seed10),model='normal')
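To give a feel for what a LOD score measures, here is a small sketch on simulated data. It uses single-marker regression, which is simpler than the EM interval mapping performed by scanone above, and compares a model with a genotype effect to a model without one. All data and variable names are invented for the example.

```r
set.seed(1)
n <- 200

# Simulated RIL genotypes at one marker: two homozygous classes coded 1 and 2.
geno <- sample(1:2, n, replace = TRUE)

# Simulated expression trait with a genotype effect of 0.8 and unit noise.
pheno <- 0.8 * (geno == 2) + rnorm(n)

# Residual sums of squares under the no-QTL and the one-QTL model.
rss0 <- sum(resid(lm(pheno ~ 1))^2)
rss1 <- sum(resid(lm(pheno ~ factor(geno)))^2)

# LOD score for a normal model: (n/2) * log10(RSS0 / RSS1).
lod <- (n / 2) * log10(rss0 / rss1)
```

Interval mapping extends this idea to positions between markers, where the genotype is not observed and has to be inferred from the flanking markers.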
Microarray data usually contain probes for which we don't want to perform any QTL analysis, such as buffers, controls or missed probes. The function clean.phe
removes undesired phenotypes from the seed10
and/or the BaySha.em
data.
seed10.cleaned <- clean.phe(seed10,"Buffer")
seed10.cleaned <- clean.phe(seed10.cleaned,"Ctrl")
BaySha.em <- clean.phe(BaySha.em,"Buffer")
BaySha.em <- clean.phe(BaySha.em,"Ctrl")
In this example, the dropped data come from probes named "Buffer"
and "Ctrl"
found within the CATMA data. Note that one could also clean the seed10
data before running the interval mapping, so that the resulting scanone object is clean from the start.
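The idea behind this cleaning step can be illustrated in plain R on a toy phenotype table. This is only a sketch of name-based filtering, not the implementation of clean.phe; the table and the trait name GeneX are invented.

```r
# Toy phenotype table: two expression traits plus two control probes.
pheno <- data.frame(
  CATrck  = c(1.2, 0.7, 1.9),
  Buffer1 = c(0.0, 0.1, 0.0),
  Ctrl_A  = c(5.1, 5.0, 5.2),
  GeneX   = c(2.3, 2.8, 2.1)
)

# Flag every column whose name contains "Buffer" or "Ctrl" ...
unwanted <- grepl("Buffer|Ctrl", names(pheno))

# ... and keep only the remaining columns.
pheno_clean <- pheno[, !unwanted, drop = FALSE]
```

Checking names(pheno_clean) before and after such a step is a cheap way to confirm that only the intended probes were removed.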
One of the main difficulties in genome-wide expression QTL analysis is reading all the LOD curves and systematically defining the QTLs. Given the amount of results, it is not feasible to inspect every LOD curve by eye. Use the define.peak
function to define QTLs with a drop-LOD support interval from the scanone results, here the interval mapping results BaySha.em
.
BaySha.peak <- define.peak(BaySha.em,lodcolumn='all')
class(BaySha.peak)
The parameter lodcolumn='all'
specifies that all LOD columns (all the traits) of the scanone object BaySha.em
are analysed. lodcolumn='CATrck'
restricts the analysis to the scanone LOD column CATrck
, which holds the interval mapping result of the trait CATrck
.
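The notion of a drop-LOD support interval can be sketched on a toy LOD curve. The code below is a simplified illustration in plain R, not the algorithm used by define.peak; the LOD values and the drop of 1.5 are assumptions chosen for the example.

```r
# Toy LOD curve on one chromosome (positions in cM).
pos <- seq(0, 50, by = 5)
lod <- c(0.2, 0.8, 1.9, 3.1, 4.6, 5.0, 4.1, 2.9, 1.5, 0.6, 0.3)

drop <- 1.5                       # assumed LOD drop
peak <- which.max(lod)            # index of the highest LOD
above <- lod >= lod[peak] - drop  # positions within 'drop' of the peak

# Walk outward from the peak while the curve stays above the threshold.
lo <- peak
while (lo > 1 && above[lo - 1]) lo <- lo - 1
hi <- peak
while (hi < length(lod) && above[hi + 1]) hi <- hi + 1

interval <- c(pos[lo], pos[hi])
```

Here the peak is at 25 cM and the interval runs from 20 to 30 cM. Walking outward from the peak, rather than taking every position above the threshold, keeps a second, separate peak on the same chromosome out of the interval.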
We call the result of the define.peak
function a peak
object. It stores the QTL definitions. QTLs are defined by several features described in the attributes of the peak
object. At this step, a QTL is defined only by its LOD score, its location and the subjective quality of the LOD peak. See the define.peak
function for details.
attributes(BaySha.peak)
Back to the define.peak
parameters. graph=TRUE
draws the LOD curves with the LOD support interval. Curves showing a detected QTL are drawn on a separate chart for each chromosome. Note that no graphical layout is defined, so all generated graphs are drawn one over another. Set the graphical parameter mfrow
with the R function par()
before running define.peak
to draw all charts in the same window. You may not want to set graph=TRUE
and lodcolumn='all'
at the same time, depending on the number of traits analyzed.
The following commands give an example that defines QTLs and draws charts for a single trait, CATrck
.
png(filename='CATrck.png',width=800,height=600)
par(mfrow=c(1,5))
define.peak(BaySha.em, lodcolumn='CATrck', graph=TRUE, chr=c(1,5))
par(mfrow=c(1,1))
dev.off();
png()
and dev.off()
are standard R functions, used here to write the generated graph to the PNG file 'CATrck.png'
. With these functions you can lay out the graph as you wish. In contrast, the define.peak
parameter save.pict=TRUE
systematically saves every single LOD curve generated for each chromosome as a PNG file. The files are named after the trait and the chromosome where the QTL is located. Be aware of the amount of data you are analysing before setting save.pict=TRUE
.
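The device-and-layout pattern described above can be tried on its own with two toy plots. This sketch uses pdf() only because it is available in every R build; png() is opened and closed the same way. The file name and the plotted data are invented.

```r
# Write two side-by-side panels to a file in the session's temp directory.
out_file <- file.path(tempdir(), "layout_demo.pdf")

pdf(file = out_file, width = 8, height = 4)   # open the graphics device
old_par <- par(mfrow = c(1, 2))               # one row, two panels
plot(1:10, (1:10)^2, type = "l", main = "Panel 1")
plot(1:10, sqrt(1:10), type = "l", main = "Panel 2")
par(old_par)                                  # restore the previous layout
invisible(dev.off())                          # close the device, write the file
```

Saving the value returned by par() and restoring it afterwards avoids leaving a multi-panel layout active for later plots.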
Accessing QTL results within a peak
object is straightforward:
BaySha.peak
BaySha.peak$CATrck
BaySha.peak
will give you the define.peak
results ordered by trait and then by chromosome. BaySha.peak$CATrck
will give you the results for the trait 'CATrck'
and so on for other trait names. If no QTL has been detected for a trait, the result is the value NA
.
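When many traits are analysed, it can help to list which traits have at least one QTL. The sketch below uses a toy list that only mimics the access pattern described above (one entry per trait, NA when nothing was detected); the inner structure, the column names and the traits GeneX and GeneY are assumptions, not the actual layout of a peak object.

```r
# Toy stand-in for a peak object: one entry per trait, NA when no QTL was found.
toy_peak <- list(
  CATrck = list("1" = data.frame(lod = 4.2, peak.cM = 31.5)),
  GeneX  = NA,
  GeneY  = list("5" = data.frame(lod = 6.8, peak.cM = 12.0))
)

# TRUE for traits whose entry is exactly NA.
no_qtl <- vapply(toy_peak, function(x) identical(x, NA), logical(1))

# Names of the traits with at least one detected QTL.
traits_with_qtl <- names(toy_peak)[!no_qtl]
```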
To complete the QTL analysis, use the functions calc.adef
, localize.qtl
and classify.qtl
to compute, for each QTL previously detected in the peak
object, the additive effect, the estimated physical location and, for eQTLs, the estimated acting type, respectively. Each of these functions adds features to the peak
object.
BaySha.peak <- localize.qtl(cross=seed10.cleaned, peak=BaySha.peak,
data.gmap=BSpgmap)
BaySha.peak <- calc.adef(cross=seed10.cleaned, scanone=BaySha.em,
peak=BaySha.peak)
BaySha.peak <- classify.qtl(cross=seed10.cleaned, peak=BaySha.peak,
etrait.coord=ATH.coord, data.gmap=BSpgmap)
attributes(BaySha.peak)
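As an illustration of what an acting type can mean for an eQTL, the sketch below labels an eQTL "cis" when the gene lies on the same chromosome and inside the physical QTL interval, and "trans" otherwise. This is a common convention written in plain R on invented data; it is not necessarily the rule applied by classify.qtl, and all column names are hypothetical.

```r
# Toy table: one eQTL per row, with the physical QTL interval and gene position (bp).
eqtl <- data.frame(
  trait    = c("GeneA", "GeneB", "GeneC"),
  qtl_chr  = c(1, 3, 5),
  qtl_inf  = c(1.0e6, 4.0e6, 2.0e6),   # lower bound of the QTL interval
  qtl_sup  = c(3.0e6, 6.0e6, 5.0e6),   # upper bound of the QTL interval
  gene_chr = c(1, 2, 5),
  gene_pos = c(2.5e6, 4.5e6, 9.0e6),
  stringsAsFactors = FALSE
)

# "cis" when the gene sits inside the QTL interval on the same chromosome.
eqtl$type <- ifelse(
  eqtl$qtl_chr == eqtl$gene_chr &
    eqtl$gene_pos >= eqtl$qtl_inf &
    eqtl$gene_pos <= eqtl$qtl_sup,
  "cis", "trans"
)
```

In this toy table GeneA is cis, GeneB is trans because the chromosomes differ, and GeneC is trans because the gene lies outside the interval.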
For each of these functions you have to specify the peak
object. You also need to specify the related cross
object and scanone
results, the related genetic map physical data BSpgmap
and the expression trait physical data ATH.coord
. Note that the expression trait physical data (here ATH.coord
) may contain more traits than those studied. Conversely, all traits studied within the peak
, the scanone
or the cross
objects must be described in ATH.coord
.
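A quick way to check this requirement is to compare the studied trait names with the names listed in the coordinate table. The sketch below uses toy data; the column name trait_name is hypothetical and should be replaced by whatever names(ATH.coord) reports.

```r
# Traits analysed in the cross / scanone / peak objects (toy values).
studied_traits <- c("CATrck", "GeneX", "GeneY")

# Toy coordinate table; it may list traits that are not studied (GeneZ).
coord <- data.frame(
  trait_name = c("CATrck", "GeneX", "GeneZ"),  # hypothetical column name
  chr        = c(1, 2, 4),
  start      = c(120000, 560000, 990000),
  stringsAsFactors = FALSE
)

# Studied traits that have no physical coordinates.
missing_traits <- setdiff(studied_traits, coord$trait_name)
```

An empty result means every studied trait is described; here GeneY is reported as missing.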
Use the calc.Rsq
function to compute, from a peak
object, the contribution of the individual QTLs to the phenotypic variation. This function also tests for epistatic interactions between QTLs and computes the contribution of the significant ones. By default the significance threshold is set to th=0.001
. To keep all QTL interactions whatever their significance, set th=1
.
BaySha.Rsq <- calc.Rsq(cross=seed10.cleaned,peak=BaySha.peak)
BaySha.Rsq
plot.Rsq(rsq=BaySha.Rsq)
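The quantities discussed here, the share of phenotypic variance explained and a test for epistasis held against a threshold, can be illustrated with ordinary linear models on simulated data. This is a sketch of the statistical idea only, not the computation performed by calc.Rsq; the data, effect sizes and variable names are invented.

```r
set.seed(42)
n <- 300

# Simulated genotypes at two QTLs (two homozygous classes coded 0 and 1).
q1 <- sample(0:1, n, replace = TRUE)
q2 <- sample(0:1, n, replace = TRUE)

# Simulated trait with two additive effects and an interaction.
y <- 1.0 * q1 + 0.5 * q2 + 0.8 * q1 * q2 + rnorm(n)

additive <- lm(y ~ q1 + q2)   # additive model
full     <- lm(y ~ q1 * q2)   # additive model plus interaction

r2_additive <- summary(additive)$r.squared
r2_full     <- summary(full)$r.squared

# F test of the interaction term, compared with a threshold like th=0.001.
p_interaction    <- anova(additive, full)[["Pr(>F)"]][2]
keep_interaction <- p_interaction < 0.001
```

The difference r2_full - r2_additive is the extra variance attributed to the interaction once the additive effects are accounted for.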
The function peak.2.array
formats all QTL results as a simple array. The column names are the names of the peak features described in the peak
object. This array has class peak.array
. Rsq.2.array
adds the R-squared column to the QTL array. Formatting the results as a simple array makes it possible to use any R function (statistics, summaries, graphics, histograms...) to explore the results in a customized yet simple way. This format also allows the results to be written to a file (such as text or CSV) for saving.
BaySha.array <- peak.2.array(BaySha.peak)
BaySha.array <- Rsq.2.array(rsq=BaySha.Rsq,BaySha.array)
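Saving a flat result table to a file uses standard R functions. The sketch below works on a toy data frame standing in for the QTL array; its columns and the file name are invented, and the same write.csv call is assumed to apply to any object that can be treated as a data frame.

```r
# Toy stand-in for a flat table of QTL results.
qtl_table <- data.frame(
  trait = c("CATrck", "GeneX"),
  chr   = c(1, 5),
  lod   = c(4.2, 6.8),
  stringsAsFactors = FALSE
)

# Write the table as CSV in the session's temp directory, then read it back.
out_file <- file.path(tempdir(), "qtl_results.csv")
write.csv(qtl_table, file = out_file, row.names = FALSE)
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)

# Any base R summary works on such a table, e.g. the number of QTLs per chromosome.
qtl_per_chr <- table(reloaded$chr)
```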
'eqtl' also provides useful functions to get an overview of the QTL results stored in a peak.array
:
The summary.peak
function gives a variety of summary information and an overview of the peak distribution. Summary graphs are available by setting graph=TRUE
. As with define.peak
, no graphical layout is defined, so all generated graphs are drawn one over another in the same R graphics window. Set mfrow
before running summary.peak
to draw all charts in the same R window.
Whole QTL summary with graphs:
par(mfrow=c(2,4))
BaySha.summary <- summary.peak(BaySha.array,seed10.cleaned,graph=TRUE)
par(mfrow=c(1,1))
names(BaySha.summary)
BaySha.summary
QTL summary with graphs, excluding QTLs located on chromosome 3 between 5000 and 6000 bp:
par(mfrow=c(2,4))
BaySha.sum_exc <- summary.peak( BaySha.array, seed10.cleaned,
exc=data.frame(inf=5000, sup=6000, chr=3), graph=TRUE)
par(mfrow=c(1,1))
names(BaySha.sum_exc)
BaySha.sum_exc
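The effect of an exclusion region such as exc=data.frame(inf=5000, sup=6000, chr=3) can be pictured as a row filter on a table of QTL positions. The sketch below is plain R on invented data; the column names are hypothetical, and treating the bounds as inclusive is an assumption, not documented behaviour of summary.peak.

```r
# Toy table of QTL physical positions.
qtl <- data.frame(
  trait  = c("GeneA", "GeneB", "GeneC", "GeneD"),
  chr    = c(3, 3, 1, 3),
  pos_bp = c(5500, 7000, 5500, 5000),
  stringsAsFactors = FALSE
)

# Region to exclude, in the same form as the exc argument above.
exc <- data.frame(inf = 5000, sup = 6000, chr = 3)

# Rows on the excluded chromosome and inside the bounds (inclusive here).
in_region <- qtl$chr == exc$chr &
  qtl$pos_bp >= exc$inf &
  qtl$pos_bp <= exc$sup

qtl_kept <- qtl[!in_region, ]
```

GeneC sits at the same position as GeneA but on chromosome 1, so it is kept; only GeneA and GeneD fall in the excluded region.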
The function plot.genome
provides basic information and an overview of genome-wide eQTL parameters.
plot.genome(seed10.cleaned, BaySha.array, ATH.coord, BSpgmap,
chr.size=c(30432457, 19704536, 23470536, 18584924, 26991304),
save.pict=TRUE)
The parameter chr.size
gives the sizes of the chromosomes in base pairs (here for A. thaliana). These sizes are used to delimit the chromosomes in genome-wide graphs. For this function, the page layout is already set. save.pict=TRUE
saves all graphs as separate files in the current folder.
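One way chromosome sizes can delimit chromosomes on a genome-wide axis is through cumulative offsets: each chromosome starts where the previous one ends. The sketch below reuses the chr.size values from the call above; the helper to_genome is a made-up name, and this is an illustration rather than the code inside plot.genome.

```r
# Chromosome sizes in base pairs, as passed to plot.genome above.
chr.size <- c(30432457, 19704536, 23470536, 18584924, 26991304)

# Start of each chromosome on a single genome-wide axis.
offset <- c(0, cumsum(chr.size)[-length(chr.size)])

# Convert a (chromosome, position) pair to a genome-wide coordinate.
# (to_genome is a hypothetical helper, not an R/eqtl function.)
to_genome <- function(chr, pos_bp) {
  offset[chr] + pos_bp
}

genome_pos <- to_genome(3, 1000)   # 1000 bp into chromosome 3
```

The offsets also give the positions at which chromosome boundaries would be drawn on such an axis.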
Use the function cim.peak
to systematically perform composite interval mapping by running a single-QTL genome scan ( scanone
) with previously defined QTLs as additive covariates. The additive covariates are defined from a peak
object as the flanking marker closest to each LOD peak, using the function map.peak
. cim.peak
returns an object of class scanone
, which can therefore be analyzed by the define.peak
function. The results can then be analyzed by calc.adef
, localize.qtl
, calc.Rsq
, etc. Because of the model, the LOD curve presents a high (artefactual) LOD peak at the additive covariate locations, which would be wrongly detected as a strong QTL by the function define.peak
. To avoid that, use the wash.covar
function, which sets the LOD score at the covariate locations to 0. This function takes a genetic window size which specifies the size of the region to "wash".
BaySha.cem <- cim.peak(seed10.cleaned,BaySha.peak)
covar <- map.peak(BaySha.peak)
covar
my_washed_BaySha.cem <- wash.covar(BaySha.cem, covar, window.size=20)
BayShacim.peak <- define.peak(my_washed_BaySha.cem, lodcolumn='all')
BayShacim.peak <- calc.adef(cross=seed10.cleaned, scanone=my_washed_BaySha.cem,
peak=BayShacim.peak)
BayShacim.peak <- localize.qtl(cross=seed10.cleaned, peak=BayShacim.peak,
data.gmap=BSpgmap)
BayShacim.peak <- classify.qtl(cross=seed10.cleaned, peak=BayShacim.peak,
etrait.coord=ATH.coord,data.gmap=BSpgmap)
BayShacim.Rsq <- calc.Rsq(cross=seed10.cleaned, peak=BayShacim.peak)
plot.Rsq(BayShacim.Rsq)
BaySha.cim.array <- peak.2.array(BayShacim.peak)
BaySha.cim.array <- Rsq.2.array(BayShacim.Rsq,BaySha.cim.array)
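The "washing" step can be pictured on a toy LOD table: every position within a window around the covariate has its LOD set to 0, so the artefactual peak no longer dominates. The sketch below is plain R on invented data, not the implementation of wash.covar; in particular, treating the window size as a total width centred on the covariate is an assumption.

```r
# Toy scan results on one chromosome, with an artefactual peak at the covariate.
scan <- data.frame(
  chr = rep(1, 11),
  pos = seq(0, 50, by = 5),                          # positions in cM
  lod = c(0.5, 1, 2, 4, 9, 15, 9, 4, 2, 1, 0.5)
)

covar_chr   <- 1    # chromosome of the covariate marker
covar_pos   <- 25   # its position in cM
window.size <- 20   # assumed to be the total width of the washed region

# Positions inside the window centred on the covariate.
in_window <- scan$chr == covar_chr &
  abs(scan$pos - covar_pos) <= window.size / 2

# "Wash": set the LOD to 0 inside the window.
scan$lod[in_window] <- 0
```

After washing, the highest remaining LOD in this toy curve is 2, so a peak-calling step would no longer pick up the covariate location as a QTL.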
enjoy ;o)
Hamid A Khalili, hamid.khalili@gmail.com