A new starting point {eqtl}R Documentation

Introductory comments on R/eqtl

Description

A brief introduction to the R/eqtl package, with a walk-through of a typical analysis.

Preliminaries to R/eqtl

Walk-through of a typical analysis

Here we briefly describe the use of R/eqtl and R/qtl to analyze an experimental cross. R/eqtl is an add-on package to Karl Broman's R/qtl. It requires the 'qtl' package and uses some of its functions. This tutorial therefore assumes prior knowledge of R/qtl: read the R/qtl documentation and tutorial before performing any analysis with the 'eqtl' add-on.

A difficult first step in using most data-analysis software is importing the data in an adequate format. This step is described in detail in the R/qtl tutorial. With R/eqtl you need to import some extra data in addition to the data needed for R/qtl. We do not discuss data import at this point; it is described in the next chapter, 'First step'.

We consider the example data seed10, an experiment on gene expression in Arabidopsis thaliana. Use the data function to load the data.

data(seed10)

The seed10 data have classes cross and riself. They describe an experiment on a RIL population obtained by single seed descent. The function summary.cross gives summary information on the data and checks the data for internal consistency. Many utility functions are available in 'qtl' and are described at length in Karl Broman's tutorial.

To project our results onto the physical map, we also need to load the physical positions of the genetic markers and the genomic physical coordinates of the probes used to estimate the expression traits described in seed10. BSpgmap and ATH.coord are simple data frames with specific column names.

data(BSpgmap)
names(BSpgmap)
data(ATH.coord)
names(ATH.coord)

Before running the QTL analysis, intermediate calculations need to be performed. The function calc.genoprob computes the conditional genotype probabilities at each pseudo-marker. sim.geno simulates sequences of genotypes from their joint distribution, given the observed marker data. See the 'qtl' manual for details. These steps have already been performed on seed10, so you may not need to run them again. Here, pseudo-markers have been defined every 0.5 centimorgan (step=0.5).

seed10 <- calc.genoprob(seed10, step=0.5, off.end=0, error.prob=0,
map.function='kosambi', stepwidth='fixed')
seed10 <- sim.geno(seed10, step=0.5, off.end=0, error.prob=0,
map.function='kosambi', stepwidth='fixed')

Use the scanone function to perform interval mapping.

BaySha.em <- scanone(seed10, method='em', pheno.col=1:nphe(seed10), model='normal')

Microarray probes usually include data for which we do not want to perform any QTL analysis, such as buffers, controls or missed probes. The function clean.phe removes undesired phenotypes from the seed10 and/or the BaySha.em data.

seed10.cleaned <- clean.phe(seed10,"Buffer")
seed10.cleaned <- clean.phe(seed10.cleaned,"Ctrl")
BaySha.em <- clean.phe(BaySha.em,"Buffer")
BaySha.em <- clean.phe(BaySha.em,"Ctrl")

In this example, the dropped data come from probes named "Buffer" and "Ctrl" found within the CATMA data. Note that one could instead clean the seed10 data before computing the interval mapping; the scanone object would then be generated clean directly.
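To illustrate what dropping undesired phenotypes amounts to, here is a minimal base R sketch on a made-up phenotype table. It does not call clean.phe; the column names and the pattern matching with grepl are assumptions made for this example, and clean.phe may match names differently.

```r
# Made-up phenotype table: two expression traits plus a buffer and a control probe
pheno <- data.frame(traitA   = c(7.1, 6.8, 7.4),
                    Buffer_1 = c(0.1, 0.2, 0.1),
                    traitB   = c(5.0, 5.3, 4.9),
                    Ctrl_1   = c(9.9, 9.8, 9.9))

# Flag the columns whose name contains "Buffer" or "Ctrl" (assumed matching rule)
undesired <- grepl("Buffer|Ctrl", names(pheno))

# Keep only the remaining traits; drop = FALSE keeps a data frame even with one column
pheno.cleaned <- pheno[, !undesired, drop = FALSE]
names(pheno.cleaned)
```

Only traitA and traitB remain in pheno.cleaned.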

One of the major difficulties in genome-wide expression QTL analysis is to read all the LOD curves and systematically define the QTLs. Given the amount of results, it is not feasible to inspect all the LOD curves by eye. Use the define.peak function to define QTLs, with drop-LOD support intervals, from the scanone results (here the interval mapping results BaySha.em).

BaySha.peak <- define.peak(BaySha.em,lodcolumn='all')
class(BaySha.peak)

The parameter lodcolumn='all' requests the analysis of all LOD columns (all the traits) of the scanone object BaySha.em. lodcolumn='CATrck' restricts the analysis to the scanone LOD column CATrck, which is expected to hold the interval mapping result for the trait CATrck.

We call the result of the define.peak function a peak object. The peak object stores the QTL definitions. The QTLs are defined by several features described in the peak object's attributes. At this step, a QTL is defined only by its LOD score, its location and the subjective quality of the LOD peak. See the define.peak function for details.

attributes(BaySha.peak)
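To make the idea of a drop-LOD support interval concrete, here is a minimal base R sketch on a made-up LOD curve. It is not the define.peak implementation: the LOD values, the drop of 1.5 and the column names of the result are all assumptions chosen for illustration.

```r
# Toy LOD curve on one chromosome (made-up values, positions in cM)
pos <- seq(0, 50, by = 5)
lod <- c(0.2, 0.8, 1.9, 3.1, 4.6, 5.2, 4.9, 3.5, 2.0, 1.1, 0.4)

drop <- 1.5                             # assumed LOD drop
peak.idx <- which.max(lod)              # index of the highest LOD score
inside <- lod >= lod[peak.idx] - drop   # positions within the drop of the peak

# Walk outwards from the peak while the curve stays above the threshold
left <- peak.idx
while (left > 1 && inside[left - 1]) left <- left - 1
right <- peak.idx
while (right < length(lod) && inside[right + 1]) right <- right + 1

# Hypothetical summary of the QTL: peak position, LOD score, interval bounds
toy.peak <- data.frame(peak.cM = pos[peak.idx], lod = lod[peak.idx],
                       inf.cM = pos[left], sup.cM = pos[right])
toy.peak
```

With these values the peak is at 25 cM and the support interval runs from 20 to 30 cM.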

Back to the define.peak parameters: graph=TRUE draws the LOD curves with their LOD support intervals. The curves showing a detected QTL are drawn on a separate chart for each chromosome. Note that no graphical setup is defined, so all generated graphs will appear one on top of the other. Set the graphical parameter mfrow with the R function par() before running define.peak to draw all charts in the same window. Depending on the number of traits analyzed, you may not want to set graph=TRUE and lodcolumn='all' at the same time.

The following commands give an example that defines QTLs and draws charts for a single trait, CATrck.

png(filename='CATrck.png',width=800,height=600)
par(mfrow=c(1,5))
define.peak(BaySha.em, lodcolumn='CATrck', graph=TRUE, chr=c(1,5))
par(mfrow=c(1,1))
dev.off()

png() and dev.off() are standard R functions; here they write the generated graph to the PNG file 'CATrck.png'. Using these functions, you can lay out the page as you wish. By contrast, the define.peak parameter save.pict=TRUE systematically saves every single LOD curve generated for each chromosome as a PNG file. The files are named after the trait and the chromosome where the QTL is located. So be mindful of the amount of data you are analysing before setting save.pict=TRUE.

The way to access QTL results within peak object is quite simple:

BaySha.peak
BaySha.peak$CATrck

BaySha.peak will give you the define.peak results ordered by trait and then by chromosome. BaySha.peak$CATrck will give you the results for the trait 'CATrck', and so on for other trait names. If no QTL has been detected for a trait, the result is NA.

To complete the QTL analysis, use the functions calc.adef, localize.qtl and classify.qtl to compute, for each QTL previously detected in the peak object, the additive effect, the estimated physical location and, for eQTLs, the estimated acting type, respectively. Each of these functions adds peak features to the peak object.

BaySha.peak <- localize.qtl(cross=seed10.cleaned, peak=BaySha.peak,
data.gmap=BSpgmap)
BaySha.peak <- calc.adef(cross=seed10.cleaned, scanone=BaySha.em,
peak=BaySha.peak)
BaySha.peak <- classify.qtl(cross=seed10.cleaned, peak=BaySha.peak,
etrait.coord=ATH.coord, data.gmap=BSpgmap)
attributes(BaySha.peak)

For each of these functions you have to specify the peak object. You also need to specify the related cross object and scanone results, the physical data for the genetic map (BSpgmap) and the physical data for the expression traits (ATH.coord). Note that the expression trait physical data (here ATH.coord) may contain more traits than those studied. Conversely, all traits studied within the peak, scanone or cross objects must be described in ATH.coord.

Use the calc.Rsq function to compute, from a peak object, the contribution of the individual QTLs to the phenotypic variation. The function also tests for, and computes the contribution of, significant epistatic interactions between QTLs. By default the significance threshold is th=0.001. To keep all QTL interactions regardless of their significance, set th=1.

BaySha.Rsq <- calc.Rsq(cross=seed10.cleaned,peak=BaySha.peak)
BaySha.Rsq
plot.Rsq(rsq=BaySha.Rsq)
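As a rough illustration of what the contribution of QTLs and of their interaction to the phenotypic variation means, here is a base R sketch on simulated data. It is not the calc.Rsq implementation: the simulated genotypes, the effect sizes and the use of sequential (type I) sums of squares from anova() are assumptions made for this example; only the threshold name th and its default value are taken from the text above.

```r
set.seed(1)
n <- 200

# Simulated genotypes (0/1) at two QTLs in a RIL-like population
q1 <- rbinom(n, 1, 0.5)
q2 <- rbinom(n, 1, 0.5)

# Simulated trait with two additive effects and one epistatic interaction
y <- 1.0 * q1 + 0.5 * q2 + 0.8 * q1 * q2 + rnorm(n)

# Linear model with both QTLs and their interaction
fit <- lm(y ~ q1 * q2)
tab <- anova(fit)

# Share of the total sum of squares attributed to each term (and to the residuals)
rsq <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(rsq) <- rownames(tab)
rsq

# Keep the interaction only if its p-value is below the threshold
th <- 0.001
keep.interaction <- tab["q1:q2", "Pr(>F)"] < th
keep.interaction
```

Setting th to 1 in this sketch would keep the interaction whatever its p-value.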

The function peak.2.array formats all QTL results as a simple array. The column names are the names of the peak features described in the peak object. This array has class peak.array. Rsq.2.array adds the R square column to the QTL array. Formatting the results as a simple array lets you apply any R function (statistical, summary, graphical, histograms...) to study the results as you wish and in the simplest way. This format also makes it easy to write the results to a file (such as text or CSV) to save the data.

BaySha.array <- peak.2.array(BaySha.peak)
BaySha.array <- Rsq.2.array(rsq=BaySha.Rsq,BaySha.array)
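Once the results are in a flat table, saving and exploring them takes only base R. The sketch below uses a made-up data frame standing in for a peak.array; its column names and values are hypothetical and are not the actual peak feature names.

```r
# Made-up stand-in for a QTL array (hypothetical column names and values)
toy.array <- data.frame(trait = c("traitA", "traitB", "traitC"),
                        chr   = c(1, 5, 5),
                        lod   = c(4.2, 6.8, 3.9),
                        stringsAsFactors = FALSE)

# Write the table to a CSV file (a temporary file here) and read it back
out.file <- tempfile(fileext = ".csv")
write.csv(toy.array, file = out.file, row.names = FALSE)
back <- read.csv(out.file, stringsAsFactors = FALSE)

# Any standard R function can then be applied, e.g. the number of QTLs per chromosome
qtl.per.chr <- table(back$chr)
qtl.per.chr
```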

'eqtl' also provides useful functions to get an overview of the QTL results stored in a peak.array. The summary.peak function gives a variety of summary information and an overview of the peak distribution. Summary graphs are available by setting graph=TRUE. As with define.peak, no graphical parameters are set, so all generated graphs will appear one on top of the other in the same R graph window. Set mfrow before running summary.peak to draw all charts in the same R window.

Whole QTL summary with graphs:

par(mfrow=c(2,4))
BaySha.summary <- summary.peak(BaySha.array,seed10.cleaned,graph=TRUE)
par(mfrow=c(1,1))
names(BaySha.summary)
BaySha.summary

QTL summary with graphs, excluding QTLs located on chromosome 3 between 5000 and 6000 bp:

par(mfrow=c(2,4))
BaySha.sum_exc <- summary.peak( BaySha.array, seed10.cleaned,
exc=data.frame(inf=5000, sup=6000, chr=3), graph=TRUE)
par(mfrow=c(1,1))
names(BaySha.sum_exc)
BaySha.sum_exc
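The effect of such an exclusion region can be pictured with a few lines of base R on a made-up QTL table. This is only a sketch: the table, its column names and the choice of inclusive bounds are assumptions, and summary.peak may handle the boundaries differently. Only the layout of exc (columns inf, sup and chr) is taken from the call above.

```r
# Made-up QTL table with physical peak positions in bp (hypothetical columns)
toy.qtl <- data.frame(trait   = c("t1", "t2", "t3", "t4"),
                      chr     = c(3, 3, 1, 3),
                      peak.bp = c(5500, 7000, 5200, 6000),
                      stringsAsFactors = FALSE)

# Exclusion region, laid out as in the summary.peak call above
exc <- data.frame(inf = 5000, sup = 6000, chr = 3)

# A QTL is excluded if it lies on the same chromosome and within [inf, sup]
# (inclusive bounds are an assumption of this sketch)
excluded <- toy.qtl$chr == exc$chr &
            toy.qtl$peak.bp >= exc$inf &
            toy.qtl$peak.bp <= exc$sup

kept <- toy.qtl[!excluded, ]
kept
```

Here t1 and t4 fall inside the region on chromosome 3 and are dropped, while t2 (outside the interval) and t3 (another chromosome) are kept.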

The function plot.genome provides basic information and an overview of genome-wide eQTL parameters.

plot.genome(seed10.cleaned, BaySha.array, ATH.coord, BSpgmap, chr.size=c(30432457, 19704536, 23470536, 18584924, 26991304), save.pict=TRUE)

The parameter chr.size gives the chromosome sizes in base pairs (here for A. thaliana). These sizes are used to delimit the chromosomes in genome-wide graphs. For this function, the page layout is already specified. save.pict=TRUE saves all graphs to separate files in the current folder.

Use the function cim.peak to systematically perform composite interval mapping by running a single-QTL genome scan (scanone) with previously defined QTLs as additive covariates. The additive covariates are defined from a peak object, with the function map.peak, as the closest flanking marker of each LOD peak. cim.peak returns an object of class scanone, which can therefore be analyzed by the define.peak function. The results can then be analyzed by calc.adef, localize.qtl, calc.Rsq, etc. Because of the model, the LOD curve presents a high (artefactual) LOD peak at each additive covariate location, which define.peak would wrongly detect as a strong QTL. To avoid that, use the wash.covar function, which sets the LOD score at the covariate locations to 0. This function takes a genetic window size, which specifies the size of the region to "wash".

BaySha.cem <- cim.peak(seed10.cleaned,BaySha.peak)

covar <- map.peak(BaySha.peak)
covar

my_washed_BaySha.cem <- wash.covar(BaySha.cem, covar, window.size=20)
BayShacim.peak <- define.peak(my_washed_BaySha.cem, lodcolumn='all')
BayShacim.peak <- calc.adef(cross=seed10.cleaned, scanone=my_washed_BaySha.cem,
peak=BayShacim.peak)
BayShacim.peak <- localize.qtl(cross=seed10.cleaned, peak=BayShacim.peak,
data.gmap=BSpgmap)
BayShacim.peak <- classify.qtl(cross=seed10.cleaned, peak=BayShacim.peak,
etrait.coord=ATH.coord,data.gmap=BSpgmap)
BayShacim.Rsq <- calc.Rsq(cross=seed10.cleaned, peak=BayShacim.peak)
plot.Rsq(BayShacim.Rsq)
BaySha.cim.array <- peak.2.array(BayShacim.peak)
BaySha.cim.array <- Rsq.2.array(BayShacim.Rsq,BaySha.cim.array)
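The "washing" step can be pictured with a small base R sketch on a made-up LOD curve. It does not call wash.covar: the LOD values, the covariate position and the reading of the window size as a total width in cM centred on the covariate are assumptions made for this example.

```r
# Toy composite-interval-mapping LOD curve (made-up values, positions in cM)
pos <- seq(0, 100, by = 10)
lod <- c(0.5, 1.2, 2.0, 9.5, 30.0, 8.7, 2.1, 3.4, 1.0, 0.6, 0.3)

covar.pos   <- 40   # position of the additive covariate (artefactual peak)
window.size <- 20   # assumed here to be the total window width, in cM

# Set the LOD score to 0 for every position inside the window around the covariate
in.window <- abs(pos - covar.pos) <= window.size / 2
washed <- lod
washed[in.window] <- 0

# The highest remaining LOD score is now outside the covariate region
pos[which.max(washed)]
```

After washing, the artefactual peak around 40 cM is gone and the highest remaining LOD score is the one at 70 cM.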

enjoy ;o)

Author(s)

Hamid A Khalili, hamid.khalili@gmail.com


[Package eqtl version 1.0 Index]