GSA.func {GSA}    R Documentation

Gene set analysis without permutations

Description

Determines the significance of pre-defined sets of genes with respect to an outcome variable, such as a group indicator, a quantitative variable or a survival time. This is the basic function called by GSA.

Usage

GSA.func(x, y, genesets, genenames, geneset.names = NULL,
         method = c("maxmean", "mean", "absmean"),
         resp.type = c("Quantitative", "Two class unpaired", "Survival",
                       "Multiclass", "Two class paired"),
         censoring.status = NULL,
         first.time = TRUE, return.gene.ind = TRUE,
         ngenes = NULL, gs.mat = NULL, gs.ind = NULL,
         catalog = NULL, catalog.unique = NULL,
         s0 = NULL, s0.perc = NULL, minsize = 15, maxsize = 500, restand = TRUE)

Arguments

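x Data matrix: p by n matrix of features, one observation per column (missing values allowed)
y Vector of response values, one per column of x: 1,2 for "Two class unpaired"; -1,1,-2,2,... for "Two class paired" (see resp.type); group labels for "Multiclass"; survival times for "Survival" (together with censoring.status); real values for "Quantitative"
genesets Gene set collection: a list, each component a vector of gene names
genenames Vector of gene names for the rows of x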
geneset.names Optional vector of gene set names
method Method for summarizing a gene set: "maxmean" (default), "mean" or "absmean"
resp.type Problem type: "Quantitative" for a continuous outcome; "Two class unpaired" for two independent groups; "Survival" for a censored survival outcome; "Multiclass" for more than 2 groups; "Two class paired" for paired outcomes, coded -1,1 (first pair), -2,2 (second pair), etc.
censoring.status Vector of censoring status values for survival problems: 1 means death or failure, 0 means censored
first.time internal use
return.gene.ind internal use
ngenes internal use
gs.mat internal use
gs.ind internal use
catalog internal use
catalog.unique internal use
s0 Exchangeability factor for the denominator of the test statistic; default is an automatic choice
s0.perc Percentile of the standard deviation values to use for s0; default is an automatic choice. s0.perc = -1 means s0 = 0, which differs from s0.perc = 0: the latter sets s0 to the zeroth percentile of the standard deviation values, i.e. their minimum
minsize Minimum number of genes a gene set must contain to be considered
maxsize Maximum number of genes a gene set may contain to be considered
restand Should restandardization be done? Default is TRUE
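For resp.type = "Two class paired", the observations labelled -k and k in y form pair k. The base-R sketch below is illustrative only (it does not call GSA): it builds such a y for three hypothetical subjects measured twice and forms the within-pair differences. Taking member k minus member -k is a sign convention assumed for the sketch, not documented GSA behavior.

```r
# Hypothetical data: 3 subjects measured twice, stored as the 6 columns
# of a 5-gene expression matrix (columns 1:3 "before", 4:6 "after").
set.seed(1)
x <- matrix(rnorm(5 * 6), nrow = 5)

# Paired coding: -k and k mark the two members of pair k
y <- c(-1, -2, -3, 1, 2, 3)

# Sanity check: every pair label appears exactly once with each sign
stopifnot(setequal(-y[y < 0], y[y > 0]), !anyDuplicated(y))

# One column of differences per pair (member k minus member -k, assumed)
pairs <- sort(unique(abs(y)))
d <- sapply(pairs, function(k) x[, y == k] - x[, y == -k])
dim(d)  # 5 3
```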

Details

Carries out a gene set analysis, computing the gene set scores. This function does not perform any permutations, so it does not estimate false discovery rates; GSA builds on this function, adding permutations, to estimate FDRs.
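The method argument chooses how the individual gene scores z of a set are summarized into one gene set score. As defined in the Efron and Tibshirani reference, maxmean averages the positive parts and the negative parts of z separately over all genes in the set and keeps whichever is larger, with its sign. The helper geneset_summary below is a hypothetical base-R sketch of the three summaries, not GSA's own code.

```r
# Hypothetical helper (not part of GSA): summarize the gene scores z of
# one gene set using one of the three methods named above.
geneset_summary <- function(z, method = c("maxmean", "mean", "absmean")) {
  method <- match.arg(method)
  switch(method,
    mean    = mean(z),
    absmean = mean(abs(z)),
    maxmean = {
      # Average positive parts and negative parts over ALL genes in the set
      s_plus  <- mean(pmax(z, 0))
      s_minus <- mean(pmax(-z, 0))
      # Keep the larger one, signed: positive sets score > 0, negative < 0
      if (s_plus > s_minus) s_plus else -s_minus
    }
  )
}

z <- c(2.5, 1.0, -0.5, -4.0)
geneset_summary(z, "mean")     # -0.25
geneset_summary(z, "absmean")  # 2
geneset_summary(z, "maxmean")  # -1.125
```

Unlike mean, maxmean is not dominated by a few large scores of the opposite sign cancelling the rest, and unlike absmean it keeps the direction of the effect.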

Value

A list with components

scores Gene set scores for each gene set
norm.scores Gene set scores transformed by the inverse Gaussian cdf
mean Means of gene expression values for each sample
sd Standard deviation of gene expression values for each sample
gene.ind List indicating which genes in each positive gene set had positive individual scores, and similarly for negative gene sets
geneset.names Names of the gene sets
nperms Number of permutations used
gene.scores Individual gene scores (e.g. t-statistics for a two class problem)
s0 Computed exchangeability factor
s0.perc Computed percentile of standard deviation values
stand.info Information computed for use in the restandardization process
method Method used (from call to GSA.func)
call The call to GSA
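The gene.scores, s0 and s0.perc components fit together roughly as follows, assuming a SAM-style statistic for the two class unpaired case: each gene's score divides its difference in group means by its standard error plus s0, and s0 can be taken as a percentile of the per-gene standard errors. The function two_class_scores is a hypothetical sketch under those assumptions (pooled-variance standard errors, perc on a 0-100 scale); it is not GSA's internal code and does not reproduce GSA's automatic choice of s0.

```r
# Sketch of a SAM-style two class unpaired gene score (assumed form, not
# GSA's internal code): score_i = (mean2_i - mean1_i) / (se_i + s0),
# where s0 is a percentile of the per-gene standard errors se_i.
# 'perc' is on a 0-100 scale (an assumption); perc = -1 forces s0 = 0.
two_class_scores <- function(x, y, perc = 50) {
  g1 <- x[, y == 1, drop = FALSE]
  g2 <- x[, y == 2, drop = FALSE]
  n1 <- ncol(g1)
  n2 <- ncol(g2)
  diff <- rowMeans(g2) - rowMeans(g1)
  # Pooled within-group sum of squares for each gene (row)
  ss <- rowSums((g1 - rowMeans(g1))^2) + rowSums((g2 - rowMeans(g2))^2)
  se <- sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
  s0 <- if (perc == -1) 0 else unname(quantile(se, perc / 100))
  list(scores = diff / (se + s0), s0 = s0)
}

set.seed(100)
x <- matrix(rnorm(200 * 10), ncol = 10)
y <- rep(1:2, each = 5)
res0  <- two_class_scores(x, y, perc = -1)  # s0 = 0: ordinary pooled t-statistics
res50 <- two_class_scores(x, y, perc = 50)  # s0 = median standard error
# A positive s0 shrinks every score toward zero
all(abs(res50$scores) <= abs(res0$scores))  # TRUE
```

Adding s0 keeps genes with very small standard errors from producing very large scores.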

Author(s)

Robert Tibshirani

References

Efron, B. and Tibshirani, R. (2006). On testing the significance of sets of genes. Technical report, Stanford University. http://www-stat.stanford.edu/~tibs/ftp/GSA.pdf

Examples


######### two class unpaired comparison
# y must take values 1,2

library(GSA)
set.seed(100)
x<-matrix(rnorm(1000*20),ncol=20)
dd<-sample(1:1000,size=100)

u<-matrix(2*rnorm(100),ncol=10,nrow=100)
x[dd,11:20]<-x[dd,11:20]+u
y<-c(rep(1,10),rep(2,10))

genenames=paste("g",1:1000,sep="")

#create some random gene sets
genesets=vector("list",50)
for(i in 1:50){
 genesets[[i]]=paste("g",sample(1:1000,size=30),sep="")
}
geneset.names=paste("set",as.character(1:50),sep="")

GSA.func.obj<-GSA.func(x,y, genenames=genenames, genesets=genesets, geneset.names=geneset.names, resp.type="Two class unpaired")
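Gene sets with fewer than minsize or more than maxsize genes are not considered. As a sketch of that filter, the code below regenerates random sets like the ones above and checks them against the default bounds of 15 and 500. Measuring a set's size as the number of its genes found in genenames, and treating the bounds as inclusive, are assumptions, not documented GSA behavior.

```r
# Regenerate random gene sets like those in the example above
set.seed(1)
genenames <- paste("g", 1:1000, sep = "")
genesets <- lapply(1:50, function(i) paste("g", sample(1:1000, size = 30), sep = ""))

# Size of each set = number of its distinct genes present in the data (assumed)
set.sizes <- sapply(genesets, function(s) sum(unique(s) %in% genenames))

# Default bounds from the Usage section, treated as inclusive (assumed)
minsize <- 15
maxsize <- 500
keep <- set.sizes >= minsize & set.sizes <= maxsize
table(keep)
```

Every random set here has 30 genes, all present in genenames, so all 50 sets pass the default filter.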



# To use a "real" gene set collection, read it in from a GMT file:
#
# geneset.obj <- GSA.read.gmt("file.gmt")
#
# where file.gmt is a gene set collection from the GSEA collection,
# from the website http://www-stat.stanford.edu/~tibs/GSA, or one
# that you have created yourself. Then
#
#   GSA.func.obj <- GSA.func(x, y, genenames = genenames, genesets = geneset.obj$genesets, resp.type = "Two class unpaired")
#
#
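A GMT file holds one gene set per line, with tab-separated fields: the set name, a description, then the member genes. GSA.read.gmt reads such files; read_gmt_sketch below is a hypothetical base-R stand-in that only illustrates the format. Its component names other than genesets are chosen for the sketch and are not claimed to match GSA.read.gmt's return value.

```r
# Hypothetical minimal GMT reader (not GSA.read.gmt): each line is
# name <TAB> description <TAB> gene1 <TAB> gene2 ...
read_gmt_sketch <- function(file) {
  fields <- strsplit(readLines(file), "\t", fixed = TRUE)
  list(
    genesets             = lapply(fields, function(f) f[-(1:2)]),
    geneset.names        = vapply(fields, `[`, character(1), 1),
    geneset.descriptions = vapply(fields, `[`, character(1), 2)
  )
}

# Write a tiny two-set GMT file and read it back
gmt <- tempfile(fileext = ".gmt")
writeLines(c("setA\tfirst set\tg1\tg2\tg3",
             "setB\tsecond set\tg4\tg5"), gmt)
obj <- read_gmt_sketch(gmt)
obj$geneset.names      # "setA" "setB"
lengths(obj$genesets)  # 3 2
```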



[Package GSA version 1.0]