samr {samr} | R Documentation |
Correlates a large number of features (eg genes) with an outcome variable, such as a group indicator, quantitative variable or survival time
samr(data, resp.type=c("Quantitative","Two class unpaired","Survival","Multiclass","One class", "Two class paired","Two class unpaired timecourse", "One class timecourse","Two class paired timecourse", "Pattern discovery"), s0=NULL, s0.perc=NULL, nperms=100, center.arrays=FALSE, testStatistic=c("standard","wilcoxon"), time.summary.type=c("slope","signed.area"), regression.method=c("standard","ranks"), return.x=FALSE, knn.neighbors=10, random.seed=NULL, xl.mode=c("regular","firsttime","next20","lasttime"), xl.time=NULL, xl.prevfit=NULL)
data |
Data object with components x- p by n matrix of features, one observation per column (missing values allowed); y- n-vector of outcome measurements; censoring.status- n-vector of censoring censoring.status (1= died or event occurred, 0=survived, or event was censored), needed for a censored survival outcome |
resp.type |
Problem type: "quantitative" for a continuous parameter; "Two class unpaired" ; "Survival" for censored survival outcome; "Multiclass" : more than 2 groups; "One class" for a single group; "Two class paired" for two classes with paired observations; "Two class unpaired timecourse", "One class time course", "Two class.paired timecourse" or "Pattern discovery" |
s0 |
Exchangeability factor for denominator of test statistic; Default is automatic choice |
s0.perc |
Percentile of standard deviation values to use for s0; default is automatic choice; -1 means s0=0 (different from s0.perc=0, meaning s0=zeroeth percentile of standard deviation values= min of sd values |
nperms |
Number of permutations used to estimate false discovery rates |
center.arrays |
Should the data for each sample (array) be median centered at the outset? Default =FALSE |
testStatistic |
Test statistic to use in two class unpaired case.Either "standard" (t-statistic) or ,"wilcoxon" (Two-sample wilcoxon or Mann-Whitney test) |
time.summary.type |
Summary measure for each time course: "slope", or "signed.area"), |
regression.method |
Regression method for quantitative case: "standard", (linear least squares) or "ranks" (linear least squares on ranked data) |
return.x |
Should the matrix of feature values be returned? Only useful for time course data, where x contains summaries of the features over time. Otherwise x is the same as the input data data$x |
knn.neighbors |
Number of nearest neighbors to use for imputation of missing features values |
random.seed |
Optional initial seed for random number generator (integer) |
xl.mode |
Used by Excel interface |
xl.time |
Used by Excel interface |
xl.prevfit |
Used by Excel interface |
Carries out a SAM analysis. Applicable to microarray data and other data with a large number of features. This is the R package that is called by the "official" SAM Excel package v2.0. The format of the response vector y and the calling sequence is illustrated in the examples below. A more complete description is given in the SAM manual at http://www-stat.stanford.edu/~tibs/SAM
A list with components
n |
Number of observations |
x |
Data matrix p by n (p=# genes or features). Equal to the matrix data$x in the original call to samr except for (1) time course analysis, where is contains the summarized data or (2) quantitative outcome with rank regression, where it contains the data transformed to ranks. Hence it is null except for in time course analysis. |
y |
|
argy |
The values data$y in the original call to samr |
censoring.status |
Censoring status indicators if applicable |
testStatistic |
Test Statistic ised |
nperms |
Number of permutations requested |
nperms.act |
Number of permutations actually used. Will be < nperms when # of possible permutations <= nperms (in which case all permutations are done) |
tt |
tt=numer/sd, the vector of p test statistics for original data |
numer |
Numerators for tt |
sd |
Denominators for tt. Equal to standard deviation for feature plus s0 |
s0 |
Computed exchangeability factor |
s0.perc |
Computed percentile of standard deviation values. s0= s0.perc percentile of the gene standard deviations |
eva |
p-vector of expected values for tt under permutation sampling |
perms |
nperms.act by n matrix of permutations used. Each row is a permutation of 1,2...n |
permsy |
nperms.act by n matrix of permutations used. Each row is a permutation of y1,y2,...yn. Only one of perms or permys is non-Null, depending on resp.type |
all.perms.flag |
Were all possible permutations used? |
ttstar |
p by nperms.aca matrix t of test statistics from permuted data. Each column if sorted in descending order |
ttstar0 |
p by nperms.act matrix of test statistics from permuted data. Columns are in order of data |
eigengene.number |
The number of the eigengene (eg 1,2,..) that was requested for Pattern discovery |
eigengene |
Computed eigengene |
pi0 |
Estimated proportion of non-null features (genes) |
foldchange |
p-vector of foldchanges for original data |
foldchange.star |
p by nperms.act matrix estimated foldchanges from permuted data |
sdstar.keep |
n by nperms.act matrix of standard deviations from each permutation |
censoring.status.star.keep |
n by nperms.act matrix of censoring.status indicators from each permutation |
resp.type |
The response type used. Same as resp.type.arg, except for time course data, where time data is summarized and then treated as non-time course. Eg if resp.type.arg="oneclass.timecourse" then resp.type="oneclass" |
resp.type.arg |
The response type requested in the call to samr |
stand.contrasts |
For multiclass data, p by nclass matrix of standardized differences between the class mean and the overall mean |
stand.contrasts.star |
For multiclass data, p by nclass by nperms.act array of standardized contrasts for permuted datasets |
stand.contrasts.95 |
For multiclass data, 2.5 of standardized contrasts. Useful for determining which class contrast for significant genes, are large |
call |
calling sequence |
Balasubrimanian Narasimhan and Robert Tibshirani
Tusher, V., Tibshirani, R. and Chu, G. (2001): Significance analysis of microarrays applied to the ionizing radiation response PNAS 2001 98: 5116-5121, (Apr 24). http://www-stat.stanford.edu/~tibs/SAM
######### two class unpaired comparison # y must take values 1,2 set.seed(100) x<-matrix(rnorm(1000*20),ncol=20) dd<-sample(1:1000,size=100) u<-matrix(2*rnorm(100),ncol=10,nrow=100) x[dd,11:20]<-x[dd,11:20]+u y<-c(rep(1,10),rep(2,10)) data=list(x=x,y=y, geneid=as.character(1:nrow(x)),genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE) samr.obj<-samr(data, resp.type="Two class unpaired", nperms=100) delta=.4 samr.plot(samr.obj,delta) delta.table <- samr.compute.delta.table(samr.obj) siggenes.table<-samr.compute.siggenes.table(samr.obj,delta, data, delta.table) ########### two class paired # y must take values -1, 1, -2,2 etc, with (-k,k) being a pair set.seed(100) x<-matrix(rnorm(1000*20),ncol=20) dd<-sample(1:1000,size=100) u<-matrix(2*rnorm(100),ncol=10,nrow=100) x[dd,11:20]<-x[dd,11:20]+u y=c(-(1:10),1:10) d=list(x=x,y=y, geneid=as.character(1:nrow(x)),genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE) samr.obj<-samr(d, resp.type="Two class paired", nperms=100) #############quantitative response # y must take numeric values set.seed(84048) x=matrix(rnorm(1000*9),ncol=9) mu=c(3,2,1,0,0,0,1,2,3) b=runif(100)+.5 x[1:100,]=x[1:100,]+ b y=mu d=list(x=x,y=y, geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x)))) samr.obj =samr(d, resp.type="Quantitative", nperms=50) ########### oneclass # y is a vector of ones set.seed(100) x<-matrix(rnorm(1000*20),ncol=20) dd<-sample(1:1000,size=100) u<-matrix(2*rnorm(100),ncol=10,nrow=100) x[dd,11:20]<-x[dd,11:20]+u y<-c(rep(1,20)) data=list(x=x,y=y, geneid=as.character(1:nrow(x)),genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE) samr.obj<-samr(data, resp.type="One class", nperms=100) ###########survival data # y is numeric; censoring.status=1 for failures, and 0 for censored set.seed(84048) x=matrix(rnorm(1000*50),ncol=50) x[1:50,26:50]= x[1:50,26:50]+2 x[51:100,26:50]= x[51:100,26:50]-2 y=abs(rnorm(50)) y[26:50]=y[26:50]+2 censoring.status=sample(c(0,1),size=50,replace=TRUE) d=list(x=x,y=y,censoring.status=censoring.status, geneid=as.character(1:1000),genenames=paste("gene", as.character(1:1000))) samr.obj=samr(d, resp.type="Survival", nperms=20) ################multi-class example # y takes values 1,2,3,...k where k= number of classes set.seed(84048) x=matrix(rnorm(1000*10),ncol=10) x[1:50,6:10]= x[1:50,6:10]+2 x[51:100,6:10]= x[51:100,6:10]-2 y=c(rep(1,3),rep(2,3),rep(3,4)) d=list(x=x,y=y,geneid=as.character(1:1000),genenames=paste("gene", as.character(1:1000))) samr.obj <- samr(d, resp.type="Multiclass") #################### timecourse data # elements of y are of the form kTimet where k is the class label and t # is the time; in addition, the suffixes Start or End indicate the first # and last observation in a given time course # the class label can be that for a two class unpaired, one class or # two class paired problem set.seed(8332) y=paste(c(rep(1,15),rep(2,15)),"Time",rep(c(1,2,3,4,5,1.1,2.5, 3.7, 4.1,5.5),3),sep="") start=c(1,6,11,16,21,26) for(i in start){ y[i]=paste(y[i],"Start",sep="") } for(i in start+4){ y[i]=paste(y[i],"End",sep="") } x=matrix(rnorm(1000*30),ncol=30) x[1:50,16:20]=x[1:50,16:20]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) x[1:50,21:25]=x[1:50,21:25]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) x[1:50,26:30]=x[1:50,26:30]+matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) x[51:100,16:20]=x[51:100,16:20]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) x[51:100,21:25]=x[51:100,21:25]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) x[51:100,26:30]=x[51:100,26:30]-matrix(3*c(0,1,2,3,4),ncol=5,nrow=50,byrow=TRUE) data=list(x=x,y=y, geneid=as.character(1:nrow(x)),genenames=paste("g",as.character(1:nrow(x)),sep=""), logged2=TRUE) samr.obj<- samr(data, resp.type="Two class unpaired timecourse", nperms=100, time.summary.type="slope") ##################### pattern discovery # here there is no outcome y; the desired eigengene is indicated by # the argument eigengene.numbe in the data object set.seed(32) x=matrix(rnorm(1000*9),ncol=9) mu=c(3,2,1,0,0,0,1,2,3) b=3*runif(100)+.5 x[1:100,]=x[1:100,]+ b d=list(x=x,eigengene.number=1, geneid=as.character(1:nrow(x)),genenames=paste("gene", as.character(1:nrow(x)))) samr.obj=samr(d, resp.type="Pattern discovery", nperms=50)