lung73 {scaleboot} | R Documentation
Bootstrapping hierarchical clustering of a DNA microarray data set of 73 lung tissue samples, each containing 916 observed genes.
data(lung73)

lung73.pvclust
lung73.sb
lung73.pvclust is an object of class "pvclust" defined in pvclust of Suzuki and Shimodaira (2006).
lung73.sb is an object of class "scalebootv" of length 72.
The microarray dataset of Garber et al. (2001) was reanalyzed in Suzuki and Shimodaira (2006), and is available as data(lung) in the pvclust package. We reanalyze it once more with the script shown in Examples. The result of pvclust is stored in lung73.pvclust, and the result of fitting models to the bootstrap probabilities with the scaleboot package is stored in lung73.sb.
The AU p-values obtained with the scaleboot package are sometimes very different from those obtained with the pvclust package. For example, pvclust with its default parameter values gave an AU p-value of 0.70 for edge 67, whereas sbfit gives an AU p-value (named "k.3") of 0.95 for the same edge. Note that the raw bootstrap probability (i.e., the ordinary bootstrap probability with scale=1) is 0.04.
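The proportions quoted above for sbfit and for the raw bootstrap probability match the edge-67 row of the summary output, which is reported in percent. A small base-R sketch of that conversion follows; the values are copied by hand from that row, and the names edge67 and edge67_prop are introduced here for illustration only. The 0.70 quoted for pvclust comes from its default run and does not appear in that row.

```r
# Edge-67 row copied by hand from the summary output (values in percent).
edge67 <- c(raw = 3.63, k.1 = 3.31, k.2 = 77.02, k.3 = 95.10)

# Convert percent to proportions, rounded to two decimals as in the text.
edge67_prop <- round(edge67 / 100, 2)
print(edge67_prop)
```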
The AU p-values for all nodes are shown by the summary method:

> summary(lung73.sb[60:70])

Corrected P-values (percent):
   raw          k.1          k.2          k.3          model     aic
60 20.21 (0.40) 20.29 (0.18) 71.40 (0.20) 78.98 (0.44) sing.3  80.46
61 58.45 (0.49) 55.08 (0.17) 63.15 (0.24) 56.34 (0.38) poly.3 575.85
62 95.68 (0.20) 95.92 (0.10) 98.64 (0.10) 98.61 (0.12) poly.3 -12.01
63 58.31 (0.49) 57.30 (0.17) 82.09 (0.20) 81.74 (0.28) poly.3  20.74
64 15.81 (0.36) 15.58 (0.16) 75.36 (0.21) 84.86 (0.37) sing.3  71.47
65  2.96 (0.17)  2.80 (0.07) 76.73 (0.51) 94.88 (0.20) sing.3  33.34
66 15.75 (0.36) 15.92 (0.16) 78.02 (0.20) 87.98 (0.29) sing.3   7.30
67  3.63 (0.19)  3.31 (0.07) 77.02 (0.47) 95.10 (0.17) sing.3  25.11
68 26.20 (0.44) 27.06 (0.17) 83.06 (0.18) 84.90 (0.27) poly.3   8.67
69 29.49 (0.46) 29.65 (0.17) 75.37 (0.22) 75.83 (0.34) poly.3 -14.09
70 28.31 (0.45) 29.04 (0.19) 76.62 (0.17) 81.54 (0.37) sing.3   0.99
Shown above are four types of p-values, together with the selected model and AIC values. "raw" is the ordinary bootstrap probability, "k.1" is equivalent to "raw" but calculated from the multiscale bootstrap, "k.2" is equivalent to the third-order AU p-value of CONSEL, and finally "k.3" is an improved version of the AU p-value. By default, we use "k.3" when copying the p-values back to an object of class "pvclust". See Examples below for details.
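As a hedged sketch of copying a p-value other than "k.3" back into the "pvclust" object: this assumes that sbpvclust accepts an argument k selecting which corrected p-value becomes the au entry, which should be checked against ?sbpvclust for the installed version. The object names lung73.k3 and lung73.k2 are introduced here for illustration only.

```r
library(scaleboot)
data(lung73)

# Default behaviour described above: au <- k.3
lung73.k3 <- sbpvclust(lung73.pvclust, lung73.sb)

# Assumption: k = 2 copies the "k.2" p-value instead (see ?sbpvclust)
lung73.k2 <- sbpvclust(lung73.pvclust, lung73.sb, k = 2)

# Compare the au entries for edge 67 under the two choices
print(c(k3 = lung73.k3$edges$au[67], k2 = lung73.k2$edges$au[67]))
```

Both results remain objects of class "pvclust", so they can be passed to the plotting functions of the pvclust package in the same way as lung73.new in Examples.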
The microarray dataset is not included in data(lung73), but it is found in data(lung) of the pvclust package.
Garber, M. E. et al. (2001). Diversity of gene expression in adenocarcinoma of the lung, Proceedings of the National Academy of Sciences, 98, 13784-13789 (dataset is available from http://genome-www.stanford.edu/lung_cancer/adeno/).
Suzuki, R. and Shimodaira, H. (2006). pvclust: An R package for hierarchical clustering with p-values, Bioinformatics, 22, 1540-1542 (software is available from CRAN or http://www.is.titech.ac.jp/~shimo/prog/pvclust/).
## Not run: 
## script to create lung73.pvclust and lung73.sb
## multiscale bootstrap resampling of hierarchical clustering
library(pvclust)
data(lung)
sa <- 9^seq(-1,1,length=13) # wider range of scales than pvclust default
lung73.pvclust <- pvclust(lung,r=1/sa,nboot=10000)
lung73.sb <- sbfit(lung73.pvclust) # model fitting

## End(Not run)

## Not run: 
## Parallel version of the above script
## parPvclust took 80 mins using 40 cpu's
library(snow)
library(pvclust)
data(lung)
cl <- makeCluster(40) # launch 40 cpu's
sa <- 9^seq(-1,1,length=13) # wider range of scales than pvclust default
lung73.pvclust <- parPvclust(cl,lung,r=1/sa,nboot=10000)
lung73.sb <- sbfit(lung73.pvclust,cluster=cl) # model fitting

## End(Not run)

## replace au/bp entries in pvclust object
data(lung73)
lung73.new <- sbpvclust(lung73.pvclust,lung73.sb) # au <- k.3

## Not run: 
library(pvclust)
plot(lung73.new) # draw dendrogram with the new au/bp values
pvrect(lung73.new)

## End(Not run)

## diagnose edges 61,...,69
lung73.sb[61:69] # print fitting details
plot(lung73.sb[61:69]) # plot curve fitting
summary(lung73.sb[61:69]) # print au p-values

## diagnose edge 67
lung73.sb[[67]] # print fitting
plot(lung73.sb[[67]],legend="topleft") # plot curve fitting
summary(lung73.sb[[67]]) # print au p-values
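The comment "wider range of scales than pvclust default" in the script above can be checked with base R alone. The sketch below lists the 13 scales and the corresponding r values passed to pvclust, and compares their range with seq(0.5, 1.4, by = 0.1), which we take to be the pvclust default for r (an assumption to verify against ?pvclust). The name default_r is introduced here for illustration only.

```r
# Scales used in the script: 13 values, equally spaced on the log scale
sa <- 9^seq(-1, 1, length.out = 13)
r <- 1 / sa                      # relative sample sizes passed to pvclust

# Assumed pvclust default for r (check ?pvclust)
default_r <- seq(0.5, 1.4, by = 0.1)

print(round(sa, 3))              # scales from 1/9 up to 9
print(range(r))                  # range of r used in the script
print(range(default_r))          # range of the assumed default
```

The middle scale is 9^0 = 1, which corresponds to the ordinary bootstrap (scale=1) mentioned in the text above.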