lung73 {scaleboot} R Documentation

Clustering of 73 Lung Tumors

Description

Bootstrap analysis of hierarchical clustering for the DNA microarray data set of 73 lung tissue samples, each with 916 observed genes.

Usage

data(lung73)

lung73.pvclust

lung73.sb

Format

lung73.pvclust is an object of class "pvclust", as defined in the pvclust package of Suzuki and Shimodaira (2006).

lung73.sb is an object of class "scalebootv" of length 72.

Details

The microarray dataset of Garber et al. (2001) was reanalyzed in Suzuki and Shimodaira (2006) and is available as data(lung) in the pvclust package. We reanalyze it once more with the script shown in Examples. The result of pvclust is stored in lung73.pvclust, and the models fitted to the bootstrap probabilities by the scaleboot package are stored in lung73.sb. The AU p-values obtained with the scaleboot package are sometimes very different from those obtained with the pvclust package. For example, pvclust with default parameter values gave an AU p-value of 0.70 for Edge-67, whereas sbfit gives an AU p-value (named "k.3") of 0.95 for the same edge. Note that the raw bootstrap probability (i.e., the ordinary bootstrap probability with scale=1) is 0.04.
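As a rough illustration of how the script in Examples widens the range of scales, the following base R sketch compares its scales with those implied by pvclust's defaults. It assumes that the scale sa is the reciprocal of pvclust's relative sample size r (as in the script's r=1/sa) and that the pvclust default is r=seq(.5,1.4,by=.1). The names r.default and sa.default are introduced here for illustration only.

```r
## Scales used by the script in Examples: 13 values, equally spaced on a log scale
sa <- 9^seq(-1, 1, length.out = 13)
r <- 1 / sa                           # relative sample sizes passed to pvclust

## Assumed pvclust default relative sample sizes and the scales they imply
r.default <- seq(0.5, 1.4, by = 0.1)
sa.default <- 1 / r.default

range(sa)          # from 1/9 to 9
range(sa.default)  # from 1/1.4 to 2
```

The wider range gives the model fitting more information about how the bootstrap probability changes with scale, at the cost of more resampling.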

The summary method shows the AU p-values; for example, for edges 60 to 70:

> summary(lung73.sb[60:70])

Corrected P-values (percent):
   raw          k.1          k.2          k.3          model  aic    
60 20.21 (0.40) 20.29 (0.18) 71.40 (0.20) 78.98 (0.44) sing.3  80.46 
61 58.45 (0.49) 55.08 (0.17) 63.15 (0.24) 56.34 (0.38) poly.3 575.85 
62 95.68 (0.20) 95.92 (0.10) 98.64 (0.10) 98.61 (0.12) poly.3 -12.01 
63 58.31 (0.49) 57.30 (0.17) 82.09 (0.20) 81.74 (0.28) poly.3  20.74 
64 15.81 (0.36) 15.58 (0.16) 75.36 (0.21) 84.86 (0.37) sing.3  71.47 
65  2.96 (0.17)  2.80 (0.07) 76.73 (0.51) 94.88 (0.20) sing.3  33.34 
66 15.75 (0.36) 15.92 (0.16) 78.02 (0.20) 87.98 (0.29) sing.3   7.30 
67  3.63 (0.19)  3.31 (0.07) 77.02 (0.47) 95.10 (0.17) sing.3  25.11 
68 26.20 (0.44) 27.06 (0.17) 83.06 (0.18) 84.90 (0.27) poly.3   8.67 
69 29.49 (0.46) 29.65 (0.17) 75.37 (0.22) 75.83 (0.34) poly.3 -14.09 
70 28.31 (0.45) 29.04 (0.19) 76.62 (0.17) 81.54 (0.37) sing.3   0.99

Shown above are four types of p-values, together with the selected model and its AIC value. "raw" is the ordinary bootstrap probability, "k.1" is equivalent to "raw" but calculated from the multiscale bootstrap, "k.2" is equivalent to the third-order AU p-value of CONSEL, and "k.3" is an improved version of the AU p-value. By default, "k.3" is used when copying the p-values back to an object of class "pvclust".
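To relate the percentages in the table to the probabilities quoted for Edge-67, a small base R sketch can re-enter the "raw" and "k.3" columns by hand and compare them. The data frame pv and its column shift are hypothetical names used only for this illustration; the numbers are copied from the summary output above, not recomputed from lung73.sb.

```r
## P-values (percent) copied from the summary output for edges 60 to 70
pv <- data.frame(
  edge  = 60:70,
  raw   = c(20.21, 58.45, 95.68, 58.31, 15.81, 2.96, 15.75, 3.63,
            26.20, 29.49, 28.31),
  k.3   = c(78.98, 56.34, 98.61, 81.74, 84.86, 94.88, 87.98, 95.10,
            84.90, 75.83, 81.54),
  model = c("sing.3", "poly.3", "poly.3", "poly.3", "sing.3", "sing.3",
            "sing.3", "sing.3", "poly.3", "poly.3", "sing.3")
)

## Edge 67 as probabilities: raw is about 0.04 and k.3 is about 0.95
e67 <- pv[pv$edge == 67, ]
round(c(raw = e67$raw, k.3 = e67$k.3) / 100, 2)

## Difference between k.3 and raw, in percentage points, largest first
pv$shift <- pv$k.3 - pv$raw
pv[order(-pv$shift), ]
```

In this excerpt the largest differences occur for edges 65 and 67, both fitted with the sing.3 model.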

See Examples below for details.

Note

The microarray dataset is not included in data(lung73), but it is found in data(lung) of the pvclust package.

Source

Garber, M. E. et al. (2001) Diversity of gene expression in adenocarcinoma of the lung, Proceedings of the National Academy of Sciences, 98, 13784-13789 (dataset is available from http://genome-www.stanford.edu/lung_cancer/adeno/).

References

Suzuki, R. and Shimodaira, H. (2006). pvclust: An R package for hierarchical clustering with p-values, Bioinformatics, 22, 1540-1542 (software is available from CRAN or http://www.is.titech.ac.jp/~shimo/prog/pvclust/).

See Also

sbpvclust, sbfit.pvclust

Examples

## Not run: 
## script to create lung73.pvclust and lung73.sb
## multiscale bootstrap resampling of hierarchical clustering
library(pvclust)
library(scaleboot) # provides sbfit
data(lung)
sa <- 9^seq(-1,1,length.out=13) # scales from 1/9 to 9; wider range than pvclust default
lung73.pvclust <- pvclust(lung,r=1/sa,nboot=10000)
lung73.sb <- sbfit(lung73.pvclust) # model fitting
## End(Not run)

## Not run: 
## Parallel version of the above script
## parPvclust took 80 minutes using 40 CPUs
library(snow)
library(pvclust)
data(lung)
cl <- makeCluster(40) # launch 40 worker processes
sa <- 9^seq(-1,1,length.out=13) # wider range of scales than pvclust default
lung73.pvclust <- parPvclust(cl,lung,r=1/sa,nboot=10000)
lung73.sb <- sbfit(lung73.pvclust,cluster=cl) # model fitting
stopCluster(cl) # shut down the workers
## End(Not run)

## replace au/bp entries in pvclust object
data(lung73)
lung73.new <- sbpvclust(lung73.pvclust,lung73.sb) # au <- k.3

## Not run: 
library(pvclust)
plot(lung73.new) # draw dendrogram with the new au/bp values
pvrect(lung73.new)
## End(Not run)

## diagnose edges 61,...,69
lung73.sb[61:69] # print fitting details
plot(lung73.sb[61:69]) # plot curve fitting
summary(lung73.sb[61:69]) # print au p-values
## diagnose edge 67
lung73.sb[[67]] # print fitting
plot(lung73.sb[[67]],legend="topleft") # plot curve fitting
summary(lung73.sb[[67]]) # print au p-values


[Package scaleboot version 0.3-2 Index]