ccems-package {ccems}    R Documentation
This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular. Estimates of the dissociation constants K that best describe a dataset are found by systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other K. The automatically generated space of models is then fitted to the data. Automation enables searches of spaces too large to specify by hand, e.g. spaces generated by combinatorially complex equilibriums.
Package:  ccems
Type:     Package
Depends:  odesolve, snow
Suggests: nws
License:  GPL-2
LazyLoad: yes
LazyData: yes
URL:      http://epbi-radivot.cwru.edu/ccems
Index:
RNR            Ribonucleotide Reductase Data
TK1            Thymidine Kinase 1 Data
ems            Equilibrium Model Selection
fitModel       Fit Model
mkGrids        Make Grid Model Space
mkKd2Kj        Make Kd2Kj Mappings
mkModel        Make Specific Model
mkSpurs        Make Spur Model Space
mkg            Make Generic Model
simulateData   Simulate Data
This package automatically generates and fits biochemical equilibrium models whose outputs are either average protein mass data or enzyme reaction rate data. It is currently limited to systems in which one central hub protein mediates all of the interactions and the total concentrations of the reactants are known, at least approximately, e.g. systems reconstituted from purified reactants. It is further limited in that multiple sites for the same ligand must be filled in a predetermined sequence.
Equilibriums can be specified by any acyclic spanning subgraph of their nodes, where the edges are dissociation constants. Here, hub protein oligomerization is viewed as a curtain rod from which threads of ligand-bound states/complexes hang: each notch down a thread corresponds to one additional ligand bound to the hub j-mer, where j increases from left to right along the rod. At the top of each thread is a head node that sits on the rod. The head nodes must be specified, because some j values may be absent and some ligand sites (other than the thread-defining site) may be assumed to be saturated in some j-mers. The last node in each thread is called its tail node. If a ligand has more than one binding site, the tail of the thread for one site (other than the last site filled) is the head of the thread for the site filled next. Thus, head nodes need to be stated only for the first site filled.
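The thread bookkeeping just described can be sketched in base R. This is not ccems code: jmerOf, threadEdges and nK are hypothetical names, and the sketch assumes node names of the form hub, j, ligand, count (e.g. R2t1), so that a node's j-mer can be read from its prefix. It walks each site's threads in filling order, links each head to its non-head nodes, and lets each tail become the head of the next site's thread.

```r
# Sketch (base R only, not the ccems API): within-thread edges (dissociation
# constants) implied by a topology list like those in the examples.
# Assumption: node names look like "<hub><j><ligand><count>", e.g. "R2t1",
# so the j-mer of a node is its leading "<hub><j>" part, e.g. "R2".
jmerOf <- function(node) sub("^([A-Za-z]+[0-9]+).*$", "\\1", node)

threadEdges <- function(topology) {
  # current top node of each j-mer's thread, keyed by j-mer, e.g. c(R2 = "R2t0")
  top <- setNames(topology$heads, jmerOf(topology$heads))
  from <- character(0)
  to <- character(0)
  for (site in topology$sites) {            # sites in the order they are filled
    for (thread in site) {                  # one thread per j-mer
      key <- jmerOf(thread[1])
      chain <- c(top[[key]], thread)        # head, then the non-head nodes
      from <- c(from, chain[-length(chain)])
      to <- c(to, chain[-1])
      top[[key]] <- thread[length(thread)]  # this tail heads the next site's thread
    }
  }
  data.frame(from = from, to = to, stringsAsFactors = FALSE)
}

# Linking the head nodes along the rod adds length(heads) - 1 edges, so a
# spanning tree over n nodes has n - 1 dissociation constants in total.
nK <- function(topology) nrow(threadEdges(topology)) + length(topology$heads) - 1

# The desktop example's topology (monomer thread commented out there too)
desktop <- list(heads = c("R1t0", "R2t0"),
                sites = list(s = list(d = c("R2t1", "R2t2"))))
threadEdges(desktop)   # edges R2t0 -> R2t1 and R2t1 -> R2t2
nK(desktop)            # 3
```

Keying threads by j-mer rather than by position keeps the mapping correct when a j-mer has no thread for some site, as in the desktop example.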
In the examples below, E is the concentration of thymidine kinase 1 (TK1) tetramers, S is thymidine,
t is dTTP, X is ATP and R is the large subunit of ribonucleotide reductase (RNR). The examples are
ordered by CPU consumption: the first takes ~0.5 minutes on 1 core, the second ~1.5 minutes on 2 cores, and
the third ~2 days on 16 cores. The first fits activity data to a single-thread model. It
is the fastest example because [E] is small enough that total [S] approximates free [S],
so the system model can be written as a rational polynomial in [S].
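When total [S] can stand in for free [S], a single-thread activity model reduces to a ratio of polynomials in [S]. The base R sketch below assumes one hypothetical turnover constant per complex (kcat) and stepwise dissociation constants K along the thread E1S0, E1S1, ..., E1Sn; it illustrates the idea and is not the parameterization ccems uses internally. Setting a K to Inf removes that complex, and every complex below it, from the thread.

```r
# Sketch: rate of a single-thread model under the free [S] ~ total [S] approximation.
# Hypothetical parameterization: K[i] is the dissociation constant of the step
# E1S(i-1) + S <-> E1S(i), and kcat[i] is the turnover rate of complex E1S(i).
threadRate <- function(S, ET, K, kcat) {
  stopifnot(length(K) == length(kcat))
  vapply(S, function(s) {
    w <- cumprod(s / K)                # [E1Si]/[E1S0] = s^i / (K1 ... Ki); Inf K gives 0
    ET * sum(kcat * w) / (1 + sum(w))  # numerator and denominator are polynomials in s
  }, numeric(1))
}

# One site reduces to Michaelis-Menten, v = kcat ET S / (K + S):
threadRate(S = c(0, 2, 20), ET = 1, K = 2, kcat = 10)   # 0, 5 and 100/11
# Four sequential sites with the last step switched off (K = Inf):
threadRate(S = 5, ET = 0.1, K = c(1, 3, 10, Inf), kcat = c(1, 2, 3, 4))
```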
In the second example there is only one ligand binding
site (the s-site) and the hub protein forms at most a dimer.
Thus, the thread topology of the acyclic graph used (to explore K equality
hypotheses) has only two head nodes and two threads, one for the monomer and one for the dimer.
The head node of the monomer thread is the free hub protein R1t0 and
the head node of the dimer thread is the ligand-free dimer R2t0.
Threads contain the names of only their non-head nodes, since their heads have already been specified
(in the example code the monomer thread, R1t1, is commented out, so only the dimer thread lists nodes).
This structure is assigned to topology, which is then passed to the function mkg
to produce a generic model object g. Together with the data, this
generic model object is then passed to the function ems
(equilibrium model selection), which generates the
model space, fits it to the data, and returns the topN
(typically 5, 10 or 20) best (lowest AIC) models.
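In the second example total and free dTTP differ, so free concentrations have to be solved from mass balances before an average mass can be predicted. The base R sketch below does this for the example's nodes R1t0, R2t0, R2t1 and R2t2. It assumes a stepwise convention for the three K's and treats the measured mass as a weight average in units of the monomer mass; both choices, and the names dimerState and weightAvgMass, are assumptions for illustration rather than ccems internals.

```r
# Sketch: free concentrations for R1t0, R2t0, R2t1, R2t2 given totals RT and tT.
# Assumed (hypothetical) stepwise convention for the dissociation constants:
#   2 R1t0 <-> R2t0 (K0),  R2t0 + t <-> R2t1 (K1),  R2t1 + t <-> R2t2 (K2)
# so [R2t0] = R^2/K0, [R2t1] = [R2t0] t/K1, [R2t2] = [R2t1] t/K2 (t = free dTTP).
dimerState <- function(RT, tT, K0, K1, K2) {
  species <- function(tf) {
    # R balance RT = R + a R^2 is quadratic in free R; this root form is
    # numerically stable and gives R = RT when K0 = Inf (a = 0)
    a <- 2 * (1 + tf / K1 + tf^2 / (K1 * K2)) / K0
    R <- 2 * RT / (1 + sqrt(1 + 4 * a * RT))
    D0 <- R^2 / K0
    D1 <- D0 * tf / K1
    D2 <- D1 * tf / K2
    c(R1t0 = R, t = tf, R2t0 = D0, R2t1 = D1, R2t2 = D2)
  }
  # t balance: free t + bound t - tT is negative at 0 and non-negative at tT
  tBalance <- function(tf) {
    s <- species(tf)
    s[["t"]] + s[["R2t1"]] + 2 * s[["R2t2"]] - tT
  }
  tf <- if (tT > 0) uniroot(tBalance, c(0, tT), tol = 1e-12)$root else 0
  species(tf)
}

# Weight-average mass, sum_j j^2 [Rj] / RT, in units of the monomer mass M1
weightAvgMass <- function(s, RT, M1 = 1) {
  M1 * (s[["R1t0"]] + 4 * (s[["R2t0"]] + s[["R2t1"]] + s[["R2t2"]])) / RT
}

s <- dimerState(RT = 7.6, tT = 50, K0 = 10, K1 = 1, K2 = 1)  # hypothetical values
weightAvgMass(s, RT = 7.6)  # between 1 (all monomer) and 2 (all dimer)
```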
The third example is more complicated than the second because ATP has multiple R1 binding sites and because R also tetramerizes
and hexamerizes as [ATP] increases. This problem motivated the development of this R package.
It is an example of a problem whose solution
is enabled by this software, because its model space is too large to specify by hand. A Linux cluster is needed to execute this example.
The user must have write privileges in the working directory so that the subdirectories models and results can be created to hold model C code (generated by mkg) and HTML output (generated by ems), respectively.
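A quick base R check before calling mkg or ems can confirm that the working directory is writable. canWrite is a hypothetical helper built on file.access(), which returns 0 when the requested access (mode = 2, write permission) is granted.

```r
# Sketch: check that a directory (by default the working directory) is writable,
# since the models and results subdirectories are created there.
canWrite <- function(dir = getwd()) {
  isTRUE(unname(file.access(dir, mode = 2)) == 0)  # 0 means write access granted
}

if (!canWrite()) {
  message("Working directory ", getwd(),
          " is not writable; setwd() to a writable directory before running ccems.")
}
```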
This work was supported by the National Cancer Institute (K25CA104791).
Tom Radivoyevitch (txr24@case.edu)
Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. BMC Systems Biology 2, 15.
Radivoyevitch, T. Automated model generation and analysis methods for combinatorially complex biochemical equilibriums. (submitted).
## LAPTOP EXAMPLE: Top 3 models with at most three parameters for the
## Berenstein et al. JBC 2000 TK1 data
library(ccems)
topology <- list(
    heads=c("E1S0"), # one E is a tetramer
    sites=list(
        c=list( # c-site = catalytic site
            t=c("E1S1","E1S2","E1S3","E1S4")
        )
    )
)
g <- mkg(topology,activity=TRUE,TCC=FALSE)
dd=subset(TK1,(year==2000),select=c(E,S,v)) # Berenstein et al
names(dd)[1:2]= c("ET","ST")
tops=ems(dd,g,maxTotalPs=3,kIC=30000)
plot(dd$ST,dd$v,type="p",pch=1, xlab="[dT] (uM)", ylab="v",
    main="Top 3 TK1 Models with 3 parameters or less")
lgx=log(dd$ST)
upr=range(lgx)[2]
lwr=range(lgx)[1]
del=(upr-lwr)/50
fineX=exp(seq(lwr,upr,by=del))
newPnts <- data.frame(ET = rep(dd$ET[1],length(fineX)), ST = fineX)
for (i in 1:3) {
    df <- simulateData(tops[[i]],predict=newPnts,typeYP="v")$predict
    lines(df$ST,df$EY,type="l",lty=i)
}

## DESKTOP EXAMPLE: This example automatically creates (and fits) the model
## space of the BMC SB 2008 dTTP induced R1 dimerization reference above.
library(ccems)
topology <- list(
    heads=c("R1t0","R2t0"),
    sites=list(
        s=list( # s-site thread
#            m=c("R1t1"), # monomer 1
            d=c("R2t1","R2t2") # dimer 2
        )
    )
)
g <- mkg(topology,TCC=TRUE)
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year))
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # e.g. to form "RT"
rownames(dd) <- 1:dim(dd)[1] # lose big number row names of parent dataframe
## top10=ems(dd,g,cpusPerHost=c("localhost"=2),maxTotalPs=2,ptype="SOCK")

## CLUSTER EXAMPLE: This ATP induced R1 hexamerization example runs 1.8 days
## on a 16 core (4 quad proc machines) ROCKS Linux cluster.
library(ccems)
topology <- list(
    heads=c("R1X0","R2X2","R4X4","R6X6"),
    sites=list( # s-sites are already filled only in (j>1)-mers
        a=list( # a-site thread
            m=c("R1X1"), # monomer 1
            d=c("R2X3","R2X4"), # dimer 2
            t=c("R4X5","R4X6","R4X7","R4X8"), # tetramer 3
            h=c("R6X7","R6X8","R6X9","R6X10","R6X11","R6X12") # hexamer 4
        ), # tails of a-site threads are heads of h-site threads
        h=list( # h-site
            m=c("R1X2"), # monomer 5
            d=c("R2X5","R2X6"), # dimer 6
            t=c("R4X9","R4X10","R4X11","R4X12"), # tetramer 7
            h=c("R6X13","R6X14","R6X15","R6X16","R6X17","R6X18") # hexamer 8
        )
    )
)
g=mkg(topology,TCC=TRUE)
dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="") # i.e. c("RT","XT")
## choose(29,3) = 3654 and choose(29,2) = 406, so 3654 + 406 + 29 + 1 = 4090 spurs,
## but after subtracting those without at least one hexamer complex, and after
## adding grids, the total number of models is 3410. Of these 3406 converged, see below.
## Not run:
cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
top10=ems(dd,g,cpusPerHost=cpusPerHost, maxTotalPs=3, ptype="SOCK",IC=100)
# The following are the last few lines of the output. The first line shows that a
# one parameter model is best (shown are the best AICs of models with 0, 1, 2 or 3
# parameters). The next shows that it took 1.8 days on 16 CPUs to fit 3406 models.
# The block that follows shows that the top 5 models are all spur graph models.
# The HTML file RXglobSOCK.htm in the results directory contains this information
# and more (e.g. parameter estimates and CIs).
#
# [1] 1000000.00000 -33.16309 -31.73658 -29.99075
#
# Time difference of 2623.881 mins
# Fitted = 3406, out of a total of 3410
#
# ... making HTML file ...
# 1 Model 20; nbp= 1; id=IIIIIIIIIIIJIIIIIIIIIIIIIIIII; AIC=-33.1631
# 2 Model 108; nbp= 2; id=IIIIIJIIIIIJIIIIIIIIIIIIIIIII; AIC=-31.7366
# 3 Model 21; nbp= 1; id=IIIIIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.5144
# 4 Model 109; nbp= 2; id=IIIIIJIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.4678
# 5 Model 145; nbp= 2; id=IIIIIIIIJIIIJIIIIIIIIIIIIIIII; AIC=-31.4431
## End(Not run)
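The spur count quoted in the cluster example comment can be reproduced in base R. The sketch rebuilds the 30 node names of the cluster topology, uses the fact that a spanning tree on n nodes has n - 1 edges to obtain the 29 K's, and then counts, and with combn explicitly enumerates, the ways of leaving at most maxTotalPs = 3 of them finite. Reading each spur as a choice of which K's are finite is an interpretation of that comment, not a description of how mkSpurs works.

```r
# Sketch (base R only): reproduce 1 + 29 + 406 + 3654 = 4090 from the comment above.
# Node names copied from the cluster topology: 4 heads, then a-site and h-site threads.
nodes <- c("R1X0", "R2X2", "R4X4", "R6X6",
           "R1X1", paste0("R2X", 3:4), paste0("R4X", 5:8), paste0("R6X", 7:12),
           "R1X2", paste0("R2X", 5:6), paste0("R4X", 9:12), paste0("R6X", 13:18))
nK <- length(unique(nodes)) - 1        # 30 nodes, so a spanning tree has 29 K's
maxTotalPs <- 3

# Count models with 0, 1, 2 or 3 finite K's
perSize <- choose(nK, 0:maxTotalPs)    # 1, 29, 406, 3654
total <- sum(perSize)                  # 4090

# Enumerate (assumed reading): each element holds the indices of the finite K's.
# The zero-parameter model (all K infinite) is added by hand as integer(0).
spurs <- c(list(integer(0)),
           unlist(lapply(seq_len(maxTotalPs),
                         function(k) combn(nK, k, simplify = FALSE)),
                  recursive = FALSE))
length(spurs) == total                 # TRUE
```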