ccems-package {ccems}    R Documentation

Combinatorially Complex Equilibrium Model Selection

Description

This package performs model selection for equilibriums in general and for quasi-equilibriums of enzyme complexes in particular. Estimates of the dissociation constants K that best describe a dataset are found by systematically scanning through all possibilities of each K being infinite and/or plausibly equal to other K. The automatically generated space of models is then fitted to the data. Automation enables searches of spaces too large to be specified by hand, e.g. spaces generated by combinatorially complex equilibriums.
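The "K is infinite" half of this search can be made concrete with a minimal base-R sketch. It enumerates every hypothesis in which at most maxPs of the dissociation constants are estimated and the rest are fixed at Inf. The function name spurModels and the labels K1 to K4 are hypothetical. The sketch does not generate K-equality hypotheses, and it is not the package's own implementation.

```r
## Hypothetical sketch (not ccems code): enumerate models in which each
## dissociation constant is either estimated (NA) or fixed at Inf.
spurModels <- function(Knames, maxPs) {
  allInf <- setNames(rep(Inf, length(Knames)), Knames)
  models <- list(allInf)                        # the 0-parameter model
  for (p in seq_len(min(maxPs, length(Knames)))) {
    for (free in combn(Knames, p, simplify = FALSE)) {
      K <- allInf
      K[free] <- NA                             # NA = to be fitted to data
      models[[length(models) + 1]] <- K
    }
  }
  models
}

m <- spurModels(c("K1", "K2", "K3", "K4"), maxPs = 2)
length(m)   # 1 + 4 + 6 = 11 models
```

Each element of m is a named vector of K values, so a fitting routine can loop over the list and estimate only the NA entries.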

Details

Package: ccems
Type: Package
Depends: odesolve,snow
Suggests: nws
License: GPL-2
LazyLoad: yes
LazyData: yes
URL: http://epbi-radivot.cwru.edu/ccems

Index:

RNR                     Ribonucleotide Reductase Data
TK1                     Thymidine Kinase 1 Data
ems                     Equilibrium Model Selection
fitModel                Fit Model
mkGrids                 Make Grid Model Space
mkKd2Kj                 Make Kd2Kj Mappings
mkModel                 Make Specific Model
mkSpurs                 Make Spur Model Space
mkg                     Make Generic Model
simulateData            Simulate Data

This package automatically generates and fits biochemical equilibrium models, using either average protein mass data or enzyme reaction rate data as outputs. It is currently limited to systems in which one central hub protein mediates all of the interactions and in which the total concentrations of the reactants are known to good approximation, e.g. systems reconstituted from purified reactants. It is further limited in that multiple sites for the same ligand must be filled in a predetermined sequence.
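Total concentrations matter because free concentrations must then be recovered from mass balance. Consider the simplest case: one hub protein R binds one ligand t (R + t = Rt) with dissociation constant K. Mass balance then gives K = (RT - C)(tT - C)/C for the complex concentration C. The sketch below computes the physically meaningful root of that quadratic. complexConc is a hypothetical helper for this one-edge case only. It is not how ccems solves larger systems.

```r
## Hypothetical helper: concentration C of the complex Rt for R + t <-> Rt,
## given total concentrations RT, tT and dissociation constant K (same units).
complexConc <- function(RT, tT, K) {
  b <- RT + tT + K
  ## Smaller root of C^2 - b*C + RT*tT = 0, written as 2*RT*tT/(b + sqrt(...)),
  ## which is algebraically equal to (b - sqrt(...))/2 but avoids cancellation.
  2 * RT * tT / (b + sqrt(b^2 - 4 * RT * tT))
}

C <- complexConc(RT = 1, tT = 10, K = 2)
c(Rfree = 1 - C, tfree = 10 - C, Rt = C)
```

C can never exceed RT. So when RT is much smaller than tT, free t stays close to tT. The first example below relies on this approximation, since [E] there is small enough that total [S] approximates free [S].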

An equilibrium can be specified by any acyclic spanning subgraph of its nodes, where the edges are dissociation constants. Here, hub protein oligomerization is viewed as a curtain rod from which threads of ligand-bound states (complexes) hang. Each notch down a thread corresponds to one additional ligand bound to the hub j-mer, where j increases from left to right along the rod. At the top of each thread is a head node that sits on the rod. The head nodes must be specified explicitly, because some j values may be absent and because some ligand sites (other than the thread-defining site) may be assumed to be saturated in some j-mers. The last node in each thread is called its tail node. If a ligand has more than one binding site, the tail of the thread of one site (other than the last site filled) is the head of the corresponding thread of the site filled next. Thus, head nodes need to be stated only for the first site filled.
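The curtain-rod picture can be turned into an explicit edge list. The sketch below (hypothetical name topologyEdges) walks a topology list shaped like those in the examples. It assumes that thread i of each site hangs from the i-th head, that each thread contributes one edge per notch, and that a thread's tail becomes the head for the next site. It further assumes, without confirmation from this page, that one rod edge joins each pair of consecutive heads. Under that assumption the cluster example's topology yields 29 edges. This matches the 29 K's counted in that example's comments.

```r
## Hypothetical sketch (not ccems code): list the K edges implied by a
## topology list. Assumes one rod edge joins each pair of consecutive heads
## and that thread i of every site hangs from head i.
topologyEdges <- function(topology) {
  heads <- topology$heads
  n <- length(heads)
  edges <- data.frame(from = heads[-n], to = heads[-1],
                      type = rep("rod", n - 1), stringsAsFactors = FALSE)
  for (site in topology$sites) {              # sites in their filling order
    for (i in seq_along(site)) {
      nodes <- c(heads[i], site[[i]])         # head, then its non-head nodes
      k <- length(nodes)
      edges <- rbind(edges, data.frame(from = nodes[-k], to = nodes[-1],
                                       type = rep("thread", k - 1),
                                       stringsAsFactors = FALSE))
      heads[i] <- nodes[k]                    # tail heads the next site's thread
    }
  }
  edges
}

rnr <- list(heads = c("R1t0", "R2t0"),
            sites = list(s = list(m = c("R1t1"), d = c("R2t1", "R2t2"))))
topologyEdges(rnr)   # 1 rod edge (R1t0-R2t0) and 3 thread edges
```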

In the examples below, E denotes thymidine kinase 1 (TK1) tetramers, S is thymidine, t is dTTP, X is ATP and R is the large subunit of ribonucleotide reductase (RNR). The examples are ordered by CPU consumption. The first takes ~0.5 min on 1 core, the second ~1.5 min on 2 cores, and the third ~2 days on 16 cores.

The first example fits activity data to a single-thread model. It is the fastest example because [E] is small enough that total [S] approximates free [S]. The system model can therefore be written as a rational polynomial (note TCC=FALSE in its call to mkg).

In the second example there is only one ligand binding site (the s-site), and the hub protein forms at most a dimer. The thread topology of the acyclic graph used to explore K equality hypotheses therefore has only two head nodes and two threads. One thread is for the monomer, headed by the free hub protein R1t0. The other is for the dimer, headed by the ligand-free dimer R2t0. Threads list only their non-head nodes, since their heads are already specified. This structure is assigned to topology, which is then passed to the function mkg to produce a generic model object g. Together with the data, g is passed to the function ems (equilibrium model selection). ems generates the model space, fits it to the data, and returns the topN (typically 5, 10 or 20) best (lowest AIC) models.

The third example is more complicated than the second because ATP has multiple R1 binding sites and because R also tetramerizes and hexamerizes as [ATP] increases. This problem motivated the development of this R package. Its model space is too large to specify by hand, so its solution is enabled by this software. A Linux cluster is needed to run this example.
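The size of such a space is easy to estimate. Suppose at most maxPs of nK dissociation constants are estimated and the others are fixed at infinity. The number of such spur models is then a sum of binomial coefficients. The one-line helper below (hypothetical name nSpurs) reproduces the 4090 spurs quoted in the cluster example's comments. That figure is the count before hexamer filtering and grid models are taken into account.

```r
## Number of spur models: choose which (at most maxPs) of nK constants are
## estimated; all others are fixed at Inf.
nSpurs <- function(nK, maxPs) sum(choose(nK, 0:min(maxPs, nK)))

nSpurs(29, 3)                  # 1 + 29 + 406 + 3654 = 4090
sapply(1:3, nSpurs, nK = 29)   # 30 436 4090: growth with the parameter cap
```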

The user must have working directory write privileges so that the subdirectories models and results can be created to hold model C code (generated by mkg) and html output (generated by ems), respectively.
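You can confirm this before a long run by trying to create, and then delete, a scratch subdirectory. The directory name below is hypothetical. The check is only a convenience and does not involve ccems itself.

```r
## Pre-flight check: can subdirectories be created in the working directory?
probe <- file.path(getwd(), paste0("ccems_probe_", Sys.getpid()))
canWrite <- dir.create(probe, showWarnings = FALSE)
if (canWrite) unlink(probe, recursive = TRUE)   # remove the scratch directory
if (!canWrite) warning("cannot create subdirectories in ", getwd())
canWrite
```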

Note

This work was supported by the National Cancer Institute (K25CA104791).

Author(s)

Tom Radivoyevitch (txr24@case.edu)

References

Radivoyevitch, T. (2008) Equilibrium model selection: dTTP induced R1 dimerization. BMC Systems Biology 2, 15.

Radivoyevitch, T. Automated model generation and analysis methods for combinatorially complex biochemical equilibriums. (submitted).

See Also

ems, mkg

Examples

## LAPTOP EXAMPLE: Top 3 models with at most three parameters, fitted to
##                 the TK1 data of Berenstein et al. (JBC 2000)
library(ccems)
topology <- list(  
    heads=c("E1S0"), #one E is a tetramer
    sites=list(                    
        c=list(    # c-site = catalytic site
            t=c("E1S1","E1S2","E1S3","E1S4")   
        )
    )
)
g <- mkg(topology,activity=TRUE,TCC=FALSE)
dd=subset(TK1,(year==2000),select=c(E,S,v)) # Berenstein et al
names(dd)[1:2]= c("ET","ST")
tops=ems(dd,g,maxTotalPs=3,kIC=30000) 
plot(dd$ST,dd$v,type="p",pch=1, xlab="[dT] (uM)", ylab="v",
          main="Top 3 TK1 Models with 3 parameters or less")
lgx=log(dd$ST)
upr=range(lgx)[2]
lwr=range(lgx)[1]
del=(upr-lwr)/50
fineX=exp(seq(lwr,upr,by=del))
newPnts <- data.frame(ET = rep(dd$ET[1],length(fineX)), ST = fineX)
for (i in 1:3) {
  df <- simulateData(tops[[i]],predict=newPnts,typeYP="v")$predict  
  lines(df$ST,df$EY,type="l",lty=i) 
}

## DESKTOP EXAMPLE: This example automatically creates (and fits) the model  
## space of the BMC SB 2008 dTTP induced R1 dimerization reference above.
library(ccems)
topology <- list(  
    heads=c("R1t0","R2t0"),  
    sites=list(       
        s=list(                     # s-site    thread #
            m=c("R1t1"),        # monomer      1
            d=c("R2t1","R2t2")  # dimer        2
        )
    )
) 

g <- mkg(topology,TCC=TRUE) 
data(RNR)
d1 <- subset(RNR,(year==2001)&(fg==1)&(G==0)&(t>0),select=c(R,t,m,year))
d2 <- subset(RNR,year==2006,select=c(R,t,m,year)) 
dd <- rbind(d1,d2)
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#e.g. to form "RT"
rownames(dd) <- 1:dim(dd)[1] # lose big number row names of parent dataframe
## top10=ems(dd,g,cpusPerHost=c("localhost"=2),maxTotalPs=2,ptype="SOCK") 

## CLUSTER EXAMPLE: This ATP induced R1 hexamerization example runs 1.8 days
##                  on a 16-CPU ROCKS Linux cluster (four machines, 4 CPUs each).

library(ccems)
topology <- list(
    heads=c("R1X0","R2X2","R4X4","R6X6"), 
    sites=list(                # s-sites are already filled only in (j>1)-mers 
        a=list(  #a-site                                                    thread
            m=c("R1X1"),                                            # monomer   1
            d=c("R2X3","R2X4"),                                     # dimer     2
            t=c("R4X5","R4X6","R4X7","R4X8"),                       # tetramer  3
            h=c("R6X7","R6X8","R6X9","R6X10", "R6X11", "R6X12")     # hexamer   4
        ), # tails of a-site threads are heads of h-site threads
        h=list(   # h-site
            m=c("R1X2"),                                            # monomer   5
            d=c("R2X5", "R2X6"),                                    # dimer     6
            t=c("R4X9", "R4X10","R4X11", "R4X12"),                  # tetramer  7
            h=c("R6X13", "R6X14", "R6X15","R6X16", "R6X17", "R6X18")# hexamer   8
        )
    )
)
g=mkg(topology,TCC=TRUE) 
dd=subset(RNR,(year==2002)&(fg==1)&(X>0),select=c(R,X,m,year))
names(dd)[1:2] <- paste(strsplit(g$id,split="")[[1]],"T",sep="")#i.e. c("RT","XT")

## choose(29,3) = 3654 and choose(29,2) = 406, so there are 3654 + 406 + 29 + 1 = 4090
## spurs with at most 3 parameters. After subtracting those without at least one
## hexamer complex, and after adding grids, the total number of models is 3410.
## Of these, 3406 converged (see below).
## Not run: 
cpusPerHost=c("localhost" = 4,"compute-0-0"=4,"compute-0-1"=4,"compute-0-2"=4)
top10=ems(dd,g,cpusPerHost=cpusPerHost, maxTotalPs=3, ptype="SOCK",IC=100) 

# The following are the last few lines of the output. The first line shows that a
# one-parameter model is best (shown are the best AICs of models with 0, 1, 2 or 3
# parameters). The next shows that it took 1.8 days on 16 CPUs to fit 3406 models,
# and the block that follows shows that the top 5 models are all spur graph models.
# The html file RXglobSOCK.htm in the results directory contains this information
# and more (e.g. parameter estimates and CIs).
#
# [1] 1000000.00000     -33.16309     -31.73658     -29.99075
#
# Time difference of 2623.881 mins
# Fitted = 3406, out of a total of  3410 
#
# ... making HTML file ... 
#  1 Model  20; nbp= 1; id=IIIIIIIIIIIJIIIIIIIIIIIIIIIII; AIC=-33.1631
#  2 Model 108; nbp= 2; id=IIIIIJIIIIIJIIIIIIIIIIIIIIIII; AIC=-31.7366
#  3 Model  21; nbp= 1; id=IIIIIIIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.5144
#  4 Model 109; nbp= 2; id=IIIIIJIIIIIIJIIIIIIIIIIIIIIII; AIC=-31.4678
#  5 Model 145; nbp= 2; id=IIIIIIIIJIIIJIIIIIIIIIIIIIIII; AIC=-31.4431
## End(Not run)

[Package ccems version 1.02 Index]