get.protein {CHNOSZ}    R Documentation
Calculate the amino acid compositions of collections of proteins.
```r
get.protein(protein, organism, abundance = NULL, pname = NULL, average = TRUE, digits = 1)
yeastgfp(location, exclusive = TRUE)
```

| Argument | Description |
|---|---|
| `protein` | character, name(s) of proteins, or name of a stress response experiment. |
| `organism` | character, organism (`ECO`, `SGD`) or `YeastGFP`. |
| `abundance` | numeric, abundances (stoichiometric weights) of the proteins, applied when summing their compositions. |
| `pname` | character, names for the new proteins. |
| `average` | logical, return the average composition of the proteins? |
| `digits` | numeric, number of decimal places to which the amino acid counts are rounded. |
| `location` | character, name of the subcellular location (compartment). |
| `exclusive` | logical, report only proteins localized exclusively to the compartment? |

When `protein` contains one or more Ordered Locus Names (OLN) or Open Reading Frame (ORF) names, `get.protein` retrieves the amino acid compositions of the respective proteins in Escherichia coli or Saccharomyces cerevisiae (for `organism` equal to `ECO` or `SGD`, respectively). The calculation depends on the presence of the objects `thermo$ECO` and `thermo$SGD`, which contain the amino acid compositions of proteins in these organisms. If `protein` is instead the name of one of the stress response experiments contained in `thermo$stress` (e.g. `low.C` or `heat.up`), the function returns the amino acid compositions of the proteins identified in that experiment.

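A minimal base-R sketch of the lookup step, assuming a composition table with one row per protein and an identifier column. The table `sgd`, its `ORF` column, the placeholder IDs and the counts are all invented for illustration and do not reflect the actual layout of `thermo$SGD`; the helper `lookup_compositions` is hypothetical, not a CHNOSZ function.

```r
# Hypothetical stand-in for a composition table such as thermo$SGD
# (layout, IDs and counts are invented for this sketch).
sgd <- data.frame(ORF = c("ORF_A", "ORF_B", "ORF_C"),
                  Ala = c(30, 12, 50),
                  Gly = c(25, 9, 41),
                  stringsAsFactors = FALSE)

# Hypothetical helper: return the rows for the requested identifiers,
# in the order requested, warning about any identifiers not in the table.
lookup_compositions <- function(table, protein) {
  idx <- match(protein, table$ORF)
  if (anyNA(idx)) {
    warning("not found: ", paste(protein[is.na(idx)], collapse = ", "))
  }
  table[idx[!is.na(idx)], , drop = FALSE]
}

lookup_compositions(sgd, c("ORF_B", "ORF_A"))
```

Using `match()` keeps the rows in the order the names were requested, which matters when the compositions are later paired with a vector of abundances.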
If the abundances of the proteins are given in `abundance`, the individual protein compositions are multiplied by these values and summed into an overall composition. If `average` is `TRUE`, the average composition is returned instead of the sum. The amino acid counts are then rounded to the number of decimal places given in `digits`. Setting `abundance` to 1 sums the protein compositions with equal weights; when `abundance` is `NULL` (the default), the compositions of the individual proteins are returned. The output of `get.protein` can be used as input to `add.protein`, which adds the proteins to the `thermo$protein` data frame in preparation for further calculations. Unless names for the new proteins are given in `pname`, they are generated from the values in `protein`.

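The weighting, averaging and rounding steps can be sketched in base R as follows. This is an illustration, not the CHNOSZ implementation: the helper `combine_compositions` is hypothetical, the two-protein matrix holds invented counts, and the sketch assumes that the average is the abundance-weighted sum divided by the total abundance.

```r
# Hypothetical helper (not part of CHNOSZ): combine per-protein amino acid
# counts using abundance weights, then optionally average and round.
combine_compositions <- function(counts, abundance = 1, average = TRUE, digits = 1) {
  # counts: numeric matrix, one row per protein, one column per amino acid
  counts <- as.matrix(counts)
  # recycle a single abundance value across all proteins
  abundance <- rep_len(abundance, nrow(counts))
  # scale each row by its abundance, then sum down the columns
  total <- colSums(counts * abundance)
  # assumption: the average divides the weighted sum by the total abundance
  if (average) total <- total / sum(abundance)
  round(total, digits)
}

# invented counts for two proteins
toy <- rbind(P1 = c(Ala = 10, Gly = 4, Trp = 1),
             P2 = c(Ala = 6,  Gly = 8, Trp = 3))
combine_compositions(toy, abundance = 1)                          # Ala 8, Gly 6, Trp 2
combine_compositions(toy, abundance = c(1, 0.5), average = FALSE) # Ala 13, Gly 8, Trp 2.5
```

With `abundance = c(1, 0.5)` and `average = FALSE`, the second protein contributes half of its counts, mirroring the "1 of one and 1/2 of the other" example.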
(NOTE: `get.protein` replaces the `proteome` function that was present in CHNOSZ up to version 0.7-2. `proteome` had a coding error that led to incorrect calculations of the average composition of proteins when `abundance` was not equal to 1.)

The `yeastgfp` function returns the identities and abundances of proteins with the requested subcellular localization (specified in `location`), using data from the YeastGFP project stored in `thermo$yeastgfp`. If `exclusive` is `FALSE`, the function selects all proteins that are localized to the compartment, even if they are also localized to other compartments. If `exclusive` is `TRUE` (the default), only proteins localized exclusively to the requested compartment are selected; if there are no such proteins (as is the case for the `bud` localization), the non-exclusive localizations are used instead. The values returned by `yeastgfp` can be passed to `get.protein` to obtain the amino acid compositions of the proteins.

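The exclusive/non-exclusive selection, including the fallback, can be sketched in base R. The table `loc`, its logical compartment columns and the helper `select_location` are hypothetical and are not the structure of `thermo$yeastgfp`; the ORF names and abundances are invented.

```r
# Hypothetical localization table: one row per ORF, one logical column per
# compartment (invented data, not the real thermo$yeastgfp layout).
loc <- data.frame(yORF      = c("Y1", "Y2", "Y3", "Y4"),
                  abundance = c(100, 250, 30, 40),
                  cytoplasm = c(TRUE, TRUE, TRUE, FALSE),
                  nucleus   = c(FALSE, TRUE, FALSE, TRUE),
                  bud       = c(FALSE, TRUE, FALSE, FALSE),
                  stringsAsFactors = FALSE)

# Hypothetical helper mimicking the documented selection rule.
select_location <- function(loc, location, exclusive = TRUE) {
  compartments <- setdiff(names(loc), c("yORF", "abundance"))
  here <- loc[[location]]
  if (exclusive) {
    # count the compartments each protein is reported in
    nloc <- Reduce(`+`, loc[compartments])
    only_here <- here & nloc == 1
    # fall back to the non-exclusive selection if nothing is exclusive
    if (any(only_here)) here <- only_here
  }
  list(yORF = loc$yORF[here], abundance = loc$abundance[here])
}

select_location(loc, "cytoplasm")   # Y1 and Y3 only
select_location(loc, "bud")         # no exclusive proteins, so Y2 is returned
```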
`HTCC1062.faa` is a FASTA file of 1354 protein sequences from the organism Pelagibacter ubique HTCC1062, downloaded from the NCBI RefSeq collection on 2009-04-12. The search term (in the NCBI Protein database) was `txid335992[Organism:noexp] AND "refseq"[Filter]`.

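Because the file is plain FASTA, one way its sequences could be turned into amino acid counts is sketched below in base R. The helper `count_fasta` and the two toy records are hypothetical; the sketch counts only the 20 standard one-letter codes and drops any other letters (such as X).

```r
# the 20 standard one-letter amino acid codes
aa_letters <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical helper: count residues in each record of a FASTA file,
# given as a character vector of lines.
count_fasta <- function(lines) {
  header <- grepl("^>", lines)
  record <- cumsum(header)                 # record number for every line
  ids <- sub("^>\\s*", "", lines[header])
  counts <- vapply(seq_along(ids), function(k) {
    s <- paste(lines[!header & record == k], collapse = "")
    residues <- factor(strsplit(toupper(s), "")[[1]], levels = aa_letters)
    as.numeric(table(residues))            # non-standard letters become NA and are not counted
  }, numeric(length(aa_letters)))
  counts <- t(counts)                      # one row per record
  dimnames(counts) <- list(ids, aa_letters)
  counts
}

fasta <- c(">seq1", "MKA", "AG", ">seq2", "WWG")
count_fasta(fasta)
```

For a file on disk, the lines could first be read with `readLines()`.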
For `get.protein`, the value is the amino acid composition of the specified protein(s): the individual compositions if `abundance` is `NULL`, otherwise their abundance-weighted sum, or the average composition if `average` is `TRUE`. This output can be passed to `add.protein`, which adds the proteins to the `thermo$protein` data frame and returns their indices. `yeastgfp` returns a list with elements `yORF` and `abundance`.

Boer, V. M., de Winde, J. H., Pronk, J. T. and Piper, M. D. W., 2003. The genome-wide transcriptional responses of Saccharomyces cerevisiae grown on glucose in aerobic chemostat cultures limited for carbon, nitrogen, phosphorus, or sulfur. J. Biol. Chem., 278, 3265-3274.
Richmond, C. S., Glasner, J. D., Mau, R., Jin, H. F. and Blattner, F. R., 1999. Genome-wide expression profiling in Escherichia coli K-12. Nucleic Acids Res., 27, 3821-3835.
Tai, S. L., Boer, V. M., Daran-Lapujade, P., Walsh, M. C., de Winde, J. H., Daran, J.-M. and Pronk, J. T., 2005. Two-dimensional transcriptome analysis in chemostat cultures: Combinatorial effects of oxygen availability and macronutrient limitation in Saccharomyces cerevisiae. J. Biol. Chem., 280, 437-447.
```r
## basic examples of get.protein
# amino acid composition of two proteins
get.protein(c('YML020W','YBR051W'),'SGD')
# average composition of proteins
get.protein(c('YML020W','YBR051W'),'SGD',
  abundance=1,pname='PROT1_NEW')
# 1 of one and 1/2 of the other
get.protein(c('YML020W','YBR051W'),'SGD',
  abundance=c(1,0.5),average=FALSE,pname='PROT2_NEW')
# compositions of proteins induced in carbon limitation
get.protein('low.C','SGD')

## overall composition of proteins exclusively localized
## to cytoplasm of S. cerevisiae with reported expression levels
y <- yeastgfp('cytoplasm')
p <- get.protein(y$yORF,'SGD',y$abundance,'cytoplasm')
# add the proteolog and calculate its properties
i <- add.protein(p)
protein(i)

## Chemical activities of model subcellular proteins
# (one-dimensional speciation diagram as a function of logfO2)
basis('CHNOS')
names <- colnames(thermo$yeastgfp)[6:28]
# calculate amino acid compositions using 'get.protein' function
for(i in 1:length(names)) {
  y <- yeastgfp(names[i])
  p <- get.protein(y$yORF,'SGD',y$abundance,names[i])
  add.protein(p)
}
species(names,'SGD')
res <- 200
t <- affinity(O2=c(-77,-72,res))
mycolor <- topo.colors(6)[1:4]
mycolor <- rep(mycolor,times=rep(6,4))
oldpar <- par(bg='black',fg='white')
logact <- diagram(t,balance='PBB',names=names,ylim=c(-4,-1.9),legend.x=NULL,
  color=mycolor,lwd=2,cex.axis=1.5,residue=TRUE)$logact
# so far good, but how about labels on the plot?
for(i in 1:length(logact)) {
  imax <- which.max(as.numeric(logact[[i]]))
  adj <- 0.5
  if(imax > 180) adj <- 1
  if(imax < 20) adj <- 0
  text(seq(-77,-72,length.out=res)[imax],logact[[i]][imax],
    labels=names[i],adj=adj)
}
title(main=paste('Subcellular proteologs of S. cerevisiae\n',
  describe(thermo$basis[-5,])),col.main=par('fg'))
par(oldpar)

## Oxygen fugacity - activity of H2O predominance
## diagrams for proteomes in 23 YeastGFP localizations
# arranged by decreasing metastability:
# order of this list of locations is based on the
# (dis)appearance of species on the current set of diagrams
names <- c('actin','early.Golgi','ER','vacuolar.membrane',
  'cell.periphery','nucleolus','Golgi','lipid.particle',
  'punctate.composite','peroxisome','bud','ER.to.Golgi',
  'nuclear.periphery','ambiguous','late.Golgi','cytoplasm',
  'nucleus','mitochondrion','endosome','vacuole',
  'spindle.pole','bud.neck','microtubule')
nloc <- c(5,5,5,3,2,3)
inames <- 1:length(names)
# define the system
basis('CHNOS+')
# calculate amino acid compositions using 'get.protein' function
for(i in 1:length(names)) {
  y <- yeastgfp(names[i])
  p <- get.protein(y$yORF,'SGD',y$abundance,names[i])
  add.protein(p)
}
species(names,'SGD')
t <- affinity(H2O=c(-5,0,256),O2=c(-80,-66,256))
# the plot setup
layout(matrix(c(1,1,2:7),byrow=TRUE,nrow=4),heights=c(0.7,3,3,3))
# a title
par(mar=c(0,0,0,0))
plot.new()
text(0.5,0.5,paste('Proteologs for subcellular locations of',
  'S. cerevisiae\n',describe(thermo$basis[-c(2,5),])),cex=1.5)
opar <- par(mar=c(3,4,1,1),xpd=TRUE)
for(i in 1:length(nloc)) {
  diagram(t,balance='PBB',names=names[inames],
    ispecies=inames,cex.axis=1.1)
  label.plot(letters[i])
  title(main=paste(length(inames),'locations'))
  # take out the stable species
  inames <- inames[-(1:nloc[i])]
}
layout(matrix(1))
par(opar)

### examples for stress response experiments
# coefficient of variation of relative
# abundances of proteins induced in heat
# response experiments (Richmond et al., 1999)
# as a function of fO2 and temperature
a <- get.protein("heat","ECO")
add.protein(a)
basis('CHNOS+')
species(a$protein,"ECO")
a <- affinity(T=c(0,150),O2=c(-90,-40))
d <- diagram(a,residue=TRUE,do.plot=FALSE,mam=FALSE)
draw.diversity(d)
title(main="Coefficient of variation of relative abundances of proteins in E. coli observed at 50 degC heat shock",
  cex.main=0.9)

# predominance fields for overall protein
# compositions induced by
# carbon, sulfur and nitrogen limitation
# (experimental data from Boer et al., 2003)
expt <- c('low.C','low.N','low.S')
for(i in 1:length(expt)) {
  p <- get.protein(expt[i],"SGD",abundance=1)
  add.protein(p)
}
# thermo set-up
basis("CHNOS+")
basis("O2",-75.29)
species(expt,"SGD")
a <- affinity(CO2=c(-5,0),H2S=c(-10,0))
diagram(a,balance="PBB",names=expt,color=NULL,residue=TRUE)
title(main=paste("Metastabilities of proteins induced by",
  "carbon, sulfur and nitrogen limitation",sep="\n"))

# predominance fields for overall protein
# compositions induced and repressed in
# an/aerobic carbon-limited experiments
# (Tai et al., 2005)
# the activities of glucose, ammonium and sulfate
# are similar to the non-growth-limiting concentrations
# used by Boer et al., 2003
basis(c("glucose","H2O","NH4+","H2","SO4-2","H+"),
  c(-1,0,-1.3,999,-1.4,-7))
# the names of the experiments in thermo$stress
expt <- c("Clim.aerobic.down","Clim.aerobic.up",
  "Clim.anaerobic.down","Clim.anaerobic.up")
# here we use abundance to indicate that the protein
# compositions should be summed together in equal amounts
for(i in 1:length(expt)) {
  p <- get.protein(expt[i],"SGD",abundance=1)
  add.protein(p)
}
species(expt,"SGD")
a <- affinity(C6H12O6=c(-35,-20),H2=c(-20,0))
diagram(a,residue=TRUE,color=NULL,as.residue=TRUE)
title(main="Metastabilities of average protein residues in an/aerobic carbon limitation in yeast")
```