thermo {CHNOSZ} | R Documentation |
This data object holds the thermodynamic database of properties of species, along with operational parameters for CHNOSZ, the properties of elements, references to sources of thermodynamic and compositional data, compositions of chemical activity buffers, amino acid compositions of proteins, and miscellaneous other data taken from the literature. The thermo object also holds intermediate data used in calculations, in particular the definitions of basis species and species of interest input by the user, and the properties of water so that subsequent calculations at the same temperature-pressure conditions can be accelerated.
The thermo object is a list composed of data.frames or lists each representing a class of data. The object is created upon loading the package (by calling data(thermo) from within the .First.lib function) from *.csv files in the data directory of the package. thermo is globally accessible; i.e., it is present in the user's environment. After loading CHNOSZ you may run ls() to verify that thermo is present, or type thermo to print the entire contents of the object on the screen. The various elements of the thermo object can be accessed using R's subsetting operators; for example, typing thermo$opt at the command line displays the current list of operational parameters (some of which can be altered using functions dedicated to this purpose; see e.g. nuts).
To make persistent additions or changes to the thermodynamic database of your installation, including compositions of proteins, first locate the installation directory of the package. This will be different depending on your operating system and type of R installation, but is something like /usr/lib/R/library/CHNOSZ, /Volumes/Macintosh HD/Library/Frameworks/R.framework/resources/library/CHNOSZ, C:\Program Files\R\R-2.10.0\library\CHNOSZ, or C:\Users\[User Name]\Documents\R\win-library\2.10\CHNOSZ on Linux, Mac and Windows (XP and Vista) systems, respectively. To find the exact location of this directory on your system, use the command system.file(package="CHNOSZ"). Inside the data directory of the installation directory of the package are the .csv files that can be edited with a spreadsheet program. Edit and save the OBIGT.csv and/or protein.csv files as desired. The next time you start an R session, the new data will be available.
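As a minimal sketch of locating the editable files, the following relies only on system.file() and on the file names mentioned above. The helper name find_chnosz_data is hypothetical and is not part of CHNOSZ; it returns NULL when the package is not installed.

```r
# Hypothetical helper (not part of CHNOSZ): report where the editable
# .csv files would be found for an installed package.
find_chnosz_data <- function(pkg = "CHNOSZ",
                             files = c("OBIGT.csv", "protein.csv")) {
  # system.file() returns "" when the package is not installed
  pkgdir <- system.file(package = pkg)
  if (identical(pkgdir, "")) return(NULL)
  paths <- file.path(pkgdir, "data", files)
  data.frame(file = files, path = paths, exists = file.exists(paths),
             stringsAsFactors = FALSE)
}

find_chnosz_data()
```

The exists column makes it easy to confirm that the files are where the text says before opening them in a spreadsheet program.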
Functions are available to interactively update the thermodynamic database or definitions of buffers (mod.obigt and mod.buffer, respectively; a function named change serves as a wrapper to both of these). Changes made using these functions, as well as any interactive definitions of basis species and species of interest, are lost when the current session is closed without saving or if the thermo object is reinitialized by the command data(thermo).
data(thermo)
thermo$opt
List of operational parameters
Tr | numeric | Reference temperature (K) |
Pr | numeric | Reference pressure (bar) |
Theta | numeric | Theta in the revised HKF equations of state (K) |
Psi | numeric | Psi in the revised HKF equations of state (bar) |
cutoff | numeric | Cutoff below which values are taken to be zero (see makeup) |
E.units | character | The user's units of energy (cal (default) or J) |
T.units | character | The user's units of temperature (C (default) or K) |
P.units | character | The user's units of pressure (bar (default) or MPa) |
state | character | The default physical state for searching species (aq by default) |
ionize | logical | Should affinity perform ionization calculations for proteins? |
water | character | Computational option for properties of water (SUPCRT (default) or IAPWS) |
online | logical | Allow online searches of protein composition? Default (NA) is to ask the user. |
level | numeric | Which of any duplicated entries in thermo$obigt to retrieve using info. |
thermo$element
Dataframe containing the thermodynamic properties of elements taken from Cox et al., 1989 and Wagman et al., 1982. The standard molal entropy (S(Z)) at 25 degrees C and 1 bar for the element of charge (Z) was calculated from S(H2,g) + 2S(Z) = 2S(H+), where the standard molal entropies of H2,g and H+ were taken from Cox et al., 1989. The mass of Z is taken to be zero. Accessing this dataframe using element will select the first entry found for a given element; i.e., values from Wagman et al., 1982 will only be retrieved if the properties of the element are not found from Cox et al., 1989.
element | character | Symbol of element |
state | character | Stable state of element at 25 degrees C and 1 bar |
source | character | Source of data |
mass | numeric | Mass of element (in natural isotopic distribution; referenced to a mass of 12 for 12C) |
s | numeric | Entropy of the compound of the element in its stable state at 25 degrees C and 1 bar (cal K^-1 mol^-1) |
n | numeric | Number of atoms of the element in its stable compound at 25 degrees C and 1 bar |
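The entropy of the element of charge described above follows from rearranging S(H2,g) + 2S(Z) = 2S(H+) to S(Z) = S(H+) - S(H2,g)/2. The sketch below works this through; the two input values are assumptions quoted from memory and should be checked against Cox et al., 1989 before being relied upon.

```r
# Assumed inputs (verify against Cox et al., 1989)
S_H2_gas <- 130.680   # J K^-1 mol^-1, standard molal entropy of H2(g)
S_Hplus  <- 0         # J K^-1 mol^-1, conventional value for H+

# Rearranged from S(H2,g) + 2S(Z) = 2S(H+)
S_Z_J <- S_Hplus - S_H2_gas / 2

# Convert to the calorie-based units used in thermo$element
S_Z_cal <- S_Z_J / 4.184
S_Z_cal
```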
thermo$obigt
This dataframe is a thermodynamic database of standard molal thermodynamic properties and equations of state parameters of species. OBIGT is an acronym for OrganoBioGeoTherm, which refers to a software package produced by Harold C. Helgeson and coworkers at the Laboratory of Theoretical Geochemistry and Biogeochemistry at the University of California, Berkeley. (There may be an additional meaning for the acronym: “One BIG Table” of thermodynamic data.)
As of CHNOSZ version 0.7, the data in OBIGT.csv represent 179 minerals, 16 gases, and 294 aqueous (largely inorganic) species taken from the data file included in the SUPCRT92 distribution (Johnson et al., 1992), an additional 14 minerals, 6 gases, and 1049 aqueous organic and inorganic species from the slop98.dat file (Shock et al., 1998), and approximately 50 other minerals, 175 crystalline organic and biochemical species, 220 organic gases, 300 organic liquids, 650 aqueous inorganic, organic, and biochemical species, and 40 organic groups taken from the recent literature. Some entries taken from the SUPCRT92 or slop98.dat databases have been superseded, or duplicated, by later data (see examples below for accessing duplicated entries using level). Each entry is referenced to one or two literature sources listed in thermo$source, but note the following modifications or additions to the data:
Z
, see above).
These modifications are indicated in OBIGT.csv by having CHNOSZ as one of the sources of data. Note also that some data appearing in the slop98.dat file were corrected or modified as noted in that file, and are indicated in OBIGT.csv by having SLOP98 as one of the sources of data.
In order to represent thermodynamic data for minerals with phase transitions, the different phases of these minerals are represented as phase species that have states denoted by cr1, cr2, etc. The standard molar thermodynamic properties at 25 degrees C and 1 bar (Tr and Pr) of the cr2 phase species of minerals were generated by first calculating those of the cr1 phase species at the transition temperature (Ttr) and 1 bar, then taking account of the volume and entropy of transition (the latter can be retrieved by combining the former with the Clausius-Clapeyron equation and values of (dP/dT) of transitions taken from the SUPCRT92 data file) to calculate the standard molar entropy of the cr2 phase species at Ttr, and taking account of the enthalpy of transition (DeltaH0, taken from the SUPCRT92 data file) to calculate the standard molar enthalpy of the cr2 phase species at Ttr. The standard molar properties of the cr2 phase species at Ttr and 1 bar calculated in this manner were combined with the equations-of-state parameters of the species to generate values of the standard molar properties at 25 degrees C and 1 bar. This process was repeated as necessary to generate the standard molar properties of phase species represented by cr3 and cr4, referencing at each iteration the previously calculated values of the standard molar properties of the lower-temperature phase species (i.e., cr2 and cr3). A consequence of tabulating the standard molar thermodynamic properties of the phase species is that the values of (dP/dT) and DeltaH0 of phase transitions can be calculated using the equations of state and therefore do not need to be stored in the thermodynamic database. However, the transition temperatures (Ttr) generally cannot be assessed by comparing the Gibbs energies of phase species and are tabulated in the database.
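The step in which the entropy of transition is retrieved from the volume of transition can be sketched as follows. The Clausius-Clapeyron equation gives dP/dT = DeltaS/DeltaV, so DeltaS = DeltaV * (dP/dT). The input values are illustrative only and are not quoted from OBIGT.csv or the SUPCRT92 file; the relation DeltaH = Ttr * DeltaS is the standard equilibrium condition along the transition boundary and is included only as a consistency check, since the text takes DeltaH0 from the SUPCRT92 data file.

```r
# Illustrative transition data (assumed values, not from the database)
Ttr  <- 848     # transition temperature, K
dV   <- 0.372   # volume of transition, cm^3 mol^-1
dPdT <- 38.5    # slope of the transition boundary, bar K^-1

# Unit conversion: 1 cm^3 bar = 0.1 J and 1 cal = 4.184 J
cm3bar_to_cal <- 0.1 / 4.184

# Clausius-Clapeyron: dP/dT = dS/dV, so dS = dV * dP/dT
dS <- dV * dPdT * cm3bar_to_cal   # cal K^-1 mol^-1

# Along the boundary the Gibbs energy change is zero, so dH = Ttr * dS
dH <- Ttr * dS                    # cal mol^-1

c(dS = dS, dH = dH)
```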
Starting with CHNOSZ version 0.9, it is permissible to include duplicated species in thermo$obigt. Functions that access the database will by default select the first of any duplicated species; this behavior can be altered by changing the value of thermo$opt$level to e.g. 2 to select the second of any duplicated species (it does not affect the way in which non-duplicated species are accessed).
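The selection rule just described can be illustrated with a toy table standing in for thermo$obigt. This is a sketch of the idea only, not the package's implementation: the function find_species and the source keys SRC_A and SRC_B are hypothetical, and clamping level to the number of duplicates is a choice made here for the example.

```r
# Toy stand-in for thermo$obigt (source keys are hypothetical)
obigt <- data.frame(
  name   = c("Al+3", "H2O", "Al+3", "CO2"),
  state  = c("aq", "liq", "aq", "aq"),
  source = c("SRC_A", "SRC_A", "SRC_B", "SRC_A"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: return the row number of a species, using
# 'level' only when the name/state pair is duplicated
find_species <- function(name, state, level = 1, db = obigt) {
  rows <- which(db$name == name & db$state == state)
  if (length(rows) == 0) return(NA_integer_)
  if (length(rows) == 1) return(rows)   # level is ignored
  rows[min(level, length(rows))]        # clamp to available duplicates
}

find_species("Al+3", "aq")              # first duplicate: row 1
find_species("Al+3", "aq", level = 2)   # second duplicate: row 3
find_species("CO2", "aq", level = 2)    # not duplicated: row 4
```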
The identification of species and their standard molal thermodynamic properties at 25 degrees C and 1 bar are located in the first 12 columns of thermo$obigt:
name | character | Species name |
abbrv | character | Species abbreviation |
formula | character | Species formula |
state | character | Physical state |
source1 | character | Primary source |
source2 | character | Secondary source |
date | character | Date of data entry |
G | numeric | Standard molal Gibbs energy of formation from the elements (cal mol^-1) |
H | numeric | Standard molal enthalpy of formation from the elements (cal mol^-1) |
S | numeric | Standard molal entropy (cal mol^-1 K^-1) |
Cp | numeric | Standard molal isobaric heat capacity (cal mol^-1 K^-1) |
V | numeric | Standard molal volume (cm^3 mol^-1) |
The meanings of the remaining columns depend on the physical state of a particular species. If it is aqueous, the values in these columns represent parameters in the revised HKF equations of state (see hkf), otherwise they denote parameters in a general equation of state for crystalline, gas and liquid species (see cgl). The names of these columns are compounded from those of the parameters in each of the equations of state (for example, column 13 is named a1.a). Scaling of the values by orders of magnitude is adopted for some of the parameters, following common usage in the literature.
Columns 13-20 for aqueous species (parameters in the revised HKF equations of state):
a1 | numeric | a1 * 10 (cal mol^-1 bar^-1) |
a2 | numeric | a2 * 10^-2 (cal mol^-1) |
a3 | numeric | a3 (cal K mol^-1 bar^-1) |
a4 | numeric | a4 * 10^-4 (cal mol^-1 K) |
c1 | numeric | c1 (cal mol^-1 K^-1) |
c2 | numeric | c2 * 10^-4 (cal mol^-1 K) |
omega | numeric | omega * 10^-5 (cal mol^-1) |
Z | numeric | Charge |
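To recover unscaled HKF parameters from the stored columns, the factors in the table above are divided out. The sketch below reads the exponents directly from that table; the stored values are illustrative only and do not belong to any particular species.

```r
# Exponents p such that stored = actual * 10^p, read from the table
# above for a1, a2, a3, a4, c1, c2, omega, Z
stored_exponent <- c(a1 = 1, a2 = -2, a3 = 0, a4 = -4,
                     c1 = 0, c2 = -4, omega = -5, Z = 0)

# Hypothetical stored row (values are illustrative only)
stored <- c(a1 = 1.8, a2 = -2.3, a3 = 3.3, a4 = -2.7,
            c1 = 18.2, c2 = -3.0, omega = 0.33, Z = 1)

# Undo the scaling: actual = stored / 10^p
actual <- stored / 10^stored_exponent[names(stored)]
actual
```

Keeping the exponents in a named vector avoids relying on column order when the parameters are rearranged.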
Columns 13-20 for crystalline, gas and liquid species (Cp = a + bT + cT^-2 + dT^-0.5 + eT^2 + fT^lambda).
a | numeric | a (cal K^-1 mol^-1) |
b | numeric | b * 10^3 (cal K^-2 mol^-1) |
c | numeric | c * 10^-5 (cal K mol^-1) |
d | numeric | d (cal K^-0.5 mol^-1) |
e | numeric | e * 10^5 (cal K^-3 mol^-1) |
f | numeric | f (cal K^(-lambda-1) mol^-1) |
lambda | numeric | lambda (exponent on the f term) |
T | numeric | Temperature of phase transition or upper temperature limit of validity of extrapolation (K) |
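The heat capacity equation given above for crystalline, gas and liquid species can be evaluated from the stored coefficients once the scaling of b, c and e is undone. The function name cp_cgl and the coefficient values are hypothetical; this is a sketch of the equation, not the cgl function of the package.

```r
# Cp = a + bT + cT^-2 + dT^-0.5 + eT^2 + fT^lambda, with coefficients
# supplied as stored (scaled) values
cp_cgl <- function(temp, p) {
  b  <- p$b * 1e-3   # stored as b * 10^3
  cc <- p$c * 1e5    # stored as c * 10^-5
  e  <- p$e * 1e-5   # stored as e * 10^5
  p$a + b * temp + cc * temp^-2 + p$d * temp^-0.5 +
    e * temp^2 + p$f * temp^p$lambda
}

# Hypothetical stored coefficients for a crystalline species
stored <- list(a = 11.22, b = 8.2, c = -2.7, d = 0, e = 0,
               f = 0, lambda = 0)

# Heat capacity (cal K^-1 mol^-1) at three temperatures in K
cp_cgl(c(298.15, 500, 800), stored)
```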
thermo$source
Dataframe of references to sources of thermodynamic data. Source keys with a leading underscore indicate abbreviations for journals.
source | character | Source key |
reference | character | Reference |
thermo$buffer
Dataframe which contains definitions of buffers of chemical activity. Each named buffer can be composed of one or more species, which may include any species in the thermodynamic database and/or any protein. The calculations provided by buffer do not take into account phase transitions of minerals, so individual phase species of such minerals must be specified in the buffers.
name | character | Name of buffer |
species | character | Name of species |
state | character | Physical state of species |
logact | numeric | Logarithm of activity (fugacity for gases) |
thermo$protein
Dataframe of amino acid compositions of selected proteins. The majority of the compositions were taken from the SWISS-PROT online database (Boeckmann et al., 2003). N-terminal signal sequences were removed except for some cases where different isoforms of proteins have been identified (for example, MOD5.M and MOD5.N proteins of YEAST denote the mitochondrial and nuclear isoforms of this protein).
protein | character | Identification of protein |
organism | character | Identification of organism |
source | character | Source of compositional data |
abbrv | character | Abbreviation or other ID for protein |
chains | numeric | Number of polypeptide chains in the protein |
Ala..Tyr | numeric | Number of each amino acid in the protein |
thermo$stress
Dataframe listing proteins identified in selected proteomic stress response experiments. The names of proteins begin at row 3, and columns are all the same length (padded as necessary at the bottom by NAs). Names correspond to ordered locus names (for SGD) or gene names (for ECO). The column names and first two rows give the following information:
colname | character | Name of the experiment |
organism | character | Name of the organism (SGD or ECO) |
source | character | Source of the data |
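Given the layout described above (organism in row 1, source in row 2, protein names from row 3 onward, NA padding at the bottom), the proteins of one experiment can be extracted as sketched below. The toy dataframe, the function name stress_proteins, and all identifiers in it are hypothetical.

```r
# Toy stand-in for thermo$stress; experiment names, sources, and
# protein identifiers are hypothetical
stress <- data.frame(
  expt.one = c("SGD", "SRC1", "YAL001C", "YAL002W", "YAL003W"),
  expt.two = c("ECO", "SRC2", "geneA", NA, NA),
  stringsAsFactors = FALSE
)

# Rows 1-2 hold organism and source; protein names start at row 3
stress_proteins <- function(colname, x = stress) {
  ids <- x[-(1:2), colname]
  list(organism = x[1, colname],
       source   = x[2, colname],
       proteins = ids[!is.na(ids)])   # drop the NA padding
}

stress_proteins("expt.two")
```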
thermo$groups
This is a dataframe with 22 columns for the amino acid sidechain, backbone and protein backbone groups ([Ala]..[Tyr],[AABB],[UPBB]) whose rows correspond to the elements C, H, N, O, S. It is used to quickly calculate the chemical formulas of proteins that are selected using the iprotein argument in affinity.
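The way such an element-by-group table yields a chemical formula can be sketched as a matrix product. Only five groups are included, and their elemental compositions are assumptions made for this illustration rather than values read from thermo$groups. The group counts also assume one [AABB] group per chain and one [UPBB] group for each additional residue.

```r
# Element-by-group matrix for a few groups only; the compositions are
# assumed for illustration: [Ala] = CH3, [Gly] = H, [Ser] = CH3O,
# [AABB] = C2H4NO2, [UPBB] = C2H2NO
groups <- cbind(
  "[Ala]"  = c(C = 1, H = 3, N = 0, O = 0, S = 0),
  "[Gly]"  = c(C = 0, H = 1, N = 0, O = 0, S = 0),
  "[Ser]"  = c(C = 1, H = 3, N = 0, O = 1, S = 0),
  "[AABB]" = c(C = 2, H = 4, N = 1, O = 2, S = 0),
  "[UPBB]" = c(C = 2, H = 2, N = 1, O = 1, S = 0)
)

# Group counts for a hypothetical single-chain tripeptide Ala-Gly-Ser:
# three sidechains, one [AABB], and two [UPBB]
counts <- c("[Ala]" = 1, "[Gly]" = 1, "[Ser]" = 1,
            "[AABB]" = 1, "[UPBB]" = 2)

# Elemental makeup = (elements x groups) %*% (group counts)
elements <- drop(groups %*% counts[colnames(groups)])
elements   # C 8, H 15, N 3, O 5, S 0
```

The result agrees with summing the three amino acids (C8H19N3O7) and removing two water molecules for the two peptide bonds.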
thermo$basis
Initially NULL, reserved for a dataframe written by basis upon definition of the basis species. The number of rows of this dataframe is equal to the number of columns in “...” (one for each element).
... | numeric | One or more columns of stoichiometric coefficients of elements in the basis species |
ispecies | numeric | Rownumber of basis species in thermo$obigt |
logact | numeric | Logarithm of activity or fugacity of basis species |
state | character | Physical state of basis species |
thermo$species
Initially NULL, reserved for a dataframe generated by species to define the species of interest. The number of columns in “...” is equal to the number of basis species (i.e., rows of thermo$basis).
... | numeric | One or more columns of stoichiometric coefficients of basis species in the species of interest |
ispecies | numeric | Rownumber of species in thermo$obigt |
logact | numeric | Logarithm of activity or fugacity of species |
state | character | Physical state of species |
name | character | Name of species |
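The relationship between the “...” columns of thermo$basis (elements in each basis species) and those of thermo$species (basis species in each species of interest) amounts to solving a square linear system. The sketch below uses a three-element example chosen for illustration; it shows the underlying arithmetic and is not the code used by basis or species.

```r
# Rows = basis species, columns = elements (as in the "..." columns
# of thermo$basis)
basis_mat <- rbind(
  CO2 = c(C = 1, H = 0, O = 2),
  H2O = c(C = 0, H = 2, O = 1),
  O2  = c(C = 0, H = 0, O = 2)
)

# Elemental makeup of a species of interest (methane, CH4)
species_elem <- c(C = 1, H = 4, O = 0)

# Solve t(basis_mat) %*% coeffs = species_elem for the coefficients
# of the basis species in the formation reaction
coeffs <- solve(t(basis_mat), species_elem)
names(coeffs) <- rownames(basis_mat)
coeffs   # CH4 = 1 CO2 + 2 H2O - 2 O2
```

The system has a unique solution only when the basis matrix is square and nonsingular, which is why the number of basis species must equal the number of elements.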
thermo$water
The properties calculated with water at multiple T, P points (minimum of 26) are stored here so that repeated calculations at the same conditions can be done more quickly.
thermo$Psat
The values of Psat calculated with water.SUPCRT at multiple T points (minimum of 26) are stored here.
thermo$water2
The properties calculated with water.SUPCRT at multiple T, P points (minimum of 26) are stored here.
thermo$expt
List of experimental data that are used in the examples.
thermo$expt$PM90
Heat capacities of four unfolded aqueous proteins taken from Privalov and Makhatadze, 1990. Names of proteins are in the first column, temperature in degrees C in the second, and heat capacities in J mol^-1 K^-1 in the third.
thermo$expt$RH95
Heat capacity data for iron taken from Robie and Hemingway, 1995. Temperature in Kelvin is in the first column, heat capacity in J K^-1 mol^-1 in the second.
thermo$expt$RT71
pH titration measurements for unfolded lysozyme (LYSC_CHICK) taken from Roxby and Tanford, 1971. pH is in the first column, net charge in the second.
thermo$expt$SOJSH92
Experimental equilibrium constants for the reaction NaCl(aq) = Na+ + Cl- as a function of temperature and pressure taken from Fig. 1 of Shock et al., 1992. Data were extracted from the figure using g3data (http://www.frantz.fi/software/g3data.php).
thermo$SGD
Dataframe of amino acid composition of proteins from the Saccharomyces Genome Database.
The OLN column has the ordered locus names of proteins, and the remaining twenty columns (Ala..Val) contain the numbers of the respective amino acids in each protein; the columns are arranged in alphabetical order based on the three-letter abbreviations for the amino acids. The source of data for SGD.csv is the file protein_properties.tab found on the FTP site of the SGD project on 2008-08-04. Blank entries were replaced with "NA" and column headings were added.
thermo$ECO
The AC column holds the accession numbers of the proteins, the third column (Name) has the names of the corresponding genes, and the fourth column (OLN) lists the ordered locus names of the proteins. The remaining twenty columns (A..Y) give the numbers of the respective amino acids in each protein and are ordered alphabetically by the one-letter abbreviations of the amino acids. The sources of data for ECO.csv are the files ECOLI.dat (ftp://ftp.expasy.org/databases/hamap/complete_proteomes/entries/bacteria) and ECOLI.fas (ftp://ftp.expasy.org/databases/hamap/complete_proteomes/fasta/bacteria) downloaded from the HAMAP (High-quality Automated and Manual Annotation of microbial Proteomes system) FTP site (Gattiker et al., 2003) on 2007-12-20.
thermo$yeastgfp
The dataframe has columns named yORF, gene name, GFP tagged?, GFP visualized?, and abundance. The remaining columns correspond to the 23 subcellular localizations considered in the YeastGFP project (Huh et al., 2003 and Ghaemmaghami et al., 2003) and hold values of either T or F for each protein. yeastgfp.csv was downloaded on 2007-02-01 from http://yeastgfp.ucsf.edu using the Advanced Search, setting options to download the entire dataset and to include localization table and abundance, sorted by orf number.
Amend, J. P. and Helgeson, H. C., 1997a. Group additivity equations of state for calculating the standard molal thermodynamic properties of aqueous organic species at elevated temperatures and pressures. Geochim. Cosmochim. Acta, 61, 11-46. http://dx.doi.org/10.1016/S0016-7037(96)00306-7
Amend, J. P. and Helgeson, H. C., 1997b. Calculation of the standard molal thermodynamic properties of aqueous biomolecules at elevated temperatures and pressures. Part 1. L-alpha-amino acids. J. Chem. Soc., Faraday Trans., 93, 1927-1941. http://dx.doi.org/10.1039/a608126f
Cox, J. D., Wagman, D. D. and Medvedev, V. A., eds., 1989. CODATA Key Values for Thermodynamics. Hemisphere Publishing Corporation, New York, 271 p. http://www.worldcat.org/oclc/18559968
Dick, J. M., LaRowe, D. E. and Helgeson, H. C., 2006. Temperature, pressure, and electrochemical constraints on protein speciation: Group additivity calculation of the standard molal thermodynamic properties of ionized unfolded proteins. Biogeosciences, 3, 311-336. http://www.biogeosciences.net/3/311/2006/bg-3-311-2006.html
Gattiker, A., Michoud, K., Rivoire, C., Auchincloss, A. H., Coudert, E., Lima, T., Kersey, P., Pagni, M., Sigrist, C. J. A., Lachaize, C., Veuthey, A.-L., Gasteiger, E. and Bairoch, A., 2003. Automatic annotation of microbial proteomes in Swiss-Prot. Comput. Biol. Chem., 27, 49-58. http://dx.doi.org/10.1016/S1476-9271(02)00094-4
Ghaemmaghami, S., Huh, W., Bower, K., Howson, R. W., Belle, A., Dephoure, N., O'Shea, E. K. and Weissman, J. S., 2003. Global analysis of protein expression in yeast. Nature, 425, 737-741. http://dx.doi.org/10.1038/nature02046
HAMAP system. HAMAP FTP directory, ftp://ftp.expasy.org/databases/hamap/, accessed on 2007-12-20.
Huh, W. K., Falvo, J. V., Gerke, L. C., Carroll, A. S., Howson, R. W., Weissman, J. S. and O'Shea, E. K., 2003. Global analysis of protein localization in budding yeast. Nature, 425, 686-691. http://dx.doi.org/10.1038/nature02026
Johnson, J. W., Oelkers, E. H. and Helgeson, H. C., 1992. SUPCRT92: A software package for calculating the standard molal thermodynamic properties of minerals, gases, aqueous species, and reactions from 1 to 5000 bar and 0 to 1000 degrees C. Comp. Geosci., 18, 899-947. http://dx.doi.org/10.1016/0098-3004(92)90029-Q
Privalov, P. L. and Makhatadze, G. I., 1990. Heat capacity of proteins. II. Partial molar heat capacity of the unfolded polypeptide chain of proteins: Protein unfolding effects. J. Mol. Biol., 213, 385-391. http://dx.doi.org/10.1016/S0022-2836(05)80198-6
Robie, R. A. and Hemingway, B. S., 1995. Thermodynamic Properties of Minerals and Related Substances at 298.15 K and 1 Bar (10^5 Pascals) Pressure and at Higher Temperatures. U. S. Geol. Surv., Bull. 2131, 461 p. http://www.worldcat.org/oclc/32590140
Roxby, R. and Tanford, C., 1971. Hydrogen ion titration curve of lysozyme in 6 M guanidine hydrochloride. Biochemistry, 10, 3348-3352. http://dx.doi.org/10.1021/bi00794a005
SGD project. Saccharomyces Genome Database, http://www.yeastgenome.org, accessed on 2008-08-04.
Shock, E. L. and Koretsky, C. M., 1995. Metal-organic complexes in geochemical processes: Estimation of standard partial molal thermodynamic properties of aqueous complexes between metal cations and monovalent organic acid ligands at high pressures and temperatures. Geochim. Cosmochim. Acta, 59, 1497-1532. http://dx.doi.org/10.1016/0016-7037(95)00058-8
Shock, E. L., Oelkers, E. H., Johnson, J. W., Sverjensky, D. A. and Helgeson, H. C., 1992. Calculation of the thermodynamic properties of aqueous species at high pressures and temperatures: Effective electrostatic radii, dissociation constants and standard partial molal properties to 1000 degrees C and 5 kbar. J. Chem. Soc. Faraday Trans., 88, 803-826. http://dx.doi.org/10.1039/FT9928800803
Shock, E. L. et al., 1998. slop98.dat (computer data file). http://geopig.asu.edu/supcrt92_data/slop98.dat, accessed on 2005-11-05.
Wagman, D. D., Evans, W. H., Parker, V. B., Schumm, R. H., Halow, I., Bailey, S. M., Churney, K. L. and Nuttall, R. L., 1982. The NBS tables of chemical thermodynamic properties. Selected values for inorganic and C1 and C2 organic substances in SI units. J. Phys. Chem. Ref. Data, 11 (supp. 2), 1-392. http://www.nist.gov/srd/PDFfiles/jpcrdS2Vol11.pdf
YeastGFP project. Yeast GFP Fusion Localization Database, http://yeastgfp.ucsf.edu, accessed on 2007-02-01. Current location: http://yeastgfp.yeastgenome.org
add.protein and add.obigt for adding data from local .csv files.
## exploring thermo$obigt
# what physical states there are
unique(thermo$obigt$state)
# formulas of ten random species
n <- nrow(thermo$obigt)
# sample() gives whole-number row indices between 1 and n
thermo$obigt$formula[sample(n, 10)]

## cross-checking sources
# the reference sources
ref.source <- thermo$source$source
# only take those that aren't journal abbreviations
# (grepl() is safe even if nothing matches)
ref.source <- ref.source[!grepl('_', ref.source)]
# sources of elemental data
element.source <- thermo$element$source
# primary sources in thermodynamic database
obigt.source1 <- thermo$obigt$source1
# secondary sources; some are NA
obigt.source2 <- thermo$obigt$source2[!is.na(thermo$obigt$source2)]
# sources of protein compositions
protein.source <- thermo$protein$source
# sources of stress response proteins
stress.source <- as.character(thermo$stress[2,])
# if the sources are all accounted for
# these all produce character(0)
element.source[!(element.source %in% ref.source)]
obigt.source1[!(obigt.source1 %in% ref.source)]
obigt.source2[!(obigt.source2 %in% ref.source)]
protein.source[!(protein.source %in% ref.source)]
stress.source[!(stress.source %in% ref.source)]
# determine if all the reference sources are cited
my.source <- c(element.source, obigt.source1,
  obigt.source2, protein.source, stress.source)
# this should produce character(0)
ref.source[!(ref.source %in% my.source)]

## make a table of duplicated species
name <- thermo$obigt$name
state <- thermo$obigt$state
source <- thermo$obigt$source1
species <- paste(name, state)
dups <- species[which(duplicated(species))]
id <- numeric()
# seq_along() does nothing when there are no duplicates
for(i in seq_along(dups)) id <- c(id, which(species %in% dups[i]))
data.frame(name=name[id], state=state[id], source=source[id])

## accessing duplicated species
# using info()
i <- info("Al+3","aq")  # length = 2
i1 <- info("Al+3")  # length = 1
stopifnot(i1==i[1])
thermo$opt$level <- 2
i2 <- info("Al+3")  # length = 1
stopifnot(i2==i[2])
# using subcrt()
subcrt("Al+3","aq")  # always uses the first species
subcrt("Al+3")  # pays attention to thermo$opt$level
# .. showing the energetic differences between the
# duplicated species
subcrt(i,c(-1,1))
# using basis()
basis(c("Al+3","H+","H2"))
thermo$opt$level <- 1
basis(c("Al+3","H+","H2"))
# using species()
species("Al+3","aq")  # listens to thermo$opt$level
thermo$opt$level <- 2
species("Al+3")  # also listens to thermo$opt$level
species(delete=TRUE)
species(rev(i))  # can also use the species indices
# see also aluminum speciation example in diagram() help page