get.cid {rpubchem} R Documentation

Get PubChem Compound Information

Description

The PubChem compound collection stores a variety of information for each molecule, including canonical SMILES, molecular properties, substance associations, and synonyms.

This function extracts a subset of the molecular property information for one or more compound IDs (CIDs).

Usage

get.cid(cid, quiet=TRUE, from.file=FALSE)

Arguments

cid A vector of one or more compound IDs
quiet If FALSE, output is verbose
from.file If TRUE, the first argument is taken to be the name of a file containing the XML data. If FALSE, the first argument must be a vector of compound IDs, and the data is downloaded from the PubChem FTP site

Details

Processing a large number of compound IDs can take a long time. For large numbers of CIDs, the resulting XML file can be many megabytes and may take a long time to download. After download, it takes approximately 20 seconds to process a 23 MB data file.

Note that the data files are downloaded using the R interface to Curl. In addition, the PubChem servers do not allow very large query URLs, which limits the number of compound IDs that can be pulled directly from the PubChem servers in a single call to about 1000.
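
One way to stay under the query-size limit is to split a long CID vector into batches and combine the per-batch results. The sketch below is not part of rpubchem: get_cid_batched is a hypothetical helper, the batch size of 500 is an arbitrary value chosen to sit below the limit of about 1000, and the fetch argument stands in for any function that takes a vector of CIDs and returns a data.frame (in practice, get.cid).

```r
# Hypothetical helper (not part of rpubchem): fetch properties in batches.
# `fetch` is any function mapping a vector of CIDs to a data.frame,
# for example rpubchem::get.cid. `batch_size` is an assumed safe value.
get_cid_batched <- function(cids, fetch, batch_size = 500) {
  stopifnot(batch_size >= 1)
  if (length(cids) == 0) return(data.frame())

  # Assign each CID a batch number: 1, 1, ..., 2, 2, ...
  batch_index <- ceiling(seq_along(cids) / batch_size)
  batches <- split(cids, batch_index)

  # Fetch each batch, then stack the resulting data frames row-wise
  results <- lapply(batches, fetch)
  combined <- do.call(rbind, unname(results))
  rownames(combined) <- NULL
  combined
}

# Intended use (requires rpubchem and network access, so not run here):
# library(rpubchem)
# props <- get_cid_batched(1:2500, fetch = get.cid)
```

Passing the download function as an argument keeps the batching logic independent of the network, so it can be checked with a stub before running it against PubChem.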

Value

A data.frame with the following columns:

CID The compound ID
IUPACName The IUPAC name of the compound
CanonicalSmiles The canonical SMILES for the compound
MolecularWeight Molecular weight
TotalFormalCharge The formal charge
MolecularFormula The molecular formula
TPSA Topological polar surface area
HeavyAtomCount Heavy atom count
FormalCharge Total formal charge
HydrogenBondDonor Hydrogen bond donor count
HydrogenBondAcceptor Hydrogen bond acceptor count
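
A short sketch of filtering such a result by one of the columns listed above. The data frame here is built by hand as a stand-in for real get.cid output and contains only a few of the documented columns; storing the numeric values as text is an assumption made to show defensive coercion, since the column types actually returned may differ.

```r
# Hand-built stand-in for a get.cid() result (aspirin and benzene);
# only a subset of the documented columns is included.
props <- data.frame(
  CID = c(2244, 241),
  MolecularFormula = c("C9H8O4", "C6H6"),
  MolecularWeight = c("180.16", "78.11"),
  HydrogenBondDonor = c("1", "0"),
  stringsAsFactors = FALSE
)

# Coerce defensively in case values parsed from XML arrive as text
props$MolecularWeight <- as.numeric(props$MolecularWeight)
props$HydrogenBondDonor <- as.integer(props$HydrogenBondDonor)

# Keep only compounds with at least one hydrogen bond donor
donors <- props[props$HydrogenBondDonor > 0, ]
```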

Author(s)

Rajarshi Guha rguha@indiana.edu

See Also

get.assay, get.sid, get.sid.list


[Package rpubchem version 1.4.2 Index]