updateOrgAndIdType {SubpathwayMiner}    R Documentation
Update the organism and the type of gene identifier.
updateOrgAndIdType(org = "hsa", idType = "ncbi-geneid", path = "ftp://ftp.genome.jp/pub/kegg/genes/organisms", verbose = TRUE)
org | A character string. The KEGG abbreviation of a genome name.
idType | A character string. The type of gene identifier.
path | A character string. The location (FTP URL or local directory) of the KEGG organism data files.
verbose | A logical. If TRUE, additional diagnostics are printed.
Existing tools mainly use a DBMS (database management system) to store all data relevant to pathway analysis, and the data update process is hidden from users, so the annotation results users obtain from these tools may be outdated. We do not use a DBMS to store data; instead, we present a method that lets users update the data themselves. Users first set the organism and the type of gene identifier before annotating genes to pathways. Based on this setting, the system downloads all data relevant to pathway analysis for that organism, processes it, and stores it in an environment variable in R. In this way the system keeps its data synchronized with the KEGG databases and supports almost all organisms and cross-reference identifiers in the KEGG GENES database.
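As a rough illustration of the "process and store" step, the base-R sketch below parses a two-column, tab-separated gene-to-pathway listing into a lookup list and keeps it in an R environment. The line layout, the sample IDs and the names `parse_link_list` and `kegg_env` are assumptions made for illustration; this is not SubpathwayMiner's internal code.

```r
# Hypothetical sketch: parse a tab-separated, two-column KEGG-style listing
# (gene ID, pathway ID) and store the result in an R environment.
# The sample lines and the helper name parse_link_list are illustrative only.
parse_link_list <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  genes <- vapply(fields, `[`, character(1), 1)  # first column: gene ID
  paths <- vapply(fields, `[`, character(1), 2)  # second column: pathway ID
  # Group pathway IDs by gene: a named list gene -> character vector
  split(paths, genes)
}

sample_lines <- c(
  "hsa:10\tpath:hsa00232",
  "hsa:10\tpath:hsa00983",
  "hsa:100\tpath:hsa00230"
)

# A stand-in for the environment variable that holds the processed data
kegg_env <- new.env()
assign("gene2path", parse_link_list(sample_lines), envir = kegg_env)

get("gene2path", envir = kegg_env)[["hsa:10"]]
```

Keeping the processed lists in an environment means later calls can look them up by name without re-reading or re-parsing the downloaded files.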
The function updates the variables gene2ec, ec2gene, gene2path, path2gene and background in the environment variable.
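The variable names suggest that ec2gene and path2gene are the inverses of gene2ec and gene2path. Assuming each mapping is a named list from an ID to a character vector (an assumption; the package's actual storage format is not documented here), the inverse can be built in base R as sketched below. The helper `invert_map` and the sample IDs are hypothetical.

```r
# Hypothetical sketch: given gene2ec (gene -> EC numbers), build the inverse
# ec2gene (EC number -> genes). The named-list representation is an
# assumption, not the package's documented format.
invert_map <- function(m) {
  keys <- rep(names(m), lengths(m))    # repeat each gene once per EC number
  vals <- unlist(m, use.names = FALSE) # flatten all EC numbers
  lapply(split(keys, vals), unique)    # group genes by EC number
}

gene2ec <- list(
  "hsa:10"  = c("ec:2.3.1.5"),
  "hsa:100" = c("ec:3.5.4.4", "ec:3.5.4.6"),
  "hsa:101" = c("ec:3.5.4.4")
)

ec2gene <- invert_map(gene2ec)
ec2gene[["ec:3.5.4.4"]]
```

The same helper would turn a gene2path list into a path2gene list.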
Note that if the user does not run updateOrgAndIdType or loadKe2g after starting R and loading the package, the default values of org and idType are "hsa" (human) and "ncbi-geneid" (Entrez Gene identifiers). The current setting is available from the return value of getOrgAndIdType.
The argument org must be the KEGG abbreviation of a genome name. For example, hsa, mmu, eco, sce and dme are the abbreviations for human, mouse, E. coli, yeast and fruit fly, respectively. Detailed information is provided in http://www.genome.jp/kegg/catalog/org_list.html.
The argument idType is a character string giving the type of identifier. The system supports most KEGG cross-reference identifiers, such as Entrez Gene IDs (idType="ncbi-geneid"), NCBI gi numbers (idType="ncbi-gi") and UniProt accession numbers (idType="uniprot"). Detailed information is provided in ftp://ftp.genome.jp/pub/kegg/genes/organisms. For example, because the "hsa" directory contains a file named "hsa_ensembl-hsa.list", idType="ensembl-hsa" is available as an input identifier type. Note that idType depends on the genome: different genomes may support different identifier types. For example, "sgd-sce" is supported for the yeast genome but not for the human genome.
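Because the available identifier types follow from the file names in an organism's directory, candidate idType values can be read off a directory listing. The sketch below assumes the "<org>_<idType>.list" naming pattern; the function name `available_id_types` and the sample file names are hypothetical. Any other ".list" file in the directory that follows the same pattern would also match, so treat the result as a candidate list only.

```r
# Hypothetical sketch: derive candidate identifier types for an organism
# from "<org>_<idType>.list" file names. The file names are illustrative;
# in practice they would come from a listing of the organism's directory.
available_id_types <- function(org, file_names) {
  pattern <- paste0("^", org, "_(.+)\\.list$")
  hits <- grepl(pattern, file_names)          # keep only matching file names
  sub(pattern, "\\1", file_names[hits])       # extract the idType part
}

hsa_files <- c("hsa_ncbi-geneid.list", "hsa_ncbi-gi.list",
               "hsa_uniprot.list", "hsa_ensembl-hsa.list", "README")

available_id_types("hsa", hsa_files)
```

Running it for a different organism code against the same listing returns an empty vector, which mirrors the point that identifier types are genome-specific.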
The argument path is the directory containing the organism cross-reference identifier files. The default value is "ftp://ftp.genome.jp/pub/kegg/genes/organisms", which ensures that the user obtains up-to-date data from the KEGG FTP site. Alternatively, the user can download the data for the organisms of interest from the FTP site and set path to the local copy to perform a local update.
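The same argument works for a remote URL or a local mirror because only the prefix changes. The sketch below builds the expected location of an identifier file, assuming a "<path>/<org>/<org>_<idType>.list" layout; the helper `id_list_location` and the local directory "/data/kegg/organisms" are hypothetical, and the package may resolve files differently.

```r
# Hypothetical sketch: build the location of an organism's identifier file
# from path, org and idType, assuming a "<path>/<org>/<org>_<idType>.list"
# layout. The same code serves an FTP URL or a local directory.
id_list_location <- function(org, idType,
                             path = "ftp://ftp.genome.jp/pub/kegg/genes/organisms") {
  path <- sub("/+$", "", path)  # drop any trailing slash
  paste0(path, "/", org, "/", org, "_", idType, ".list")
}

# Remote default
id_list_location("sce", "sgd-sce")
# Hypothetical local mirror
id_list_location("hsa", "ensembl-hsa", path = "/data/kegg/organisms/")
```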
Note that the update is time-consuming. To avoid repeating it in every session, see saveKe2g and loadKe2g.
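The idea of paying the update cost once and reloading the processed data later can be illustrated with base R's save() and load(). This is an analogy only, not how saveKe2g and loadKe2g are implemented; the environment contents, the object name `orgAndIdType` and the temporary file are placeholders.

```r
# Hypothetical sketch of caching processed data between R sessions with
# base R. The environment contents are placeholders, not real KEGG data.
cache_env <- new.env()
assign("gene2path", list("hsa:10" = "path:hsa00232"), envir = cache_env)
assign("orgAndIdType", c(org = "hsa", idType = "ncbi-geneid"), envir = cache_env)

cache_file <- tempfile(fileext = ".rda")

# Save every object in the environment once, after the slow update
save(list = ls(envir = cache_env), envir = cache_env, file = cache_file)

# In a later session: restore the objects into a fresh environment
restored <- new.env()
load(cache_file, envir = restored)

get("orgAndIdType", envir = restored)[["idType"]]
```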
Chunquan Li <lcqbio@yahoo.com.cn>
getAnn, updateGraphs, getOrgAndIdType
## update the organism and the type of gene identifier
# getOrgAndIdType()
# updateOrgAndIdType("sce", "sgd-sce")
# getOrgAndIdType()