sgd {intervals} | R Documentation |
This data set contains a data frame describing a subset of the chromosome feature data represented in the Fall 2007 version of saccharomyces_cerevisiae.gff, available for download from the Saccharomyces Genome Database (http://www.yeastgenome.org).
data(sgd)
A data frame with 14080 observations on the following 8 variables.
SGDID
type
One of CDS, five_prime_UTR_intron, intron, and ORF. Note that ORF corresponds to a whole gene, while CDS corresponds to an exon. S. cerevisiae does not, however, have many multi-exonic genes.
feature_name
parent_feature_name
The feature_name of the larger element to which the current feature belongs. All retained CDS entries, for example, belong to an ORF entry.
chr
start
stop
strand
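To make the relationship between type, feature_name, and parent_feature_name concrete, here is a minimal sketch in base R. It does not use the real data: the toy data frame below is hypothetical, with made-up IDs, names, and coordinates that only mimic the eight columns listed above.

```r
# Hypothetical toy rows mimicking the eight sgd columns; all values are made up.
toy <- data.frame(
  SGDID               = c("S1", "S2", "S3", "S4"),
  type                = c("ORF", "CDS", "CDS", "ORF"),
  feature_name        = c("geneA", "geneA_cds1", "geneA_cds2", "geneB"),
  parent_feature_name = c(NA, "geneA", "geneA", NA),
  chr                 = "chr01",
  start               = c(100, 100, 400, 900),
  stop                = c(600, 250, 600, 1500),
  strand              = c("W", "W", "W", "C"),
  stringsAsFactors    = FALSE
)

cds <- subset(toy, type == "CDS")
orf <- subset(toy, type == "ORF")

# Look up the parent ORF row for each CDS row by name.
parent <- orf[match(cds$parent_feature_name, orf$feature_name), ]

# TRUE where a CDS lies entirely inside its parent ORF.
inside <- cds$start >= parent$start & cds$stop <= parent$stop

# Number of CDS (exon) rows per parent ORF.
exons_per_orf <- table(cds$parent_feature_name)
```

The same match() lookup should carry over to the real data frame after data(sgd), provided every CDS row there names an ORF row as its parent, as the description of parent_feature_name suggests.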
# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another orf. See documentation for the sgd data set for more details
# on the annotation set.

use_chr <- "chr01"

data( sgd )
sgd <- subset( sgd, chr == use_chr )

orf <- Intervals(
  subset( sgd, type == "ORF", c( "start", "stop" ) ),
  type = "Z"
)
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name

W <- subset( sgd, type == "ORF", "strand" ) == "W"

promoters_W <- Intervals(
  cbind( orf[W,1] - 500, orf[W,1] - 1 ),
  type = "Z"
)

promoters_W <- interval_intersection(
  promoters_W,
  interval_complement( orf )
)

# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp

hist( size( promoters_W ) )

# All CDS entries are completely within their corresponding ORF entry.

cds_W <- Intervals(
  subset(
    sgd,
    type == "CDS" & strand == "W",
    c( "start", "stop" )
  ),
  type = "Z"
)
rownames( cds_W ) <- NULL

interval_intersection( cds_W, interval_complement( orf[W,] ) )
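
The example above covers only Watson-strand ORFs. As a hedged sketch of the mirror-image case, the code below computes promoters for Crick-strand ORFs on a small toy set. It rests on several assumptions that are not taken from the data set: the coordinates are made up, start is less than or equal to stop on both strands, the Crick strand is coded "C", and "upstream" on the Crick strand means the 500 bases after the stop coordinate.

```r
library(intervals)

# Hypothetical toy ORFs with made-up coordinates (start <= stop on both strands).
toy <- data.frame(
  feature_name     = c("a", "b", "c"),
  start            = c(1000, 2300, 3200),
  stop             = c(2000, 3000, 4000),
  strand           = c("W", "C", "C"),
  stringsAsFactors = FALSE
)

orf <- Intervals( as.matrix( toy[, c( "start", "stop" )] ), type = "Z" )
rownames( orf ) <- toy$feature_name

# Assumption: "C" marks the Crick strand in this toy set.
is_C <- toy$strand == "C"

# Candidate windows: the 500 bases immediately after each Crick ORF's stop.
promoters_C <- Intervals(
  cbind( orf[is_C, 2] + 1, orf[is_C, 2] + 500 ),
  type = "Z"
)

# Keep only the bases that do not fall inside any ORF.
promoters_C <- interval_intersection(
  promoters_C,
  interval_complement( orf )
)

size( promoters_C )
```

As in the Watson-strand case, a window shorter than 500 bases indicates that another ORF begins within 500 bp of the gene.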