sgd {intervals}    R Documentation

Yeast gene model sample data

Description

This data set contains a data frame describing a subset of the chromosome feature data represented in the Fall 2007 version of saccharomyces_cerevisiae.gff, available for download from the Saccharomyces Genome Database (http://www.yeastgenome.org).

Usage

data(sgd)

Format

A data frame with 14080 observations on the following 8 variables.

SGDID
SGD feature ID.
type
Only four feature types have been retained: CDS, five_prime_UTR_intron, intron, and ORF. Note that an ORF corresponds to a whole gene, while a CDS corresponds to an exon. S. cerevisiae does not, however, have many multi-exonic genes.
feature_name
A character vector
parent_feature_name
The feature_name of the larger element to which the current feature belongs. All retained CDS entries, for example, belong to an ORF entry.
chr
The chromosome on which the feature occurs.
start
Feature start base.
stop
Feature stop base.
strand
Whether the feature is on the Watson or Crick strand. The examples below select Watson-strand features with strand == "W".
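The relationship between feature_name and parent_feature_name can be sketched in base R. The data frame below is hypothetical: it copies the eight documented column names, but the IDs, feature names, coordinates, and the NA parent for the ORF row are made up for illustration and are not taken from sgd.

```r
# Hypothetical rows that mimic the documented column layout of sgd.
# All IDs, names, and coordinates here are invented for illustration.
toy <- data.frame(
    SGDID               = c( "ID0001", "ID0002", "ID0003", "ID0004" ),
    type                = c( "ORF", "CDS", "CDS", "intron" ),
    feature_name        = c( "GENE_A", "GENE_A_cds1", "GENE_A_cds2", "GENE_A_intron1" ),
    parent_feature_name = c( NA, "GENE_A", "GENE_A", "GENE_A" ),
    chr                 = "chr01",
    start               = c( 1000, 1000, 1400, 1101 ),
    stop                = c( 2000, 1100, 2000, 1399 ),
    strand              = "W",
    stringsAsFactors    = FALSE
)

# Pair each CDS row with its parent ORF row via parent_feature_name.
cds    <- subset( toy, type == "CDS" )
orf    <- subset( toy, type == "ORF" )
parent <- orf[ match( cds$parent_feature_name, orf$feature_name ), ]

# TRUE for each CDS that lies entirely inside its parent ORF.
cds_in_orf <- cds$start >= parent$start & cds$stop <= parent$stop

print( data.frame( cds = cds$feature_name, parent = parent$feature_name, cds_in_orf ) )
```

The same lookup should carry over to the real data after data(sgd), provided parent_feature_name holds the ORF's feature_name as described above.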

Examples


# An example to compute "promoters", defined to be the 500 bases
# upstream from an ORF annotation, provided these bases don't intersect
# another ORF. See the Format section above for more details on the
# annotation set.

# Load the package that provides Intervals() and the sgd data set.
library( intervals )

use_chr <- "chr01"

data( sgd )
sgd <- subset( sgd, chr == use_chr )

orf <- Intervals(
                 subset( sgd, type == "ORF", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( orf ) <- subset( sgd, type == "ORF" )$feature_name

W <- subset( sgd, type == "ORF", "strand" ) == "W"

promoters_W <- Intervals(
                         cbind( orf[W,1] - 500, orf[W,1] - 1 ),
                         type = "Z"
                         )

promoters_W <- interval_intersection(
                                     promoters_W,
                                     interval_complement( orf )
                                     )

# Many Watson-strand genes have another ORF upstream at a distance of
# less than 500 bp

hist( size( promoters_W ) )

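# All CDS entries are completely within their corresponding ORF entry,
# so the intersection with the complement of the Watson-strand ORFs,
# computed below, is expected to be empty.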

cds_W <- Intervals(
                 subset( sgd, type == "CDS" & strand == "W", c( "start", "stop" ) ),
                 type = "Z"
                 )
rownames( cds_W ) <- NULL

interval_intersection( cds_W, interval_complement( orf[W,] ) )


[Package intervals version 0.10.3 Index]