sanfrancisco.home.sales {nutshell} R Documentation

San Francisco Home Sales Data

Description

This data set contains information on homes sold in San Francisco between 2/13/2008 and 7/14/2009.

Usage

data(sanfrancisco.home.sales)

Format

A data frame with 3281 observations on the following 15 variables.

line
a numeric vector representing the line number of the observation in the data set
county
a factor with levels San Francisco County
street
a factor representing the street address of the property
city
a factor with levels San Francisco
zip
a numeric vector representing the zip code of the property
date
a Date representing the sale date
price
a numeric vector representing the sales price
bedrooms
a numeric vector representing the number of bedrooms
squarefeet
a numeric vector representing the interior area of the property, in square feet
lotsize
a numeric vector representing the lot size of the property, in square feet
year
a numeric vector representing the year in which the property was built
latitude
a numeric vector representing the latitude coordinate of the property
longitude
a numeric vector representing the longitude coordinate of the property
month
a factor representing the month in which the property was sold
neighborhood
a factor representing neighborhood names

Details

This data set was assembled from a variety of sources, including two Bay Area newspapers (the San Jose Mercury News and the San Francisco Chronicle), Yahoo Maps, and Zillow Neighborhood Boundaries.

This data set is used as an example in the book "R in a Nutshell" from O'Reilly Media. In the book, we took separate samples for training and testing. Indices for observations in each sample are included in sanfrancisco.home.sales.testing.indices and sanfrancisco.home.sales.training.indices.
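The training and testing split described above can be sketched as follows. This is a minimal illustration, not the book's code: partition_by_indices and toy.sales are hypothetical names introduced here, and the commented-out call assumes the two index objects are vectors of integer row positions into sanfrancisco.home.sales.

```r
# Hypothetical helper (not part of the nutshell package): split a data
# frame into training and testing subsets using two vectors of row indices.
partition_by_indices <- function(df, training.indices, testing.indices) {
  list(training = df[training.indices, , drop = FALSE],
       testing  = df[testing.indices, , drop = FALSE])
}

# Small stand-in data frame so the sketch runs without the package installed.
toy.sales <- data.frame(price      = c(500000, 750000, 620000, 910000),
                        squarefeet = c(900, 1400, 1100, 2000))
toy.split <- partition_by_indices(toy.sales, c(1, 3), c(2, 4))

# With the nutshell data loaded, the equivalent call would presumably be
# (assuming the index objects hold integer row positions):
# sf.split <- partition_by_indices(sanfrancisco.home.sales,
#                 sanfrancisco.home.sales.training.indices,
#                 sanfrancisco.home.sales.testing.indices)
```

Keeping the split in a list with named training and testing elements makes it harder to fit and evaluate a model on the same rows by accident.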

Source

Data was assembled from a variety of sources, including http://www.sfgate.com, http://www.mercurynews.com, and http://www.zillow.com/howto/api/neighborhood-boundaries.htm

Examples

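# Load the data set and the lattice graphics package
data(sanfrancisco.home.sales)
library(lattice)
# Set the base text size to 7 points
trellis.par.set(fontsize=list(text=7))
# Mean of the per-sale price per square foot, ignoring missing values
dollars.per.squarefoot <- mean(
  sanfrancisco.home.sales$price / sanfrancisco.home.sales$squarefeet,
  na.rm=TRUE)
# Plot price against square footage, one panel per neighborhood.
# The subset drops selected zip codes, sales of $4,000,000 or more,
# and properties with missing square footage or 6,000 square feet or more.
xyplot(price~squarefeet|neighborhood,
        data=sanfrancisco.home.sales,
        pch=19,
        cex=.2,
        subset=(zip!=94100 & zip!=94104 & zip!=94108 &
                zip!=94111 & zip!=94133 & zip!=94158 &
                price<4000000 &
                ifelse(is.na(squarefeet),FALSE,squarefeet<6000)),
        strip=strip.custom(strip.levels=TRUE,
           horizontal=TRUE,
           par.strip.text=list(cex=.8)),
        panel=function(...) {
           # Reference line through the origin with slope equal to
           # the overall mean price per square foot
           panel.abline(a=0,b=dollars.per.squarefoot)
           panel.xyplot(...)
        }
)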

[Package nutshell version 1.0 Index]