yourprep {YourCast} | R Documentation |
Builds the data object for the yourcast
function from files in the working directory or another specified
directory, and checks it for errors
yourprep(dpath=getwd(), tag="csid", index.code="ggggaa", datalist=NULL, G.names=NULL, A.names=NULL, T.names=NULL, adjacency=NULL, year.var=FALSE, sample.frame=NULL, summary=FALSE, verbose=FALSE)
dpath |
String. Name of the directory where the data files are
stored. If NULL, the working directory is used.
Default: getwd() |
tag |
String. Characters placed before the CSID code in
filenames to indicate which files in dpath the function
should load. The tag can also be used to distinguish
groups to be considered in separate analyses; for
example, ‘m’ for male deaths and ‘f’ for female
deaths. Default: "csid" |
index.code |
String indicating how the CSID index variable is
coded in the input data. Each of the following two characters
may appear between 0 and 4 times, in this order: g for the
geographic index (such as country) and a for a grouped
continuous variable such as an age group. For example,
"ggggaa" makes the function interpret ‘245045’ as
country code ‘2450’ and age group ‘45’. Default: "ggggaa" |
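The way an index.code string splits a CSID into its parts can be sketched in base R. parse_csid() below is a hypothetical helper written for illustration, not a YourCast function; it simply collects the digits in the positions marked g and a.

```r
# Hypothetical helper, not part of YourCast: split a CSID code into its
# geographic (g) and age-group (a) parts according to index.code.
parse_csid <- function(csid, index.code = "ggggaa") {
  chars <- strsplit(index.code, "")[[1]]
  digits <- strsplit(as.character(csid), "")[[1]]
  if (length(digits) != length(chars)) {
    stop("CSID '", csid, "' does not match index.code '", index.code, "'")
  }
  # Concatenate the digits in the positions marked by each letter
  list(g = paste(digits[chars == "g"], collapse = ""),
       a = paste(digits[chars == "a"], collapse = ""))
}

parts <- parse_csid("245045")
parts$g  # "2450"
parts$a  # "45"
```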
datalist |
A list of cross-section data frames already loaded into
the workspace, to be added to the dataobj. The names of the
list elements should be the numerical CSID codes of the
cross sections, and the data frames should be formatted
identically to files loaded from an external directory (see
Details) |
A.names, G.names, T.names |
String. Filenames of optional two-column data files that
list all valid numerical codes (in the first column) and the
corresponding alphanumeric names (optionally, in the second
column) for the indices of geographic areas (G.names), age
groups (A.names), and time periods (T.names). The function
will search dpath for a file with the specified name; please
include column labels. The optional alphanumeric identifiers
are most commonly used only for geographic areas, since the
numerical values for age groups and time periods are usually
meaningful on their own. However, if another grouped
continuous variable is used in place of age, for example,
specifying these labels will be important for the output to
be meaningful. NOTE: Auxiliary files will be loaded
automatically by yourprep() if they are saved in dpath and
labeled with the tag specified by the user. See the
‘Details’ section for more information. Default: NULL |
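A minimal sketch of what a G.names-style file could look like, read here with base R's read.table(). The column labels "code" and "name" and the code 1234 are hypothetical; 2450 is the USA code used elsewhere on this page. How yourprep() itself reads the file is not shown.

```r
# Sketch of a G.names-style auxiliary file: numeric codes in the first
# column, optional names in the second, with column labels. The labels
# "code"/"name" and the code 1234 are hypothetical.
gnames <- read.table(text = "code name
2450 USA
1234 Hypothetical", header = TRUE, stringsAsFactors = FALSE)

# Look up the label for the geographic code of a cross section
gnames$name[gnames$code == 2450]  # "USA"
```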
adjacency |
Data file with codes used to construct the symmetric
(geographic region by geographic region) matrix of proximity
scores for the geographic smoothing used by the ‘map’ and
‘bayes’ methods. The larger the relative score, the more
proximate that pair of geographic areas is in the prior; a
zero element means the two areas are unrelated (the diagonal
is ignored). Each row of the file has three columns: the
geographic codes of two areas and a score indicating their
proximity or similarity; please include column labels. For
convenience, pairs of regions that are unrelated (and would
have zero entries in the symmetric matrix) may be omitted
from the file. The file may also include rows for geographic
regions not included in the present analysis. Default: NULL |
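The mapping from three-column rows to the symmetric matrix can be sketched as follows. build_proximity() is a hypothetical helper, not the YourCast implementation; it fills omitted pairs with 0, drops rows for areas outside the analysis, and leaves the ignored diagonal at 0. The codes other than 2450 are arbitrary.

```r
# Hypothetical helper (not YourCast code): build a symmetric proximity
# matrix from a three-column adjacency table (code, code, score).
build_proximity <- function(adj, codes) {
  codes <- as.character(codes)
  W <- matrix(0, length(codes), length(codes),
              dimnames = list(codes, codes))
  for (k in seq_len(nrow(adj))) {
    i <- as.character(adj[k, 1])
    j <- as.character(adj[k, 2])
    # Skip areas outside the analysis and the (ignored) diagonal
    if (i %in% codes && j %in% codes && i != j) {
      W[i, j] <- adj[k, 3]
      W[j, i] <- adj[k, 3]
    }
  }
  W
}

# Code 9999 is not in the analysis, so its row is dropped
adj <- data.frame(g1 = c(2450, 2450, 9999),
                  g2 = c(2090, 2020, 2450),
                  score = c(1, 0.5, 1))
W <- build_proximity(adj, codes = c(2450, 2090, 2020))
```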
year.var |
Boolean. Set to TRUE if the year is coded as a
separate variable rather than as the rownames of the
cross-section data files. The function will then look for the
year variable, use it as the rownames, and drop it from the
data frame. The change is made only if the data frame does
not already have rownames or if its existing rownames are
merely a ‘1...N’ index of row numbers, so the correction can
be applied even if some cross sections have no year variable
and already have the correct rownames. Default: FALSE |
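The year.var rule can be sketched in base R. year_to_rownames() is a hypothetical helper, and the column name "year" is an assumption; the documentation does not state which variable name yourprep() looks for.

```r
# Hypothetical helper (not YourCast code): move a year column into the
# rownames only if the current rownames are the default 1...N index.
# The column name "year" is an assumption.
year_to_rownames <- function(df, year.col = "year") {
  default_rn <- identical(rownames(df), as.character(seq_len(nrow(df))))
  if (year.col %in% names(df) && default_rn) {
    rownames(df) <- df[[year.col]]
    df[[year.col]] <- NULL
  }
  df
}

cs <- data.frame(year = 1950:1952, pop = c(10, 11, 12))
cs <- year_to_rownames(cs)
rownames(cs)  # "1950" "1951" "1952"
```

A data frame that already has year rownames and no year column passes through unchanged, which is why the correction is safe to apply to every cross section.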
sample.frame |
Optional four-element vector containing, in order, the
start and end time periods of the observed data and the start
and end time periods to be forecast. Cross sections need not
all begin at the starting date, but each must contain all
years after its first observed value. Variables to be
forecast should be coded as NA in the out-of-sample period.
This makes it easy to reserve a range of values of the
dependent variable for out-of-sample forecast evaluation; the
summary and plot functions in yourcast make these comparisons
automatically if the out-of-sample data are included.
yourprep() uses this information only to verify that the
cross sections are correctly constructed. Default: NULL |
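A sketch of the kind of check that sample.frame enables, assuming the rownames hold the years and taking the first rowname as the first observed year. check_years() is hypothetical and is not the diagnostic yourprep() actually runs.

```r
# Hypothetical check (not YourCast code): the first year must fall in
# the observation window, and every year from it through the last
# forecast year must be present as a rowname.
check_years <- function(df, sample.frame) {
  years <- as.numeric(rownames(df))
  first <- min(years)
  if (first < sample.frame[1] || first > sample.frame[2]) {
    return(FALSE)
  }
  all(seq(first, sample.frame[4]) %in% years)
}

sf <- c(1950, 2000, 2001, 2030)
cs <- data.frame(y = rep(NA_real_, 71), row.names = 1960:2030)
check_years(cs, sf)                                    # TRUE
check_years(cs[rownames(cs) != "2010", , drop = FALSE], sf)  # FALSE
```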
summary |
Boolean. If TRUE, the means of the available
observations on each variable are displayed for each cross
section read by yourprep(). Default: FALSE |
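For a single cross section, the "mean of available observations" is the column mean with missing values removed. The base R call below computes that quantity for illustration; whether yourprep() uses colMeans() internally is not documented here.

```r
# Illustration only: means of the non-NA observations in one cross section
cs <- data.frame(deaths = c(10, 12, NA), pop = c(100, 110, 120),
                 row.names = 1998:2000)
m <- colMeans(cs, na.rm = TRUE)
m  # deaths 11, pop 110
```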
verbose |
Boolean. If TRUE, the function prints the name of each
cross section or auxiliary file as it is read into the
dataobj. Default: FALSE |
Creates the dataobj input for yourcast from files in the
working directory or another specified directory. Checks that
all cross sections in the data list are titled properly and,
if the sample.frame argument is specified, that all years up
to the last predicted year are included in the data frames.
Note, however, that all cross sections from the same
geographic area must have the same observation and prediction
years in the data frame (even if the values are NA) for the
graphing function plot.yourcast to work.
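The same-years requirement for plot.yourcast can be checked before calling yourprep(). same_years_by_area() is a hypothetical helper; it assumes a list of data frames named by CSID codes in the "ggggaa" layout, so the first four digits identify the geographic area.

```r
# Hypothetical helper (not YourCast code): TRUE if all cross sections
# sharing a geographic code have identical rownames (years).
# Assumes names are CSIDs whose first four digits are the area code.
same_years_by_area <- function(cs_list) {
  area <- substr(names(cs_list), 1, 4)
  ok <- tapply(seq_along(cs_list), area, function(idx) {
    yrs <- lapply(cs_list[idx], rownames)
    all(vapply(yrs, identical, logical(1), yrs[[1]]))
  })
  all(ok)
}

a <- data.frame(x = 1:3, row.names = 2000:2002)
b <- data.frame(x = 1:2, row.names = 2000:2001)
same_years_by_area(list("245045" = a, "245050" = a))  # TRUE
same_years_by_area(list("245045" = a, "245050" = b))  # FALSE
```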
The cross-section files must be named according to the CSID
identifiers for country code and age group, preceded by the
specified tag (default: "csid") so that yourprep() can
distinguish them from other files in dpath. For example, for
the USA (country code 2450) time series of 45-year-old
individuals, the file name should be ‘csid245045.txt’ if the
tag is left as the default. Files must have an extension so
that the program can recognize how the data are coded.
Currently, fixed-width text files (‘*.txt’), comma-separated
values (‘*.csv’), and Stata v.5-10 (‘*.dta’) files are
supported, and multiple file types may be used in the same
run of the program. ‘*.Rdata’ objects can be included with the
datalist option after they are loaded into a list in the
workspace. yourprep() includes diagnostics to ensure that
objects are properly named and not included accidentally, but
users should examine the specified dpath before running
yourprep() to minimize errors.
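The naming rule can be illustrated with a regular expression over a directory listing. The pattern below is an assumption about how tag, CSID digits, and extension combine, not the expression yourprep() uses; in practice the listing would come from list.files(dpath). The tag is assumed to contain no regex metacharacters.

```r
# Sketch: build the expected file name for a cross section and pick out
# files that match the tag, a numeric CSID, and a supported extension.
tag <- "csid"
csid_file <- paste0(tag, "245045", ".txt")   # "csid245045.txt"

files <- c("csid245045.txt", "csid245050.csv", "csid245055.dta",
           "csid.G.names.txt", "notes.txt", "csid245060")
pattern <- paste0("^", tag, "[0-9]+\\.(txt|csv|dta)$")
cs_files <- files[grepl(pattern, files)]
# Strip the tag and extension to recover the CSID codes
csids <- sub(paste0("^", tag, "([0-9]+)\\..*$"), "\\1", cs_files)
csids  # "245045" "245050" "245055"
```

The auxiliary file ‘csid.G.names.txt’ and the extensionless ‘csid245060’ are both excluded, which mirrors the requirement that cross-section files carry a numeric CSID and an extension.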
Each cross-section file should contain labeled columns of
time-series data for the dependent variable(s) (e.g., disease,
pop) and the covariates that will be used in the forecast. The
rownames of the data frame should be the observation years (if
the year is coded as a separate variable, set year.var=TRUE).
After the first observed year, the files must contain the full
time series that will be specified in the sample.frame
argument of yourcast. For instance, if
sample.frame=c(1950,2000,2001,2030), the files should have
observations that start between 1950 and 2000 and include all
subsequent years (even if the entries are NA) up to the last
year of prediction, i.e., 2030.
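A minimal sketch of a cross section laid out for sample.frame=c(1950,2000,2001,2030): years as rownames, observed values through 2000, and NA for the dependent variable in 2001-2030. The column values are made up, and the file is written to a temporary directory with base R's write.csv() purely to show the layout; it is read back with read.csv() only to confirm the rownames survive.

```r
# Sketch: cross section with observation years as rownames and the
# dependent variable set to NA in the forecast period. Values are
# illustrative only.
years <- 1950:2030
obs <- years <= 2000
cs <- data.frame(
  disease = ifelse(obs, 100 + (years - 1950), NA),
  pop     = 1e6 + 1000 * (years - 1950),
  row.names = years
)
path <- file.path(tempdir(), "csid245045.csv")
write.csv(cs, path)                    # rownames written as first column
back <- read.csv(path, row.names = 1)  # base R read-back, years as rownames
```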
Optional auxiliary files such as G.names should be named
according to the filename specified in the respective
arguments. If specified, these files must have extensions and
be in one of the three supported file types. Alternatively,
these files will be loaded automatically by yourprep() if they
are saved in dpath and labeled with the tag specified by the
user; in that case the default names for these files must be
used (e.g., ‘G.names’ and ‘adjacency’). For example, if the
tag is left as the default and there is a file in dpath
labeled ‘csid.G.names.txt’, yourprep() will load it
automatically and save the input as the G.names element of
the ‘dataobj’ list. yourprep() arguments such as G.names take
precedence over ‘TAG.*’ files in dpath.
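The precedence rule can be sketched as a small lookup. resolve_aux() is a hypothetical helper, not YourCast code: an explicit argument wins, otherwise it looks for ‘TAG.<default name>.<ext>’ among the files in the directory listing.

```r
# Hypothetical helper (not YourCast code): return the auxiliary file to
# use. An explicit argument takes precedence over a TAG.* file.
resolve_aux <- function(files, tag, default.name, arg = NULL) {
  if (!is.null(arg)) return(arg)
  candidates <- paste0(tag, ".", default.name, ".", c("txt", "csv", "dta"))
  hit <- intersect(candidates, files)
  if (length(hit) == 0) NULL else hit[1]
}

files <- c("csid.G.names.txt", "csid.adjacency.csv", "csid245045.txt")
resolve_aux(files, "csid", "G.names")                           # "csid.G.names.txt"
resolve_aux(files, "csid", "G.names", arg = "cntry.codes.txt")  # "cntry.codes.txt"
resolve_aux(files, "csid", "T.names")                           # NULL
```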
dataobj |
A list with several components:
|
Jon Bischof jbischof@fas.harvard.edu
http://gking.harvard.edu/yourcast
yourcast function and documentation (help(yourcast))
# To begin, set the working directory to the directory containing the
# cross-section and auxiliary files. The files for this example are in
# the 'data' folder of the YourCast library.

# Save the old working directory so it can be restored later
oldwd <- getwd()

# Set the working directory to the 'data' folder of the installed
# YourCast package (system.file() finds it in whichever library it is in)
setwd(system.file("data", package = "YourCast"))

# Simple run of the function, using the option that turns the year
# variable into the rownames of each cross section. Specify the
# sample.frame argument so that all diagnostics run
dta <- yourprep(G.names = "cntry.codes.txt", adjacency = "adjacency.txt",
                year.var = TRUE, verbose = TRUE,
                sample.frame = c(1950, 2000, 2001, 2030))

# With summary output (means of the variables in each cross section)
## Not run:
dta <- yourprep(G.names = "cntry.codes.txt", adjacency = "adjacency.txt",
                year.var = TRUE, summary = TRUE)
## End(Not run)

# Data frames already loaded into R can also be added with the
# "datalist" option if they are put into a list and properly labeled.
# All diagnostics are still performed.
# 'csid204545', etc., are data frames in the workspace; the labels are
# changed to arbitrary codes so as not to confuse them with other files
data(csid204545)
data(csid204550)
data(csid204555)
datalist <- list("123456" = csid204545, "234567" = csid204550,
                 "345678" = csid204555)

# Verbose option turned on and datalist argument added
dta <- yourprep(G.names = "cntry.codes.txt", adjacency = "adjacency.txt",
                year.var = TRUE, verbose = TRUE, datalist = datalist)

# Restore the original working directory
setwd(oldwd)
rm(oldwd)