yourcast {YourCast} | R Documentation |
Runs a set of regression models to forecast time-series cross-sectional data by either considering independent regressions in each cross-sectional unit or by using a variety of techniques to smooth across units.
yourcast(formula=NULL, dataobj=NULL, sample.frame=c(1950,2000,2001,2030),
         standardize=TRUE, elim.collinear=FALSE,
         tol=0.9999, solve.tol=1.e-10, svdtol=10^(-10),
         userfile=NULL, savetmp=T, model.frame=FALSE, debug=F,
         rerun="yourcast.savetmp",
         ### specific to models
         model="OLS", zero.mean=FALSE,
         #### smooth over ages
         Ha.sigma=0.3, Ha.sigma.sd=0.1, Ha.deriv=c(0,0,1),
         Ha.age.weight=0, Ha.time.weight=0,
         #### smooth over time
         Ht.sigma=0.3, Ht.sigma.sd=0.1, Ht.deriv=c(0,0,1),
         Ht.age.weight=0, Ht.time.weight=0,
         #### smooth over age-time
         Hat.sigma=0.2, Hat.sigma.sd=0.1, Hat.a.deriv=c(0,1), Hat.t.deriv=c(0,1),
         Hat.age.weight=0, Hat.time.weight=0,
         #### smooth over cntry-time
         Hct.sigma=0.3, Hct.sigma.sd=0.1, Hct.t.deriv=1, Hct.time.weight=0,
         LI.sigma.mean=0.2, LI.sigma.sd=0.1,
         nsample=500, low.pow=T, verbose=TRUE)
formula |
A standard R formula of the form y ~ x_1 +
x_2, except that an explanatory variable is included for a
particular cross-section only if it is both listed in the formula
and available in that cross-section's data set (see
dataobj). Explanatory variables in the formula but not available for
a cross-section (or in a cross-sectional dataset but not in the
formula) are excluded. (For mortality forecasting, the specification
looks like log(deaths/population) ~ x_1 + x_2, with deaths
and population stored as separate variables in each dataframe.) (May
be set to NULL if savetmp was set to TRUE on
the last run, in which case the value of formula will come from the
saved file.) |
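The covariate-matching rule described for formula can be sketched in base R. The helper restrict_formula below is hypothetical and is not part of YourCast; it assumes covariates are matched by column name and only illustrates how a formula might be narrowed to the variables present in one cross-section.

```r
# Hypothetical helper (not a YourCast function): keep only the right-hand-side
# terms of `formula` that are also columns of one cross-section's data frame.
restrict_formula <- function(formula, data) {
  rhs  <- attr(terms(formula), "term.labels")   # e.g. "x_1", "x_2"
  keep <- rhs[rhs %in% names(data)]
  if (length(keep) == 0) keep <- "1"            # fall back to intercept only
  reformulate(keep, response = formula[[2]])    # reuse the original left-hand side
}

f <- log(deaths/population) ~ x_1 + x_2

# Illustrative cross-section that has x_1 but lacks x_2
cs <- data.frame(deaths = c(10, 12), population = c(1000, 1100),
                 x_1 = c(0.5, 0.7))

restrict_formula(f, cs)
```

Interaction terms and transformed covariates are not handled by this sketch; it matches plain variable names only.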
dataobj |
An object of class ‘yourcast’ or equivalent. See
help(yourprep) for more details.
The dataobj may be supplied in one of four ways. Most
commonly, the argument will specify (1) an
object (in working memory) or (2) a string with the name of a file
in the working directory. However, if (3) dataobj is a string
referring to a directory on disk, then
each element of the list above should be stored in a file in that
directory, with element ‘data’ consisting of a subdirectory
containing separate ASCII data files. (If this option is chosen, a
complete data object, called ‘dataobj.Rdata’, will be stored in the
directory named, and it will be loaded automatically if
yourcast is run again with this chosen option.)
(4) The last option is for dataobj to be set to NULL, after
which the function will look for a ‘yourcast.savetmp’ file in the
working directory from a previous run of the function where the
argument savetmp was set to TRUE.
The function yourprep is available to help construct
the dataobj in the proper format from individual cross section
files in the working directory or the workspace. This function also
performs a number of diagnostics to
ensure that the data is entered properly and can be read by
yourcast. See help(yourprep) for more information. |
sample.frame |
Vector. A four element vector containing, in order, the start
and end time periods to be used for the observed data and the start
and end time periods to be forecast. Years identified here that are
not available for a cross-section are ignored. Default:
c(1950,2000,2001,2030) . |
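As a rough illustration of how the four elements of sample.frame partition the time axis, the base R sketch below splits the years available in one cross-section into an observed window and a forecast window. The variable years is a made-up example, and the way yourcast performs this split internally is not shown in this page.

```r
sample.frame <- c(1950, 2000, 2001, 2030)

# Hypothetical cross-section with rows only for 1960-2010
years <- 1960:2010

# Years outside what the cross-section provides simply drop out
observed <- years[years >= sample.frame[1] & years <= sample.frame[2]]
forecast <- years[years >= sample.frame[3] & years <= sample.frame[4]]

range(observed)
range(forecast)
```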
standardize |
Boolean. Should the covariates in each
cross-sectional unit be standardized (to zero mean and standard
deviation of 1)? Standardization is performed for both the in-
and out-of-sample periods. Default: TRUE . |
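A minimal sketch of what standardizing covariates to zero mean and unit standard deviation looks like in base R. The data frame is invented, and whether yourcast standardizes the in-sample and out-of-sample periods jointly (as done here) or separately is an assumption.

```r
# Hypothetical covariates for one cross-sectional unit
x <- data.frame(gdp = c(1, 2, 3, 4), tobacco = c(10, 20, 40, 50))

# scale() centers each column and divides by its standard deviation
z <- as.data.frame(scale(x))

colMeans(z)
sapply(z, sd)
```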
elim.collinear |
Boolean. Whether collinearity among covariates
should be tested and those that are collinear should be eliminated.
Default: FALSE. |
tol |
Double scalar. Tolerance to find collinearities among
covariates. Default: 0.9999 . |
solve.tol |
A real number smaller than one that is used in the
argument of the R-function solve to invert matrices (see
description for tol). Default: 10^{-10}. |
svdtol |
A scalar; the tolerance used in inverting a matrix by SVD. Default: 10^{-10}. |
userfile |
A string with the name of a file that contains your
values for some or all of yourcast's arguments. This file
contains R code that changes default values of arguments. E.g.,
the file might contain:
index.code <- 30
data <- "WHOmortalityData"
If an option is specified in userfile, it takes precedence over
command line options, so it is normally best to specify each option in
either the userfile or the command line but not both. Default: NULL. |
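The precedence rule for userfile can be sketched as follows. This is only an illustration of a file of R assignments overriding a list of argument values; the lists and the temporary file are hypothetical, and the mechanism yourcast actually uses to read the file is not documented here.

```r
# Values as they might be given on the command line (illustrative)
cmdline <- list(model = "OLS", Ha.sigma = 0.3, nsample = 500)

# A userfile is plain R code that assigns argument values
userfile <- tempfile(fileext = ".R")
writeLines(c('model <- "map"', 'Ha.sigma <- 0.1'), userfile)

# Evaluate the file in its own environment and collect what it defined
env <- new.env()
sys.source(userfile, envir = env)
overrides <- mget(ls(env), envir = env)

# Entries from the userfile win over the command-line values
args <- modifyList(cmdline, overrides)
str(args)
```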
savetmp |
If TRUE , yourcast saves a file in the default directory
(called ‘yourcast.savetmp’) with preliminary calculations. If the value
of formula or dataobj is missing when yourcast is
called, yourcast
will get their values from this file, if it exists. This saves a
minute or so of computing time for large data sets and is useful for
multiple runs on the same data with different formulas specified or
different prior values. If FALSE , no file is saved. (The structure of
‘yourcast.savetmp’ is for the convenience of yourcast and is not
intended to be read by the user or saved for more than one run.)
Default: TRUE . |
model.frame |
If TRUE , include entire input dataobj in the output
object. Default: FALSE . |
debug |
Boolean. If TRUE, puts the
environment that contains the parameters and arguments of the
simulation in the user workspace. Default: FALSE. |
rerun |
String. The name of the file that is saved in the default
directory with preliminary calculations; see
savetmp . Default: yourcast.savetmp |
model |
A string indicating the forecasting method, including:
Bayes maximum a posteriori (map), Bayes with Gibbs sampling
(bayes), Ordinary Least Squares (ols), Poisson
(poisson), and Lee-Carter (LC). Default: ols.
(We usually recommend map.)
yourcast also includes a procedure to help users set the sigma
parameters below automatically for the case of model=map, and
smoothing over age, time, or age and time, but for only one
country. You may do this by running a preprocessing instance of
yourcast first by setting this parameter to ebayes and using
either the data to be analyzed or a larger data set which is likely
to have similar or related parameter values. When ebayes is chosen,
the yourcast output object will contain only the parameter values to
feed into the next run of yourcast . |
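For orientation, a call using the recommended map model might look like the sketch below. It is built with quote() so it can be inspected without the package or any data loaded; dataobj is a placeholder for an object prepared with yourprep, the covariate names are illustrative, and the model string follows the spelling given in the argument description.

```r
# Unevaluated call: nothing is run, so YourCast need not be installed
cl <- quote(
  yourcast(formula = log(deaths/population) ~ x_1 + x_2,
           dataobj = dataobj,              # placeholder, see help(yourprep)
           model = "map",
           sample.frame = c(1950, 2000, 2001, 2030),
           Ha.sigma = 0.3,                 # smooth over age groups
           Ht.sigma = 0.3,                 # smooth over time
           Hat.sigma = NA)                 # no age-time smoothing
)

names(cl)[-1]   # arguments supplied in the call

# eval(cl) would run the forecast once the package and dataobj are available
```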
zero.mean |
A boolean or named vector with a value of \bar{μ}
for each age group. If TRUE, the prior has zero mean. If FALSE, the
prior has nonzero mean centered around the observed mean age profile
(i.e., the average of Y over time and levels of the geographic
index for each age group). Default: FALSE. |
Ha.sigma |
This can be set in one of three ways: (1) a scalar
which sets σ_a, the prior standard deviation of E(Y),
indicating how much to smooth E(Y) over age groups (which may
vary over geographic areas and time periods, and with the standard
deviations averaged over age groups). A larger standard deviation
represents more prior uncertainty, which allows the data to play a
greater role. (2) NA to not smooth in this way. (3) To have yourcast
search for a good value based on a target value of the derivative of
E(Y) with respect to age, set to a vector of elements containing
the start and end of a range in sigma in which to look (such as 0.05
and 1.5), the number of values to look at within this range (such as
5), and the target value of the derivative of E(Y) with respect
to age (such as 0.05). The vector may also include a fifth element,
which is the target value of the total standard deviation of E(Y)
over all dimensions of the prior (such as 0.1). (You may choose to
run yourcast with model=ebayes on a related data set to find an
approximate target value of the derivative and standard deviation
automatically.) Default: 0.30 . |
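The three ways of setting Ha.sigma can be written out as below, using the example values from the description. The last lines show one plausible reading of the search specification as a grid of candidate values; equal spacing is an assumption, since the spacing yourcast uses is not stated here.

```r
Ha.sigma <- 0.3                          # (1) fixed prior standard deviation
Ha.sigma <- NA                           # (2) do not smooth over age groups
Ha.sigma <- c(0.05, 1.5, 5, 0.05, 0.1)   # (3) search specification:
#            start, end, number of values, target derivative, target total sd

# Assumption: candidate values are equally spaced within the range
candidates <- seq(Ha.sigma[1], Ha.sigma[2], length.out = Ha.sigma[3])
candidates
```

The same three forms apply to Ht.sigma and Hat.sigma.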
Ha.sigma.sd |
A scalar; the standard deviation of parameter
Ha.sigma (for Gibbs sampling only). Default: 0.1 . |
Ha.deriv |
A numeric vector, each element of which is n, the
degree of a (discrete) derivative of the
smoothness functional with respect to the age group. Element k of
this vector refers to the (k-1)th derivative, where 0 excludes
the derivative, 1 includes it, and values in between include the
derivative but weight it down proportionally. The first element of
the vector corresponds to the weight on the derivative with respect
to age of order 0 (the identity operator), the second to the weight
on the derivative of order 1 (the 1st derivative), etc. For example,
c(0, 1, 1) corresponds to a mixed functional that penalizes the
first and second derivatives equally. The higher the order of
derivative, the more local smoothness over age groups; and the lowest
specified derivative controls the form of prior
indifference. Default: c(0, 0, 1), which usually works well. |
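To make the role of the derivative weights concrete, the sketch below computes a simple weighted sum of squared discrete derivatives of an age profile using diff(). The function deriv_penalty is hypothetical and is a simplified stand-in for the smoothness functional, not the prior that yourcast actually evaluates.

```r
# Hypothetical illustration: element k of `weights` multiplies the squared
# (k-1)th discrete derivative of the profile `mu`.
deriv_penalty <- function(mu, weights) {
  total <- 0
  for (k in seq_along(weights)) {
    ord <- k - 1
    d <- if (ord == 0) mu else diff(mu, differences = ord)
    total <- total + weights[k] * sum(d^2)
  }
  total
}

linear_profile <- 1:5             # constant slope across age groups
curved_profile <- c(1, 2, 4, 7, 11)

deriv_penalty(linear_profile, c(0, 0, 1))   # second derivative only
deriv_penalty(linear_profile, c(0, 1, 0))   # first derivative only
deriv_penalty(curved_profile, c(0, 0, 1))
```

With only the second derivative penalized, a linear profile incurs no penalty, which is one way to read the statement that the lowest specified derivative controls the form of prior indifference.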
Ha.age.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different age groups. If set
to 0 or NA, age groups are weighted equally; if set to a nonzero
scalar, the weight for age group a is set proportional to
a^Ha.age.weight;
if a vector of length A, the ath element is the
weight of age group a. Default: 0 . |
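The three forms of Ha.age.weight can be sketched as a small function. The helper age_weights is hypothetical, and normalizing the weights to sum to one is an assumption; the description only says the weights are proportional to a^Ha.age.weight.

```r
# Hypothetical helper: turn an Ha.age.weight-style value into per-age weights.
age_weights <- function(ages, w) {
  A <- length(ages)
  if (length(w) == A && A > 1) {
    raw <- w                       # vector form: one weight per age group
  } else if (is.na(w) || w == 0) {
    raw <- rep(1, A)               # 0 or NA: equal weights
  } else {
    raw <- ages^w                  # nonzero scalar: proportional to a^w
  }
  raw / sum(raw)                   # assumption: normalize to sum to 1
}

ages <- c(20, 40, 60, 80)          # illustrative age groups
age_weights(ages, 0)
age_weights(ages, 1)
age_weights(ages, c(4, 3, 2, 1))
```

The time-weight arguments follow the same pattern with time periods t in place of age groups a.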
Ha.time.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different time periods when
smoothing over age groups. If 0 or NA , time periods are weighted
equally; if set to a nonzero scalar value, the weight for time
period t in smoothing age groups is proportional to
t^Ha.time.weight; if the argument is a vector of length T, the
tth element is the weight of time period t. Default: 0 . |
Ht.sigma |
This can be set in one of three ways: (1) a scalar
which sets σ_t, the prior standard deviation of E(Y),
indicating how much to smooth E(Y) over time periods (which may
vary over geographic areas and age groups, and with the standard
deviations averaged over time periods). A larger standard deviation
represents more prior uncertainty, which allows the data to play a
greater role. (2) NA to not smooth in this way. (3) To have yourcast
search for a good value based on a target value of the derivative of
E(Y) with respect to time, set to a vector of elements containing
the start and end of a range in sigma in which to look (such as 0.05
and 1.5), the number of values to look at within this range (such as
5), and the target value of the derivative of E(Y) with respect
to time (such as 0.05). The vector may also include a fifth element,
which is the target value of the total standard deviation of E(Y)
over all dimensions of the prior (such as 0.1). (You may choose to
run yourcast with model=ebayes on a related data set to find an
approximate target value of the derivative and standard deviation
automatically.) Default: 0.30 . |
Ht.sigma.sd |
A scalar; the standard deviation of parameter
Ht.sigma (for Gibbs sampling only). Default: 0.1 . |
Ht.deriv |
A numeric vector, each element of which is
n, the degree of a (discrete) derivative of the
smoothness functional with respect to time. Element k of this
vector refers to the (k-1)th derivative, where 0 excludes the
derivative, 1 includes it, and values in between include the
derivative but weight it down proportionally. The first element of
the vector corresponds to the weight on the derivative with respect
to time of order 0 (the identity operator), the second to the weight
on the derivative of order 1 (the 1st derivative), etc. For example,
c(0, 1, 1) corresponds to a mixed functional that penalizes the
first and second derivatives equally. The higher the order of
derivative, the more local smoothness over time; and the lowest
specified derivative controls the form of prior
indifference. Default: c(0, 0, 1), which usually works well. |
Ht.age.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different age groups when
smoothing over time. If set to 0 or NA , age groups are weighted
equally in smoothing over time; if set to a nonzero scalar, the
weight for age group a is set proportional to a^Ht.age.weight;
if a vector of length A, the ath element is the weight of age
group a. Default: 0. |
Ht.time.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different time periods when
smoothing over time. If 0 or NA , time periods are weighted equally;
if set to a nonzero scalar value, the weight for time period t in
smoothing time periods is proportional to t^Ht.time.weight; if
the argument is a vector of length T, the tth element is the
weight of time period t. Default: 0. |
Hat.sigma |
This can be set in one of three ways: (1) a
scalar which sets σ_{at}, the prior standard deviation
of E(Y), indicating how much to smooth the time trend in E(Y) over
age groups. A larger standard deviation represents more prior
uncertainty, which allows the data to play a greater role. (2) NA to
not smooth in this way. (3) To have yourcast search for a good value
based on a target value of the derivative of E(Y) with respect to
age and time, set to a vector of elements containing the start and
end of a range in sigma in which to look (such as 0.05 and 1.5), the
number of values to look at within this range (such as 5), and the
target value of the derivative of E(Y) with respect to age and
time (such as 0.05). The vector may also include a fifth element,
which is the target value of the total standard deviation of E(Y)
over all dimensions of the prior (such as 0.1). (You may choose to
run yourcast with model=ebayes on a related data set to find an
approximate target value of the derivative and standard deviation
automatically.) Default: 0.2 . |
Hat.sigma.sd |
A scalar; the standard deviation of parameter
Hat.sigma (for Gibbs sampling only). Default: 0.1 . |
Hat.a.deriv |
A numeric vector, each element of which is n, the degree of a (discrete) derivative of the
smoothness functional of time trends with respect to age
groups. Element k of this vector refers to the (k-1)th
derivative of the time trend v with respect to age, where 0 excludes
the derivative, 1 includes it, and values in between include the
derivative but weight it down proportionally. The first element of
the vector corresponds to the weight on the derivative of the time
trend with respect to age of order 0 (the identity operator), the
second to the weight on the derivative of order 1 (the 1st
derivative), etc. For example, c(0, 1, 1) corresponds to a mixed
functional that penalizes the first and second derivatives
equally. The higher the order of derivative, the more local
smoothness over age groups; and the lowest specified derivative controls the
form of prior indifference. Default: c(0, 0, 1), which usually works
well. |
Hat.t.deriv |
A numeric vector, each element of which is n, the degree of a (discrete) derivative of the
smoothness functional of age derivative with respect to
time. Element k of this vector refers to the (k-1)th
derivative of the age derivative with respect to time, where 0
excludes the derivative, 1 includes it, and values in between
include the derivative but weight it down proportionally. The first
element of the vector corresponds to the weight on the age
derivative with respect to time of order 0 (the identity operator),
the second to the weight on the derivative of order 1 (the 1st
derivative), etc. For example, c(0, 1, 1) corresponds to a mixed
functional that penalizes the first and second derivatives
equally. The higher the order of derivative, the more local
smoothness over time; and the lowest specified derivative controls the
form of prior indifference. Default: c(0, 0, 1), which usually works
well. |
Hat.age.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different age groups when
smoothing over age and time. If set to 0 or NA, age groups are
weighted equally in smoothing over age and time; if set to a nonzero scalar,
the weight for age group a is set proportional to
a^Hat.age.weight; if a vector of length A, the ath element is the
weight of age group a. Default: 0. |
Hat.time.weight |
A scalar or a numeric vector with weights that
determine how much smoothing occurs for different time periods when
smoothing over age and time. If 0 or NA, time periods are weighted
equally; if set to a nonzero scalar value, the weight for time
period t in smoothing over age and time is proportional to
t^Hat.time.weight; if the argument is a vector of length T, the
tth element is the weight of time period t. Default: 0. |
Hct.sigma |
A scalar which sets σ_ct, the prior standard
deviation of E(Y), which indicates how much to smooth E(Y) over
geographic areas, or NA to not smooth in this way. The parameter
σ_ct is the expected prior standard deviation of E(Y) for a
geographic area (varying over time periods and age groups, and with
the standard deviations averaged over geographic areas). (A larger
standard deviation represents more prior uncertainty, which allows
the data to play a greater role.) Default: 0.3. |
Hct.sigma.sd |
A scalar; the standard deviation of parameter
Hct.sigma (for Gibbs sampling only). Default: 0.1. |
Hct.t.deriv |
A numeric vector; controls whether to smooth the
level or the time trend of E(Y) over geographic areas (both
cannot presently be done simultaneously). To smooth the level of
E(Y) over geographic areas, set to 1, the identity. To smooth the
time trend, set this (as in Hat.t.deriv) to the weight of the
partial derivative taken with respect to time in the standard
smoothness functional for the prior. The use of first or higher
order partial derivatives is supported. Default: 1. |
Hct.time.weight |
A scalar or a numeric vector with weights
that determine how much smoothing occurs for different time periods
when smoothing over geographic areas. If 0 or NA , time periods are
weighted equally; if set to a nonzero scalar value, the weight for
time period t in smoothing over areas is proportional to
t^Hct.time.weight; if the argument is a vector of length T, the
tth element is the weight of time period t. Default: 0 . |
LI.sigma.mean |
A scalar; used in the likelihood and in the
calculation of the priors in conjunction with Ha.sigma.sd ,
Hat.sigma.sd , Ht.sigma.sd , and Hct.sigma.sd .
Default: 0.2 . |
LI.sigma.sd |
A scalar; the standard deviation of
LI.sigma.mean used in the calculation of the priors. Default: 0.1 . |
nsample |
A scalar; represents the number of iterations in the
Gibbs algorithm bayes . Default: 500 . |
low.pow |
Boolean. Whether to include lower powers of explanatory
variables in the simulation as derived from formula. For
example, given y ~ x^4 with low.pow = TRUE, the terms
x, x^2, x^3, and x^4 will all be included. Default: TRUE. |
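The lower-power expansion can be pictured with the base R sketch below. The helper expand_powers is hypothetical and only shows which terms would be present; wrapping powers in I() is the usual way to write them in a standard R formula and is not necessarily how yourcast represents them internally.

```r
# Hypothetical helper: list a variable together with all of its lower powers.
expand_powers <- function(var, power) {
  c(var, if (power >= 2) paste0("I(", var, "^", 2:power, ")"))
}

terms_x4 <- expand_powers("x", 4)
terms_x4

# The expanded terms assembled into a formula for y
reformulate(terms_x4, response = "y")
```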
verbose |
Boolean. Whether to print verbose output. Default: TRUE. |
Returns a list of class ‘yourcast’ containing the following components:
call |
The full call, including all command line options when yourcast was called. |
userfile |
The full userfile if it was specified. |
yhat |
A list with the same cross-sectional elements as the input
data, but with two columns: ‘y’ for the observed dependent
variable and ‘yhat’ for the predicted values. These include both
in-sample and out-of-sample values, as distinguished by the values of
sample.frame . |
coeff |
A list with the same cross-sectional elements as the input data, elements of which are the estimated coefficients if calculated by the chosen model. |
sigma |
A list with the same cross-sectional elements as the input data, elements of which are the estimated standard error of the estimate of the regression (the standard deviation of the dependent variable given the explanatory variables). |
aux |
List. A list of summary information about the yourcast analysis
used by plot.yourcast |
params |
Vector. Smoothing parameters used in model. |
Federico Girosi girosi@rand.org; Elena Villalon evillalon@iq.harvard.edu; Gary King king@harvard.edu
http://gking.harvard.edu/yourcast