crossbasis {dlnm} | R Documentation |
Generate the basis functions for the two spaces of predictor and lags, choosing among a set of possible bases. Then, these functions are combined in order to create the related cross-basis matrix, which can be included in a model formula to fit a distributed lag non-linear model (DLNM).
crossbasis(var, vartype="ns", vardf=1, vardegree=1, varknots=NULL,
  varbound=range(var), varint=FALSE, cen=TRUE, cenvalue=mean(var),
  maxlag=0, lagtype="ns", lagdf=1, lagdegree=1, lagknots=NULL,
  lagbound=c(0,maxlag), lagint=TRUE)

## S3 method for class 'crossbasis':
summary(object, ...)
The arguments below define two sets of basis functions by calling the internal functions mkbasis and mklagbasis. The first is applied to var, to describe the relationship in the space of the predictor. The second is applied to a new vector 0:maxlag, to describe the relationship in the space of lags. Many arguments refer to the specific basis for each space (with stub var- or lag-). The two sets of basis functions are then combined to create the related cross-basis functions.
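The combination of the two sets of basis functions can be sketched in base R. This is an illustrative reconstruction and not the package's internal code: the helper lagshift, the simulated series x and the choice of a linear lag basis are all assumptions made for the example.

```r
library(splines)

set.seed(1)
x <- rnorm(200, mean = 20, sd = 5)   # hypothetical, equally spaced series
maxlag <- 3

# Basis for the space of the predictor (no intercept): n x 3
basisvar <- ns(x, df = 3)

# Basis for the space of lags, evaluated on 0:maxlag (with intercept):
# (maxlag + 1) x 2, here a simple linear function of lag
basislag <- cbind(1, 0:maxlag)

# Hypothetical helper: shift a vector by l positions, padding with NA
lagshift <- function(v, l) c(rep(NA, l), v[seq_len(length(v) - l)])

# For each pair (i, j): lag the i-th predictor basis column over 0:maxlag
# and weight the lagged copies by the j-th lag basis column
cb <- matrix(NA_real_, nrow = length(x),
             ncol = ncol(basisvar) * ncol(basislag))
for (i in seq_len(ncol(basisvar))) {
  lagged <- sapply(0:maxlag, function(l) lagshift(basisvar[, i], l))
  for (j in seq_len(ncol(basislag))) {
    cb[, ncol(basislag) * (i - 1) + j] <- lagged %*% basislag[, j]
  }
}

dim(cb)   # rows = observations, columns = product of the two dimensions
```

In this sketch the first maxlag rows are NA, because the lagged values of the predictor are not available there.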
var |
the predictor variable, defined as a numeric vector of ordered observations. |
vartype, lagtype |
type of basis. See Details below for the list of possible choices. |
vardf, lagdf |
dimension of the basis, equivalent to number of degrees of freedom spent to specify the relationship in each space. They depend on knots if provided, or on degree for type="poly" . |
vardegree, lagdegree |
degree of polynomial. Used only for type equal to "bs" (degree of the piecewise polynomial for the B-spline) or "poly" (degree of the polynomial). |
varknots, lagknots |
knots location for the basis. They specify the position of the internal knots for "ns" and "bs" , the cut-off points for "strata" (defining right-open intervals) and the threshold(s)/cut-off points for "lthr" , "hthr" and "dthr" . They must be set within the range of var and 0:maxlag , respectively, and if provided, are automatically ordered and made unique, determining the value of df . If only df is provided, varknots are placed at equally spaced quantiles (in the space of predictor), and lagknots at equally spaced values on the log scale of lags. |
varbound, lagbound |
boundary knots (sometimes called external knots). Used only for type equal to "ns" and "bs" . |
varint, lagint |
logical. If TRUE and df>1 , an 'intercept' is included in the basis. The default values should not be changed: see Warnings below. |
cen |
logical. If TRUE , the basis functions for the space of predictor are centered. See Note below. |
cenvalue |
centering value, used as a reference point for the predicted effects. |
maxlag |
a positive value defining the maximum lag. |
object |
an object of class "crossbasis" . |
... |
additional arguments to be passed to summary . |
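The default knot placement described for varknots and lagknots can be approximated as follows. This is only a sketch: the exact rule used by the package is not given in this page, and the shift of the lags by 1 before taking logs (so that lag 0 is defined) is an assumption of the example.

```r
set.seed(42)
var <- rnorm(500, mean = 20, sd = 5)   # hypothetical predictor series

# Predictor space: 3 internal knots at equally spaced quantiles
nk <- 3
varknots <- quantile(var, probs = seq_len(nk) / (nk + 1), na.rm = TRUE)

# Lag space: 3 internal knots equally spaced on a log scale.
# ASSUMPTION: lags are shifted by 1 before taking logs.
maxlag <- 30
grid <- seq(log(1), log(maxlag + 1), length.out = nk + 2)
lagknots <- exp(grid[-c(1, nk + 2)]) - 1

varknots
lagknots
```

Knots placed this way are denser at short lags, where the lag curve is usually expected to change most quickly.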
The value in type defines the basis for each space (predictor and lags). It must be one of:
"ns"
: natural cubic B-splines (constrained to be linear beyond the boundary knots). Specified by knots (internal knots) and bound (boundary or external knots). See the function ns for additional information. If knots is provided, the dimension df is set to length(knots)+1+int. An intercept is included if int=TRUE. The transformed variables can be centered at cenvalue.
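The dimension rule and the linearity constraint for "ns" can be checked directly with splines::ns, which this entry points to. The grid x and the knot values are arbitrary choices for the example.

```r
library(splines)

x <- seq(0, 30, by = 0.5)
knots <- c(10, 20)

b0 <- ns(x, knots = knots, intercept = FALSE)
b1 <- ns(x, knots = knots, intercept = TRUE)

ncol(b0)   # length(knots) + 1 + 0
ncol(b1)   # length(knots) + 1 + 1

# Beyond the boundary knots (here range(x)) the basis is linear:
# second differences on an equally spaced grid are numerically zero
ext <- predict(b0, newx = c(31, 32, 33))
max(abs(diff(ext, differences = 2)))
```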
"bs"
: B-splines characterized by degree (degree of the piecewise polynomial). Specified by knots (internal knots) and bound (boundary or external knots). See the function bs for additional information. If knots is provided, the dimension df is set to length(knots)+degree+int; if not, df must be higher than degree+int. An intercept is included if int=TRUE. The transformed variables can be centered at cenvalue.
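The role of degree in the dimension of a "bs" basis can be verified with splines::bs. The values below are arbitrary and only illustrate the counting rule.

```r
library(splines)

x <- seq(0, 30, by = 0.5)
knots <- c(10, 20)

# Quadratic B-spline: length(knots) + degree + int columns
q0 <- bs(x, knots = knots, degree = 2, intercept = FALSE)
q1 <- bs(x, knots = knots, degree = 2, intercept = TRUE)

# Cubic B-spline with the same knots spends one more degree of freedom
c0 <- bs(x, knots = knots, degree = 3, intercept = FALSE)

c(ncol(q0), ncol(q1), ncol(c0))
```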
"strata"
: strata variables (dummy parameterization) determined by internal cut-off values specified in knots, which represent the lower boundaries for the right-open intervals. Intervals containing no observation are automatically discarded. If knots is provided, the dimension df is set to length(knots)+int. A dummy variable for the reference stratum (the first one by default) is included if int=TRUE, generating a full rank basis. Never centered.
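A minimal sketch of the dummy parameterization for "strata", written in base R rather than taken from the package. It uses the lag vector 0:3 with a single cut-off at 1, which gives the right-open intervals [0,1) and [1,Inf).

```r
lag <- 0:3
knots <- 1

# Stratum index: findInterval() uses left-closed, right-open intervals
stratum <- findInterval(lag, knots) + 1

# One indicator column per stratum
full <- outer(stratum, seq_len(length(knots) + 1), "==") * 1

# Discard strata containing no observation
full <- full[, colSums(full) > 0, drop = FALSE]

basis_int   <- full                      # int=TRUE: reference stratum kept
basis_noint <- full[, -1, drop = FALSE]  # int=FALSE: first stratum is the reference

basis_int
basis_noint
```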
"poly"
: polynomial with power specified by degree. The dimension df is set to degree+int. An intercept, corresponding to a vector of 1's (the power 0 of the polynomial), is included if int=TRUE. The transformed variables can be centered at cenvalue.
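The "poly" basis amounts to raw powers of the variable. The function polybasis below is a hypothetical helper written for this illustration, not a function of the package.

```r
# Hypothetical helper: raw polynomial basis, with power 0 added when int=TRUE
polybasis <- function(x, degree, int = FALSE) {
  outer(x, seq(if (int) 0 else 1, degree), "^")
}

lag <- 0:15
p_int   <- polybasis(lag, degree = 4, int = TRUE)
p_noint <- polybasis(lag, degree = 4, int = FALSE)

dim(p_int)     # degree + 1 columns, the first being a vector of 1's
dim(p_noint)   # degree columns
```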
"integer"
: strata variables (dummy parameterization) for each integer value, expressly created to specify an unconstrained function in the space of lags. df is set automatically to the number of integer values minus 1 plus int. A dummy variable for the reference stratum (the first one by default) is included if int=TRUE, generating a full rank basis. Never centered.
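For the lag vector 0:maxlag, an "integer" basis with an intercept reduces to an identity matrix, so that each lag gets its own parameter. The lines below are a base R sketch of that idea, not the package's code.

```r
maxlag <- 3
lag <- 0:maxlag

# One indicator per integer lag
full <- outer(lag, lag, "==") * 1

basis_int   <- full                      # int=TRUE: (maxlag + 1) columns
basis_noint <- full[, -1, drop = FALSE]  # int=FALSE: lag 0 is the reference

basis_int
```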
"hthr"
, "lthr"
: high and low threshold parameterization, with a linear relationship above or below the threshold, respectively, and flat otherwise. The threshold is chosen by knots: if more than one is provided, a piecewise linear relationship is applied above the first knot or below the last one, respectively, with the slope changing at each further knot. df is automatically set to length(knots)+int. An intercept (corresponding to a vector of 1's) is included if int=TRUE. Never centered.
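The threshold parameterizations can be written with pmax. This is a sketch under the assumption that the low-threshold term is coded as a positive distance below the threshold; the sign convention used internally by the package is not stated here.

```r
x <- c(10, 18, 22, 26, 30)
thr <- 22

hthr <- pmax(x - thr, 0)   # linear above the threshold, flat below
lthr <- pmax(thr - x, 0)   # linear below the threshold, flat above

# Several knots: one column per knot, so the slope can change at each knot
knots <- c(22, 26)
hthr2 <- sapply(knots, function(k) pmax(x - k, 0))

cbind(x, hthr, lthr)
hthr2
```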
"dthr"
: double threshold parameterization (2 independent linear relationships above the second and below the first threshold, flat between them). The thresholds are chosen by knots. If only one is provided, the threshold is unique (V-model). If more than 2 are provided, the first and the last ones are chosen. df is automatically set to 2+int. An intercept (corresponding to a vector of 1's) is included if int=TRUE. Never centered.
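A double threshold basis can be sketched as two independent columns. The function dthr_basis is a hypothetical helper for illustration, and the positive coding of the lower arm is an assumption.

```r
# Hypothetical helper: two linear arms, flat between the thresholds
dthr_basis <- function(x, knots) {
  lo <- min(knots)   # first threshold
  hi <- max(knots)   # last threshold (equal to lo for a single knot)
  cbind(low = pmax(lo - x, 0), high = pmax(x - hi, 0))
}

x <- c(5, 10, 15, 20, 25, 30)

dthr_basis(x, knots = c(10, 25))      # flat between 10 and 25
dthr_basis(x, knots = 15)             # V-model with a single threshold
dthr_basis(x, knots = c(10, 18, 25))  # intermediate knot ignored
```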
"lin"
: linear relationship (untransformed apart from optional centering). df is automatically set to 1+int. An intercept (corresponding to a vector of 1's) is included if int=TRUE. It can be centered at cenvalue.
Some arguments may be changed automatically when the combination supplied is not sensible, or set to NULL if not required.
For a detailed overview of the options, see:
vignette("dlnmOverview")
A matrix object of class "crossbasis" which can be included in a model formula in order to fit a DLNM. It contains the attributes crossdf (global number of degrees of freedom) and range (range of the original vector of observations). Additional attributes correspond to the arguments of crossbasis and explicitly give type, df, degree, knots, bound, cen, cenvalue and maxlag for the corresponding basis (with stub var- or lag-), for use by crosspred. The function summary.crossbasis returns a summary of the cross-basis matrix and the related attributes, and can be used to check the options for the bases chosen for the two dimensions.
It is strongly recommended not to include an intercept in the basis for var: if the model used to fit the data has its own intercept, the additional intercept will cause some of the cross-basis variables to be excluded. Conversely, an intercept should always be included in the basis for the space of lags when lagtype is equal to "ns", "bs", "strata" or "poly".
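The exclusion of variables caused by a redundant intercept can be reproduced with a plain spline basis and lm, without the package. The data are simulated and the example only illustrates the aliasing mechanism.

```r
library(splines)

set.seed(1)
x <- runif(100, min = 0, max = 30)
y <- rnorm(100)

# Basis with its own intercept, fitted in a model that also has an intercept
b_int <- ns(x, df = 4, intercept = TRUE)
fit_int <- lm(y ~ b_int)

# Same fit with a basis that has no intercept
b_noint <- ns(x, df = 3, intercept = FALSE)
fit_noint <- lm(y ~ b_noint)

coef(fit_int)     # one coefficient is NA: that column is aliased and dropped
coef(fit_noint)   # all coefficients are estimated
```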
The values in var are expected to be equally spaced (with that space defining a lag unit) and ordered in time. NA values are allowed.
The name of the crossbasis object will be used by crosspred in order to extract the related estimated parameters. This name must not match the names of other predictors in the model formula. In addition, if more than one variable is transformed by cross-basis functions in the same model, different names must be specified.
For continuous functions specified with vartype equal to "ns", "bs", "poly" or "lin", the reference for the effects predicted by crosspred is set at cenvalue. For the other choices, the reference is automatic: for vartype equal to "strata" and "integer", the reference is the first interval, while for vartype equal to "hthr", "lthr" and "dthr", the reference is the region of null effect below, above or between the threshold(s), respectively.
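Centering at cenvalue can be pictured as subtracting, from every row of the basis, the basis evaluated at the centering value. The lines below are a sketch of this idea with splines::ns and are not the package's internal code.

```r
library(splines)

x <- seq(0, 30, by = 0.5)
cenvalue <- 21

b <- ns(x, df = 3)

# Basis evaluated at the centering value
ref <- as.vector(predict(b, newx = cenvalue))

# Centered basis: every column is shifted so that it is 0 at cenvalue
b_cen <- sweep(unclass(b), 2, ref)

b_cen[x == cenvalue, ]   # all zeros (up to rounding)
```

With a basis centered this way, any linear combination of the columns is 0 at cenvalue, which is why predicted effects are read relative to that value.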
Antonio Gasparrini, antonio.gasparrini@lshtm.ac.uk
Armstrong, B. Models for the relationship between ambient temperature and daily mortality. Epidemiology. 2006, 17(6):624-31.
# Example 1. See crosspred and crossplot for other examples

### simple DLM for the effect of PM10 on mortality up to 15 days of lag
### space of predictor: linear effect for PM10
### space of predictor: 5df natural cubic spline for temperature
### lag function: 4th degree polynomial for PM10
### lag function: strata intervals at lag 0 and 1-3 for temperature

library(dlnm)
data(chicagoNMMAPS)
basis.pm <- crossbasis(chicagoNMMAPS$pm10, vartype="lin", lagtype="poly",
  lagdegree=4, cen=FALSE, maxlag=15)
basis.temp <- crossbasis(chicagoNMMAPS$temp, vardf=5, lagtype="strata",
  lagknots=1, cenvalue=21, maxlag=3)
summary(basis.pm)
summary(basis.temp)
model <- glm(death ~ basis.pm + basis.temp, family=quasipoisson(),
  chicagoNMMAPS)
pred.pm <- crosspred(basis.pm, model, at=0:20)

crossplot(pred.pm, "slices", var=10,
  title="Effect of a 10-unit increase in PM10 along lags")

# overall effect for a 10-unit increase in PM over 15 days of lag, with CI
pred.pm$allRRfit["10"]
cbind(pred.pm$allRRlow, pred.pm$allRRhigh)["10",]
crossplot(pred.pm, "overall", ylim=c(0.99,1.04), label="PM10", ci="lines",
  title="Overall effect of PM10 over 15 days of lag")

### See the vignette 'dlnmOverview' for a detailed explanation of this example