earth {earth}R Documentation

Multivariate Adaptive Regression Splines

Description

Build a regression model using the techniques in Friedman's papers ‘Multivariate Adaptive Regression Splines’ and ‘Fast MARS’.

Usage

## S3 method for class 'formula':
earth(formula, data, ...)

## Default S3 method:
earth(x = stop("no 'x' arg"), y = stop("no 'y' arg"),
      weights = NULL, subset = NULL, na.action = na.fail,
      penalty = if(degree > 1) 3 else 2, trace = 0, keepxy = FALSE,
      degree = 1, nk = max(21, 2 * NCOL(x) + 1),
      thresh = 0.001, minspan = 1, newvar.penalty = 0,
      fast.k = 20, fast.beta = 1, fast.h = NULL,
      pmethod = "backward", ppenalty = penalty, nprune = NULL,
      Object  = NULL, Get.crit = get.gcv,
      Eval.model.subsets = eval.model.subsets,
      Print.pruning.pass = print.pruning.pass,
      Force.xtx.prune = FALSE, ...)

Arguments

All arguments are optional except formula, or x and y. NAs are not allowed.
To start off, look at the arguments formula, x, y, degree, nk, and nprune.

formula Model formula.
data Data frame for formula.
x Matrix containing the independent variables.
y Vector containing the response variable, or in the case of multiple responses, a matrix whose columns are the response values for each variable. If the y values are very big or very small then you may get better results if you scale y first for better numerical stability in the forward pass.
weights Weight vector (not yet supported).
subset Index vector specifying which cases to use i.e. which rows in x to use. Default is NULL, meaning all.
na.action NA action. Default is na.fail, and only na.fail is supported.
penalty Generalized Cross Validation (GCV) penalty per knot. Default is if(degree>1) 3 else 2. A value of 0 penalises only terms, not knots. The value -1 is a special case, meaning no penalty, so GCV=RSS/n. Theory suggests values in the range of about 2 to 3. In practice for big data sets, larger values can be useful to force a smaller model. See also ppenalty.
trace Trace earth's execution. Default is 0. Values:
0 none 1 overview 2 forward 3 pruning 4 more pruning 5 ...
keepxy Set to TRUE to retain x, y, and subset in the returned value. Default is FALSE

The following arguments are for the forward pass
degree Maximum degree of interaction (Friedman's mi). Default is 1, meaning build an additive model.
nk Maximum number of model terms before pruning. Includes the intercept. Default is max(21,2*NCOL(x)+1). The number of terms created by the forward pass will be less than nk if there are linearly dependent terms which must be discarded, or if a forward stopping condition is reached before nk terms. See the section below on the forward pass.
thresh Forward stepping threshold. Default is 0.001. This is one of the arguments used to decide when forward stepping should terminate. See the section below on the forward pass.
minspan Minimum distance between knots.
The default value of 1 means consider all knots.
The special value of 0 means calculate the minspan internally as per Friedman's MARS paper section 3.8 with alpha = 0.05. Intended to increase resistance to runs of noise in the input data. Set trace>=2 to see the calculated value.
Note: the default minspan was 0 prior to earth version 1.1 (July 2007).
newvar.penalty Penalty for adding a new variable in the forward pass (Friedman's gamma, equation 74 in the MARS paper). This argument can mitigate the effects of colinearity or concurvity in the input data. Default is 0, meaning no penalty for adding a new variable. Useful non-zero values range from about 0.01 to 0.2 — you will need to experiment.
fast.k Maximum number of considered parent terms, as as described in Friedman's Fast MARS paper section 3.0. Default is 20. The special value -1 is equivalent to infinity, meaning no Fast MARS. Typical values, apart from -1, are 20, 10, or 5. In general, with a lower fast.k (say 5), earth is faster; with a higher fast.k, or with fast.k disabled (set to -1), earth builds a better model. However it is not unusual to get a slightly better model with a lower fast.k, and you may need to experiment.
fast.beta Fast MARS ageing coefficient, as described in the Fast MARS paper section 3.1. Default is 1. A value of 0 sometimes gives better results.
fast.h Fast MARS h, as described in the Fast MARS paper section 4.0. (not yet implemented).

The following arguments are for the pruning pass. One strategy is to build a large model and then trim it back using update.earth with the arguments below.
pmethod Pruning method. Default is "backward". One of: backward none exhaustive forward seqrep. If y has multiple columns, then only backward or none is allowed. Pruning can take a while if exhaustive is chosen and the model is big (more than about 30 terms). The current version of the leaps package used during pruning does not allow user interrupts (i.e. you have to kill your R session to interrupt).
ppenalty Like penalty but for the pruning pass. Default is penalty.
nprune Maximum number of terms (including intercept) in the pruned model. Default is NULL, meaning all terms. Use this to reduce exhaustive search time, or to enforce a maximum model size.

The following arguments are for internal or advanced use
Object Earth object to be updated, for use by update.earth.
Get.crit Criterion function for model selection during pruning. By default a function that returns the GCV. See the section below on the pruning pass.
Eval.model.subsets Function to evaluate model subsets — see notes in source code.
Print.pruning.pass Function to print pruning pass results — see notes in source code.
Force.xtx.prune Default is FALSE. This argument pertains to subset evaluation in the pruning pass. By default, if y has a single column then earth calls the leaps routines; if y has multiple columns then earth calls EvalSubsetsUsingXtx. The leaps routines are based on the QR decomposition; the EvalSubsetsUsingXtx routine is based on the inverse of X'X. The leaps routines are more accurate but do not support multiple responses. Setting Force.xtx.prune=TRUE forces use of EvalSubsetsUsingXtx, even if y has a single column.
... earth.formula: arguments passed to earth.default.
earth.default: unused, but provided for generic/method consistency.

Value

An object of class ‘earth’ which is a list with the components listed below. Term refers to a term created during the forward pass (each line of the output from format.earth is a term). Term number 1 is always the intercept.

bx Matrix of basis functions applied to x. Each column corresponds to a selected term. Each row corresponds to a row in in the input matrix x, after taking subset. See model.matrix.earth for an example of bx handling. Example bx:
    (Intercept) h(Girth-12.9) h(12.9-Girth) h(Girth-12.9)*h(...
[1,]          1           0.0           4.6                   0
[2,]          1           0.0           4.3                   0
[3,]          1           0.0           4.1                   0
...
dirs Matrix with ij-th element equal to 1 if term i has a factor of the form x_j > c, equal to -1 if term i has a factor of the form x_j <= c, and to 0 if x_j is not in term i. This matrix includes all terms generated by the forward.pass, including those not in selected.terms. Note that the terms may not be in pairs, because the forward pass deletes linearly dependent terms before handing control to the pruning pass. Example dirs:
                       Girth Height
(Intercept)                0      0  #no factors in intercept
h(Girth-12.9)              1      0  #2nd term uses Girth
h(12.9-Girth)             -1      0  #3rd term uses Girth
h(Girth-12.9)*h(Height-76) 1      1  #4th term uses Girth and Height
...
cuts Matrix with ij-th element equal to the cut point for variable j in term i. This matrix includes all terms generated by the forward.pass, including those not in selected.terms. Note that the terms may not be in pairs, because the forward pass deletes linearly dependent terms before handing control to the pruning pass. Example cuts:
                           Girth Height
(Intercept)                  0.0      0  #intercept, no cuts
h(Girth-12.9)               12.9      0  #2nd term has cut at 12.9
h(12.9-Girth)               12.9      0  #3rd term has cut at 12.9
h(Girth-12.9)*h(Height-76)  12.9     76  #4th term has two cuts
...
selected.terms Vector of term numbers in the best model. Can be used as a row index vector into cuts and dirs. The first element selected.terms[1] is always 1, the intercept.
prune.terms A matrix specifying which terms appear in which subsets. The row index of prune.terms is the model size (the model size is the number of terms in the model). Each row is a vector of term numbers for the best model of that size. An element is 0 if the term is not in the model, thus prune.terms is a lower triangular matrix, with dimensions nprune x nprune. The model selected by the pruning pass is at row length(selected.terms). Example prune.terms:
[1,]    1  0  0  0  0  0  0  #intercept-only model
[2,]    1  2  0  0  0  0  0  #best 2 term model uses terms 1,2.
[3,]    1  2  4  0  0  0  0  #best 3 term model uses terms 1,2,4
[4,]    1  2  9  8  0  0  0
...
rss Residual sum-of-squares (RSS) of the model (summed over all responses if y has multiple columns).
rsq 1-rss/rss.null. R-Squared of the model (calculated over all responses if y has multiple columns). A measure of how well the model fits the training data.
gcv Generalised Cross Validation (GCV) of the model (summed over all responses if y has multiple columns). The GCV is calculated using ppenalty (as are all returned GCVs). For details of the GCV calculation, see equation 30 in Friedman's MARS paper and earth:::get.gcv.
grsq 1-gcv/gcv.null. An estimate of the predictive power of the model (calculated over all responses if y has multiple columns). Unlike rsq, grsq can be negative. A negative grsq would indicate a severely over parameterised model — a model that would not generalise well even though it may be a good fit to the training data. Example of a negative grsq:
earth(mpg~., data=mtcars, pmethod="none", trace=4)
rss.per.response A vector of the RSS for each response. Length is ncol(y). The rss component above is equal to sum(rss.per.response).
rsq.per.response A vector of the R-Squared for each response. Length is ncol(y).
gcv.per.response A vector of the GCV for each response. Length is ncol(y). The gcv component above is equal to sum(gcv.per.response).
grsq.per.response A vector of the GRSq for each response. Length is ncol(y).
rss.per.subset A vector of the RSS for each model subset generated by the pruning pass. Length is nprune. If y has multiple columns, the RSS is summed over all responses for each subset. The null RSS (i.e. the RSS of an intercept only-model) is rss.per.subset[1]. The rss above is
rss.per.subset[length(selected.terms)].
gcv.per.subset A vector of the GCV for each model in prune.terms. Length is is nprune. If y has multiple columns, the GCV is summed over all responses for each subset. The null GCV (i.e. the GCV of an intercept-only model) is gcv.per.subset[1]. The gcv above is gcv.per.subset[length(selected.terms)].
fitted.values Fitted values. A matrix with dimensions nrow(y) x ncol(y).
residuals Residuals. A matrix with dimensions nrow(y) x ncol(y).
coefficients Regression coefficients. A matrix with dimensions length(selected.terms) x ncol(y). Each column holds the least squares coefficients from regressing that column of y on bx. The first row holds the intercept coefficients.
ppenalty The GCV penalty used during pruning. A copy of earth's ppenalty argument.
call The call used to invoke earth.
terms Model frame terms. This component exists only if the model was built using earth.formula.
x
y
subset Copy of input arguments x, y, and subset. These components exist only if keepxy=TRUE.

Note

Standard Model Functions

Standard model functions such as case.names are provided for earth objects and are not explicitly documented.

Other Implementations

The results are similar to but not identical to other Multivariate Adaptive Regression Splines implementations. The differences stem from the forward pass where very small implementation differences (or perturbations of the input data) can cause rather different selection of terms and knots. The backward passes give identical or near identical results, given the same forward pass results.

The source code of earth is derived from mars in the mda package written by by Trevor Hastie and Robert Tibshirani. See also mars.to.earth. Earth's principal advantages over mda::mars are that it is much faster and provides plotting and printing methods.

The term ‘MARS’ is trademarked and licensed exclusively to Salford Systems http://www.salfordsystems.com. Their implementation uses an engine written by Friedman and offers more features than earth.

Limitations

The following aspects of MARS are mentioned in Friedman's papers but not implemented in earth:

i) Piecewise cubic models
ii) Specifying which predictors must enter linearly
iii) Specifying which predictors can interact
iv) Model slicing (plotmo goes part way)
v) Handling missing values
vi) Logistic regression
vii) Special handling of categorical predictors
viii) Fast MARS h parameter
ix) Cross validation to determine penalty
x) Anova tables with sigma and other information.

Large Models and Execution Time

For a given set of input data, the following can increase the speed of the forward pass:

i) decreasing fast.k
ii) decreasing nk
iii) decreasing degree
iv) increasing threshold
v) increasing min.span.

The backward pass is normally much faster than the forward pass, unless pmethod="exhaustive". Reducing nprune reduces exhaustive search time. One strategy is to first build a large model and then adjust pruning parameters such as nprune using update.earth.

The Forward Pass

The forward pass adds terms in pairs until the first of the following conditions is met:

i) reach maximum number of terms (nterms>=nk)
ii) reach DeltaRSq threshold (DeltaRSq<thresh) where DeltaRSq is the difference in R-Squared caused by adding the current term pair
iii) reach max RSq (RSq>1-thresh)
iv) reach min GRSq (GRSq< -10).

Set trace>=1 to see the stopping condition.

The result of the forward pass is the set of terms defined by $dirs and $cuts.

Note that GCVs (via GRSq) are used during the forward pass only as one of the stopping conditions and in trace prints. Changing the GCV $penalty does not change the knot positions.

The various stopping conditions mean that the actual number of terms created by the forward pass may be less than nk. There are some more reasons why the actual number of terms may be less than nk: (i) the forward pass chucks out one "side" of the pair if it adds nothing to the model — but the forward pass counts terms as if they were actually created in pairs. (ii) as a final step, the forward pass deletes linearly dependent terms, if any, so all terms in $dirs and $cuts are independent. And remember that the pruning pass will further discard terms.

The Pruning Pass

The pruning pass is handed the sets of terms created by the forward pass. Its job is to find the subset of these terms that gives the lowest GCV. The pruning pass works like this: it determines the subset of terms (using pmethod) with the lowest RSS for each model size in 1:nprune (see Force.xtx.prune above for some details). It saves the RSS and term numbers for each such subset in rss.per.subset and prune.terms. It then applies the Get.crit function with ppenalty to rss.per.subset to yield gcv.per.subset. It chooses the model with the lowest value in gcv.per.subset, and puts its term numbers into selected.terms. Finally, it runs lm to determine the fitted.values, residuals, and coefficients, by regressing the response y on the selected.terms of bx.

Set trace>=3 to trace the pruning pass.

By default Get.crit is earth:::get.gcv. Alternative Get.crit functions can be defined. See the source code of get.gcv for an example.

Testing on New Data

This example demonstrates one way to train on 80% of the data and test on the remaining 20%. In practice a dataset larger than the one below should be used for splitting. Also, remember that the test set should not be used for parameter tuning — use GCVs or separate validation sets for that.

train.subset <- sample(1:nrow(trees), .8 * nrow(trees))
test.subset <- (1:nrow(trees))[-train.subset]
a <- earth(Volume ~ ., data = trees[train.subset, ])
yhat <- predict(a, newdata = trees[test.subset, ])
y <- trees$Volume[test.subset]
print(1 - sum((y - yhat)^2) / sum((y - mean(y))^2)) # print R-Squared
How do I measure variable importance?

Establishing predictor importance is in general a tricky and even controversial problem.

Running plotmo with ylim=NULL (the default) gives an idea of which predictors make the largest changes to the predicted value.

You can also use drop1 (assuming you are using the formula interface to earth). Calling drop1(my.earth.model) will delete each predictor in turn from your model, rebuild the model from scratch each time, and calculate the GCV each time. You will get warnings that the earth library function extractAIC.earth is returning GCVs instead of AICs — but that is what you want so you can ignore the warnings. The column labeled AIC in the printed response from drop1 will actually be a column of GCVs not AICs. The Df column is not much use in this context.

You will get lots of output from drop1 if you built your original earth model with trace>0. You can set trace=0 by updating your model before calling drop1. Do it like this:
my.model <- update.earth(my.model, trace=0).

Remember that these techniques only tell you how important a variable is with the other variables already in the model. There are alternative ways of measuring variable importance (using resampling) but they are not yet implemented.

How do I know which predictors were added to the model first?

You can see the forward pass adding terms with trace=2 or higher. But remember, pruning will remove some of the terms. Another approach is to use

summary(my.model, decomp="none")

which will list the basis functions remaining after pruning, in the order they were added by the forward pass.

Can I get a list of the predictors actually used in the model?

Use this function:

get.used.pred.names <- function(obj)   # obj is an earth object
    names(which(apply(obj$dirs[obj$selected.terms,,drop=FALSE],2,any)))
Multiple Response Models

If y has K columns then earth builds K simultaneous models. Each model has the same set of basis functions (i.e. same bx and selected.terms) but different coefficients (the returned coefficients will have K columns). The models are built and pruned as usual but with the GCVs and RSSs averaged across all K responses.

Since earth attempts to optimize for all models simultaneously, the results will not be as "good" as building the models independently. i.e. the GCV of the combined model will not be as good as the GCVS for independent models, on the whole. However, the combined model may be a better model in other senses, depending on what you are trying to achieve.

For more details see section 4.1 of Hastie, Tibshirani, and Buja Flexible Discriminant Analysis by Optimal Scoring, JASA, December 1994 http://www-stat.stanford.edu/~hastie/Papers/fda.pdf.

Using earth with fda and mda

Earth can be used with fda and mda in the mda package. Earth will be a multiple response model, as described above. Use keep.fitted=TRUE if you want to call plot.earth later (actually only necessary for large datasets, see the description of keep.fitted in fda). Use keepxy=TRUE if you want to call update or plotmo later. Use trace>=3 to see the call to earth generated by fda or mda. Example:

library(mda)
(a <- fda(Species ~ ., data=iris, keep.fitted=TRUE, method=earth, keepxy=TRUE))
plot(a)
summary(a$fit)  # examine earth model embedded in fda model
plot(a$fit)
plotmo(a$fit, ycolumn=1, ylim=c(-1.5,1.5), clip=FALSE)
plotmo(a$fit, ycolumn=2, ylim=c(-1.5,1.5), clip=FALSE)
Warning and Error Messages

Earth prints most error and warning messages without printing the ‘call’. If you are mystified by a warning message, try setting options(warn=2) and using traceback.

Author(s)

Stephen Milborrow, derived from mda::mars by Trevor Hastie and Robert Tibshirani.

Users are encouraged to send feedback — use milboATsonicPERIODnet http://www.milbo.users.sonic.net.

References

The primary references are the Friedman papers. Readers may find the MARS section in Hastie, Tibshirani, and Friedman a more accessible introduction. Faraway takes a hands-on approach, using the ozone data to compare mda::mars with other techniques. (If you use Faraway's examples with earth instead of mars, use $bx instead of $x). Earth's pruning pass uses leaps which is based on techniques in Miller.

Faraway Extending the Linear Model with R http://www.maths.bath.ac.uk/~jjf23

Friedman (1991) Multivariate Adaptive Regression Splines (with discussion) Annals of Statistics 19/1, 1–141

Friedman (1993) Fast MARS Stanford University Department of Statistics, Technical Report 110 http://www-stat.stanford.edu/research/index.html

Hastie, Tibshirani, and Friedman (2001) The Elements of Statistical Learning http://www-stat.stanford.edu/~hastie/pub.htm

Miller, Alan (1990, 2nd ed. 2002) Subset Selection in Regression

See Also

format.earth, get.nterms.per.degree, get.nused.preds.per.subset, mars.to.earth, model.matrix.earth, ozone1, plot.earth.models, plot.earth, plotmo, predict.earth, reorder.earth, summary.earth, update.earth

Examples

a <- earth(Volume ~ ., data = trees)
summary(a, digits = 2)

# yields:
#    Call:
#    earth(formula = Volume ~ ., data = trees)
#    
#    Expression:
#      27 
#      +    6 * pmax(0,  Girth -     14) 
#      -  3.2 * pmax(0,     14 -  Girth) 
#      + 0.61 * pmax(0, Height -     75) 
#    
#    Number of cases: 31
#    Selected 4 of 5 terms, and 2 of 2 predictors
#    Number of terms at each degree of interaction: 1 3 (additive model)
#    GCV: 11          RSS: 196         GRSq: 0.96      RSq: 0.98

[Package earth version 1.1-4 Index]