booteval.relimp {relaimpo}    R Documentation

Functions to Bootstrap Relative Importance Metrics

Description

These functions provide bootstrap confidence intervals for relative importances. boot.relimp uses the R package boot to bootstrap the requested metrics (this can take considerable time), while booteval.relimp evaluates the bootstrap results and computes confidence intervals. Output from booteval.relimp is printed by a tailored print method, and a plot method produces bar plots of the relative importance metrics with their confidence intervals indicated.

Usage

boot.relimp(y, x, b = 1500, type = "lmg", 
    rank = TRUE, diff = TRUE, rela = TRUE)

booteval.relimp(bootrun, bty = "bca", level = 0.95, 
    sort = FALSE, norank = FALSE, nodiff = FALSE, 
    typesel = c("lmg", "pmvd", "last", "first", "betasq", "pratt"))

Arguments

y y is a vector of response data
x x is a matrix or data frame of regressor variables
b b is the number of bootstrap runs requested from boot.relimp (default: b=1500). Set this to a lower number if you are only testing code.
type type is a character vector requesting the metrics that are to be calculated. Available metrics: lmg, pmvd (non-US version only), last, first, betasq, pratt; these are described in calc.relimp.
rank rank is a logical requesting bootstrapping of ranks (rank=TRUE, default) for each metric in type.
diff diff is a logical requesting bootstrapping of pairwise differences in relative importance (diff=TRUE, default) for each metric in type
rela rela is a logical requesting relative importances summing to 100 (rela=TRUE, default). If rela is FALSE, some of the metrics sum to R^2 (lmg, pmvd, pratt), while others do not have a meaningful sum (last, first, betasq). More detail is given in calc.relimp.
bootrun bootrun is an object of class relimplmboot created by function boot.relimp. It hands over all relevant information on the bootstrap runs to function booteval.relimp.
bty bty is the type of bootstrap interval requested, as handed over to the function boot.ci from package boot. Possible choices are bca, perc, basic and norm; student is not supported.
level level is a single confidence level or a numeric vector of confidence levels.
sort sort is a logical requesting output sorted by size of relative contribution (sort=TRUE) or by the variables' position in the regressor list (sort=FALSE, default).
norank norank is a logical that suppresses output of rank letters (norank=TRUE), even if ranks have been bootstrapped.
nodiff nodiff is a logical that suppresses output of confidence intervals for differences (nodiff=TRUE), even if differences have been bootstrapped.
typesel typesel specifies the metrics that are to be reported. Only metrics that are both available in object bootrun and requested in typesel are reported; by default, this means all bootstrapped metrics.
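To make the distinction behind rela concrete, the following base-R sketch computes two of the metrics by hand for the swiss data rather than through calc.relimp: Pratt's measure (standardized coefficient times marginal correlation) and the first measure (squared marginal correlation). It checks that the Pratt values add up to R^2 while the first values do not, and then rescales to shares summing to 100, which is assumed here to mirror what rela=TRUE does.

```r
data(swiss)
y <- swiss[, 1]                    # Fertility (response)
x <- as.matrix(swiss[, 2:6])       # the five regressors

fit <- lm(y ~ x)
r2 <- summary(fit)$r.squared

# standardized coefficients: raw slope * sd(x_j) / sd(y)
beta_std <- coef(fit)[-1] * apply(x, 2, sd) / sd(y)
r_xy <- cor(x, y)[, 1]             # marginal correlations with the response

pratt <- beta_std * r_xy           # Pratt's measure: adds up to R^2
first <- r_xy^2                    # "first" measure: no meaningful sum

# shares summing to 100 (assumed to correspond to rela = TRUE)
pratt_rel <- 100 * pratt / sum(pratt)

round(c(R2 = r2, sum_pratt = sum(pratt), sum_first = sum(first)), 4)
round(pratt_rel, 2)
```

The Pratt identity holds because the standardized coefficients are the inverse correlation matrix of the regressors applied to the vector of marginal correlations, so their inner product with those correlations is exactly R^2.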

Details

Calculations of metrics are based on the function calc.relimp. Bootstrapping is done with the R package boot, resampling the full observation vectors (combinations of response and regressors; cf. Fox (2002)).
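A minimal sketch of this resampling scheme, using package boot directly and not boot.relimp itself: each bootstrap replicate draws whole rows of swiss, so the response and the regressors of an observation stay together. As a cheap stand-in for the metrics of calc.relimp, the statistic is the first metric (squared marginal correlation) rescaled to shares summing to 100; the function name first_share is hypothetical. Intervals are requested with the four types that bty accepts.

```r
library(boot)  # recommended package shipped with R
data(swiss)

# Hypothetical statistic: "first" shares (squared marginal correlations,
# rescaled to sum to 100), a stand-in for calc.relimp, computed on the
# rows selected by 'idx'
first_share <- function(d, idx) {
  d <- d[idx, ]                         # whole rows: response + regressors
  r2 <- cor(d[, -1], d[, 1])[, 1]^2
  100 * r2 / sum(r2)
}

set.seed(123)
bt <- boot(swiss, statistic = first_share, R = 1000)

# the four interval types that booteval.relimp accepts via bty
ci <- boot.ci(bt, conf = 0.95,
              type = c("norm", "basic", "perc", "bca"),
              index = 1)                # index 1 = Agriculture
ci
```

Switching to another metric only requires changing the statistic; the row-resampling step stays the same.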

The output provides results for all selected relative importance metrics. The output object can be printed and plotted (description of syntax: classesmethods.relaimpo).

Printed output: In addition to the standard output of calc.relimp (one row for each regressor, one column for each bootstrapped metric), there is a table of confidence intervals for each selected metric (one row per combination of regressor and metric). If ranks have been bootstrapped (rank=TRUE) and norank=FALSE, this table also contains information on rank confidence intervals. In addition, if differences have been bootstrapped (diff=TRUE) and nodiff=FALSE, there is a table of estimated pairwise differences with confidence intervals.
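The rank and difference tables rest on the same idea: in every bootstrap replicate, ranks and pairwise differences are recomputed from that replicate's metric values, and intervals are read off their bootstrap distribution. The sketch below illustrates this with package boot and plain percentile intervals; it is not the code booteval.relimp uses, and the helper share_rank_diff as well as the choice of Education versus Examination are hypothetical.

```r
library(boot)  # recommended package shipped with R
data(swiss)

# Hypothetical statistic: "first" shares (summing to 100), their ranks
# (1 = largest share) and one pairwise difference, all from the same resample
share_rank_diff <- function(d, idx) {
  d <- d[idx, ]
  r2 <- cor(d[, -1], d[, 1])[, 1]^2
  share <- 100 * r2 / sum(r2)
  out <- c(share, rank(-share), share["Education"] - share["Examination"])
  names(out) <- c(names(share), paste0("rank.", names(share)),
                  "Education-Examination")
  out
}

set.seed(2)
bt <- boot(swiss, statistic = share_rank_diff, R = 1000)

# plain percentile intervals from the bootstrap replicates
j_rank <- match("rank.Education", names(bt$t0))
j_diff <- match("Education-Examination", names(bt$t0))
rank_int <- quantile(bt$t[, j_rank], c(0.025, 0.975))
diff_int <- quantile(bt$t[, j_diff], c(0.025, 0.975))

rank_int  # range of plausible ranks for Education
diff_int  # an interval covering 0 means no clear ordering of the two
```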

Graphical output: Application of the plot method to the object created by booteval.relimp yields barplot representations for all bootstrapped metrics (all in one graphics window). Confidence level (lev=) and number of characters in variable names to be used (names.abbrev=) can be modified. Confidence bounds are indicated on the graphs by added vertical lines. par() options can be used for modifying output (exception: mfrow is overridden by the plot method).

Value

The value of boot.relimp is of class relimplmboot. It is designed to be useful as input for booteval.relimp and is not further described here. booteval.relimp returns an object of class relimplmbooteval, the items of which can be accessed by the $ or the @ extractors.
In addition to the items described for function calc.relimp, which are also available here, the following items may be of interest for further calculations:

metric.lower matrix of lower confidence bounds for “metric”: one row for each confidence level, one column for each element of “metric”. “metric” can be any of lmg, lmg.rank, lmg.diff, ... (replace lmg with other available relative importance metrics, cf. calc.relimp)
metric.upper matrix of upper confidence bounds for “metric”: one row for each confidence level, one column for each element of “metric”
metric.boot matrix of bootstrap results for “metric”: one row for each bootstrap run, one column for each element of “metric”. Here, “metric” can be chosen as any of the above-mentioned and also as R^2
nboot number of bootstrap runs underlying the evaluations
level confidence levels
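As an illustration of using these items for further calculations, the sketch below reruns a reduced version of the example from the Examples section and extracts the lmg bounds with the @ extractor ($ is documented to work as well). It assumes the slot names follow the metric.lower / metric.upper pattern described above (lmg.lower, lmg.upper) together with nboot and level; row90 and width90 are hypothetical names for the derived 90% interval widths.

```r
library(relaimpo)
data(swiss)

set.seed(42)
# reduced version of the Examples call: only lmg and first, 100 runs
bootswiss <- boot.relimp(swiss[, 1], swiss[, 2:6], b = 100,
                         type = c("lmg", "first"),
                         rank = TRUE, diff = TRUE, rela = TRUE)
result <- booteval.relimp(bootswiss, bty = "perc", level = c(0.8, 0.9))

# assumes slot names as documented under Value
result@level      # confidence levels (one per row of the bound matrices)
result@nboot      # number of bootstrap runs
result@lmg.lower  # one row per level, one column per regressor
result@lmg.upper

# further calculation: width of the 90% intervals for lmg
row90 <- which(result@level == 0.9)
width90 <- result@lmg.upper[row90, ] - result@lmg.lower[row90, ]
width90
```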

Warning

The bootstrap confidence intervals should be used for exploratory purposes only. They tend to be liberal: simulations have shown that non-coverage probabilities can reach twice the nominal probabilities. Further investigation is needed.

Be aware that computing the metrics themselves can take considerable time when there are many regressors, and bootstrapping repeats this computation b times. Plan bootstrap runs with this computing time in mind.

Note

There are two versions of this package. The version on CRAN is licensed globally under GPL version 2 (or later). An extended version with the additional metric pmvd is licensed under GPL version 2 with the geographical restriction "outside of the US" because of potential issues with US patent 6,640,204. This version can be obtained from Ulrike Groemping's website (cf. References section). When you load the package, a message tells you which version you are loading.

Author(s)

Ulrike Groemping, TFH Berlin

References

Chevan, A. and Sutherland, M. (1991) Hierarchical Partitioning. The American Statistician 45, 90–96.

Feldman, B. (2005) Relative Importance and Value. Manuscript (Version 1, March 8 2005), downloadable at http://www.qwafafew.org/?q=filestore/download/268

Fox, J. (2002) Bootstrapping regression models. An R and S-PLUS Companion to Applied Regression: A web appendix to the book. http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf.

Lindeman, R.H., Merenda, P.F. and Gold, R.Z. (1980) Introduction to Bivariate and Multivariate Analysis, Glenview IL: Scott, Foresman.

Go to http://www.tfh-berlin.de/~groemp for further information and references.

See Also

See also calc.relimp, classesmethods.relaimpo

Examples

#####################################################################
### Example: relative importance of various socioeconomic indicators 
###          for Fertility in Switzerland
### Fertility is first column of data set swiss
#####################################################################
data(swiss)

# bootstrapping
# for demonstration purposes only 100 bootstrap replications
bootswiss <- boot.relimp(swiss[, 1], swiss[, 2:6], b = 100,
                         type = c("lmg", "last", "first", "pratt"),
                         rank = TRUE, diff = TRUE, rela = TRUE)

# default output
# because of only 100 bootstrap replications,
# the default bca intervals produce warnings
booteval.relimp(bootswiss)
plot(booteval.relimp(bootswiss))

# sorted printout, chosen confidence levels, chosen interval method;
# store as object
result <- booteval.relimp(bootswiss, bty = "perc",
                          sort = TRUE, level = c(0.8, 0.9))
# output driven by print method
result
# result plotting with default settings
# (largest confidence level, names abbreviated to length 4)
plot(result)
# result plotting with modified settings (chosen confidence level,
# names abbreviated to chosen length)
plot(result, level = 0.8, names.abbrev = 5)
# result plotting with longer names shown vertically:
# plot does react to options set with par()
# (exception: the mfrow option is set within plot,
# depending on the number of metrics to be shown)
op <- par(las = 2)
plot(result, level = 0.9, names.abbrev = 6)
par(op)  # restore previous graphics settings

[Package relaimpo version 0.5]