perturb {accuracy}R Documentation

Data Perturbations based Sensitivity Analysis

Description

This function replicates an statistical analysis using slightly perturbed versions of the original input data, and then analyzes the sensitivity of the model to susch changes. This can be used to draw attention to inferences that cannot be supported confidently given the current data, model, and algorithm/implementation,

Usage

perturb(data, statistic, ..., ptb.R = 1, ptb.ran.gen = PTBms, ptb.s = NULL)

Arguments

data data matrix to be perturbed
statistic statistic model to be run on data
... additional arguments to statistical model
ptb.R number of replications for perturbation analysis
ptb.ran.gen a single function, or a vector of functions to be used to perturb each vector see PTBi
ptb.s a size, or vector of sizes, to be used in the vector perturbation functions

Details

Sensitivity to numerical inaccuracy, and measurement error is very hard to measure formally. This empirical sensitivity tests draws attention to inferences that cannot be supported confidently given the current data, model, and algorithm/implementation,

The empirical approach works by replicating the original analysis while slightly perturbing the original input data in different ways. The sensitivity of the model estimates (e.g. estimated coefficients, standard errors and log-likelihoods) are then summarized.

The sensitivity analysis cannot be used to prove the accuracy of a particular method, but is useful in drawing attention to potential problems. Further experimentation and analysis may be necessary to determine the specific cause of the problem: numerical instability, statistical sensitivity to measurement error, or ill-conditioning.

Value

Returns a list which contains the result of each model run. Along with attributes about the settings used in the perturbations. Use summary to summarize the results or extract a matrix of of the model parameters across the entire set of runs.

Author(s)

Micah Altman Micah_Altman@harvard.edu http://www.hmdc.harvard.edu/micah_altman/

References

Altman, M., J. Gill and M. P. McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist. John Wiley & Sons. http://www.hmdc.harvard.edu/numerical_issues/

See Also

See Also as PTBi

Examples


# Examine the sensitivity of the GLM from Venables & Ripley (2002, p.189)
# as described in the glm module.
# 
# Perturb the two independent variables using +/- 0.025
#       (relative to the size of each observations)
# uniformly distributed noise. Dependent variable is not being modified.
# 
# Summary should show that estimated coefficients are not substantively affected by noise.

data(anorexia,package="MASS")
panorexia = perturb(anorexia, glm, Postwt ~ Prewt + Treat + offset(Prewt),
     family=gaussian, 
    ptb.R=100, ptb.ran.gen=c(PTBi,PTBus,PTBus), ptb.s=c(1,.005,.005) )
summary(panorexia)

# Use classic longley dataset. The model is numerically unstable, 
# and much more sensitive to noise.  Smaller amounts of noise tremendously
# alter some of the estimated coefficients:
# 
# In this example we are not perturbing the dependent variable (employed) or 
# the year variable. So we assign then PTBi or NULL in ptb.ran.gen )

if(version$major<2) {
        data(longley, package="base")
} else {
        data(longley, package="datasets")
}

plongley = perturb(longley,lm,Employed~., ptb.R=100, 
    ptb.ran.gen=c(PTBi, replicate(5,PTBus),PTBi), ptb.s=c(1,replicate(5,.001),1))
sp=summary(plongley)

# summarizes range
print(sp)

# look in summary object to extract more ...
names(attributes(sp))

# print metrix of coefficients from all runs
coef= attr(sp,"coef.betas.m")
summary(coef)


[Package accuracy version 1.04 Index]