sensitivity {accuracy} R Documentation

Data Perturbation-Based Sensitivity Analysis

Description

This function replicates a statistical analysis using slightly perturbed versions of the original input data, and then analyzes the sensitivity of the model to such changes. This can be used to draw attention to inferences that cannot be supported confidently given the current data, model, and algorithm/implementation.

Usage

sensitivity(data, statistic, ..., ptb.R = 50, ptb.ran.gen = NULL , ptb.s = NULL, 
        summarize=FALSE,  keepData = FALSE, ptb.rangen.ismatrix=FALSE)
perturb(...)

Arguments

data data matrix to be perturbed
statistic statistical model to be run on data
... additional arguments to statistical model
ptb.R number of replications for perturbation analysis
ptb.ran.gen a single function, or a vector of functions, to be used to perturb each vector; see PTBi
ptb.s a size, or vector of sizes, to be used in the vector perturbation functions
summarize if TRUE, return a sensitivity summary, as summary(sensitivity()) would. This reduces system memory use and can significantly speed up the analysis of large datasets.
keepData for debugging, store data for each perturbation
ptb.rangen.ismatrix if TRUE, expects ptb.ran.gen to be a matrix function, for use with a correlated noise structure. See the last example in Examples.
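
As an illustration of what a user-supplied vector perturbation function might look like, here is a minimal sketch. It assumes the (x, size) calling convention seen in the matrix noise function in the Examples; the name ptb_relnorm is hypothetical and is not part of the package.

```r
# Hypothetical vector perturbation function (not part of the accuracy package).
# Assumption: perturbation functions are called as f(x, size).
ptb_relnorm <- function(x, size = 1) {
  if (!is.numeric(x)) return(x)   # leave factors and characters untouched
  # Gaussian noise whose standard deviation is proportional to each value
  x + rnorm(length(x), mean = 0, sd = size * abs(x))
}

set.seed(42)
x <- c(10, 200, 3000)
x_ptb <- ptb_relnorm(x, size = 0.001)
print(x_ptb - x)   # small deviations, larger for larger values
```

If the assumption about the calling convention holds, such a function could be passed in ptb.ran.gen in place of PTBus, with the matching entry of ptb.s supplying size.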

Details

Sensitivity to numerical inaccuracy and measurement error is very hard to measure formally. This empirical sensitivity test draws attention to inferences that cannot be supported confidently given the current data, model, and algorithm/implementation.

The empirical approach works by replicating the original analysis while slightly perturbing the original input data in different ways. The sensitivity of the model estimates (e.g. estimated coefficients, standard errors and log-likelihoods) is then summarized.
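
The perturb-and-refit idea can be sketched in base R without the package. This is only an illustration under stated assumptions, not the package's implementation: the helpers perturb_rel and refit_once are hypothetical, and the noise model (relative uniform noise of size 0.001) is chosen just for the example.

```r
# Base-R sketch of the perturb-and-refit idea (hypothetical helpers,
# not the accuracy package's own code).
set.seed(1)

# Multiply each value by (1 + u), with u ~ Uniform(-size, size)
perturb_rel <- function(x, size) x * (1 + runif(length(x), -size, size))

refit_once <- function(data, size = 0.001) {
  d <- data
  # Leave the response (Employed) and Year untouched
  vars <- setdiff(names(d), c("Employed", "Year"))
  d[vars] <- lapply(d[vars], perturb_rel, size = size)
  coef(lm(Employed ~ ., data = d))
}

# One row per replication, one column per coefficient
betas <- t(replicate(50, refit_once(longley)))

# Minimum and maximum of each coefficient across replications
coef_range <- apply(betas, 2, range)
print(coef_range)
```

A wide range for a coefficient relative to its original estimate is the kind of warning sign the summary of a sensitivity run is meant to surface.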

The sensitivity analysis cannot be used to prove the accuracy of a particular method, but is useful in drawing attention to potential problems. Further experimentation and analysis may be necessary to determine the specific cause of the problem: numerical instability, statistical sensitivity to measurement error, or ill-conditioning.
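
One possible follow-up check for ill-conditioning, sketched here with base R only, is the condition number of the design matrix. This is an assumed diagnostic for illustration and is not something sensitivity computes; a large value is suggestive, but it does not by itself separate ill-conditioning from the other causes listed above.

```r
# Condition number of the design matrix for the longley regression
X <- model.matrix(Employed ~ ., data = longley)
cond_raw <- kappa(X, exact = TRUE)

# Same check after centring and scaling the predictors
# (the intercept column is dropped before scaling)
cond_scaled <- kappa(scale(X[, -1]), exact = TRUE)

print(c(raw = cond_raw, scaled = cond_scaled))
```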

Value

Returns a list that contains the result of each model run, along with attributes describing the settings used in the perturbations. Use summary to summarize the results or to extract a matrix of the model parameters across the entire set of runs.

Note

If ptb.ran.gen is not specified, then PTBdefault will be used, with q=ptb.s, for each variable in the input data.

Note that "sensitivity" was originally called "perturb". The name was changed to avoid a conflict with another module that was introduced afterwards.

Author(s)

Micah Altman Micah_Altman@harvard.edu http://www.hmdc.harvard.edu/micah_altman/

References

Altman, M., J. Gill and M. P. McDonald. 2003. Numerical Issues in Statistical Computing for the Social Scientist. John Wiley & Sons. http://www.hmdc.harvard.edu/numerical_issues/

See Also

PTBi, sensitivityZelig, PTBdefault, PTBdiscrete

Examples


# Examine the sensitivity of the GLM from Venables & Ripley (2002, p.189)
# as described in the glm module.
# 
# Perturb the two numeric variables (Prewt and Postwt) using uniformly
# distributed noise scaled relative to the size of each observation
# (sizes given in ptb.s). The factor Treat is assigned PTBi and is not modified.
# 
# Summary should show that estimated coefficients are not substantively affected by noise.

if (!is.R()){
  # workaround MASS data bug in SPLUS
  require(MASS,quietly=TRUE)
}

data(anorexia,package="MASS")
panorexia = sensitivity(anorexia, glm, Postwt ~ Prewt + Treat + offset(Prewt),
     family=gaussian, 
    ptb.R=100, ptb.ran.gen=list(PTBi,PTBus,PTBus), ptb.s=c(1,.005,.005) )
summary(panorexia)

# Use the classic longley dataset. The model is numerically unstable, 
# and much more sensitive to noise. Even smaller amounts of noise tremendously
# alter some of the estimated coefficients.
# 
# In the second call below we do not perturb the dependent variable (Employed) or 
# the Year variable, so we assign them PTBi or NULL in ptb.ran.gen.

data(longley)
plongley = sensitivity(longley,lm,Employed~.) # defaults

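# Alternatively, choose specific perturbation functions.
# The columns of longley are GNP.deflator, GNP, Unemployed, Armed.Forces,
# Population, Year, Employed; the last two (Year, Employed) are assigned PTBi.
plongley2 = sensitivity(longley,lm,Employed~., ptb.R=100, 
    ptb.ran.gen=c(replicate(5,PTBus,simplify=FALSE),list(PTBi),list(PTBi)),
    ptb.s=c(replicate(5,.001),1,1))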

# summarizes range
sp=summary(plongley)
print(sp)
plot(sp) # plots boxplots of the distribution of the coefficients under perturbation

# models with anova methods can also be summarized this way
anova(plongley) 

## Not run: 
# plots different replications
plot(plongley) # plots the perturbed replications individually, pausing in between

# plots anova results (where applicable)
plot(anova(plongley))
## End(Not run)

# look in summary object to extract more ...
names(attributes(sp))

# print matrix of coefficients from all runs
coef= attr(sp,"coef.betas.m")
summary(coef)



# Example where the model does not accept a dataset as an argument.
#
# mle() does not accept a dataset as an argument, so we need to
# create a wrapper function.
if (is.R()) {
     library(stats4)
     mleD<-function(data,lld,...) {
           # construct LL function with embedded data
           f=formals(lld)
           f[1]=NULL
           ll <-function()  {
              cl=as.list(match.call())
              cl[1]=NULL
              cl$data=as.name("data")
              do.call(lld,cl)
           }
           formals(ll)=f

           # call mle
           mle(ll,...)
     }

     dat=as.data.frame(cbind( 0:10 , c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8) ))
                                                    
     llD<-function(data, ymax=15, xhalf=6)
         -sum(stats::dpois(data[[2]], lambda=ymax/(1+data[[1]]/xhalf), log=TRUE))
     
     
     print(summary(sensitivity(dat, mleD,llD)))
     
}

# An example of using correlated noise by supplying a matrix noise function
# Note that the function can be an anonymous function supplied in the call itself

  
  if (require(MASS,quietly=TRUE)) {
  

    plongleym=sensitivity(longley,lm,Employed~.,
      ptb.rangen.ismatrix=TRUE,
      ptb.ran.gen=
      function(x,size=1){
             # one noise row per observation, one noise column per variable
             p=dim(x)[2]
             mvrnorm(n=dim(x)[1],mu=rep(0,p),
                     Sigma=matrix(.9,nrow=p,ncol=p))*size+x}
    )
    print(summary(plongleym))
  }
  
 


[Package accuracy version 1.31 Index]