runmean, runmin, runmax & runquantile {caTools}R Documentation

Moving Window Analysis of a Vector

Description

A collection of functions to perform moving window (running, rolling window) analysis of vectors

Usage

  runmean(x, k, alg=c("C", "R", "exact"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmin (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmax (x, k, alg=c("C", "R"),
         endrule=c("NA", "trim", "keep", "constant", "func"))
  runmad (x, k, center=runmed(x,k,endrule="keep"), constant=1.4826,  
          endrule=c("NA", "trim", "keep", "constant", "func"))
  runquantile(x, k, probs, type=7, 
          endrule=c("NA", "trim", "keep", "constant", "func"))
  EndRule(x, y, k, 
          endrule=c("NA", "trim", "keep", "constant", "func"), Func, ...)

Arguments

x numeric vector of length n
k width of moving window; must be an odd integer between three and n
endrule character string indicating how the values at the beginning and the end, of the data, should be treated. Only first and last k2 values at both ends are affected, where k2 is the half-bandwidth k2 = k %/% 2.
  • "trim" - trim the ends output array length is equal to length(x)-2*k2 (out = out[(k2+1):(n-k2)]). This option mimics output of apply (embed(x,k),1,FUN) and other related functions.
  • "keep" - fill the ends with numbers from x vector (out[1:k2] = x[1:k2])
  • "constant" - fill the ends with first and last calculated value in output array (out[1:k2] = out[k2+1])
  • "NA" - fill the ends with NA's (out[1:k2] = NA)
  • "func" - applies the underlying function to smaller and smaller sections of the array. For example in case of mean: for(i in 1:k2) out[i]=mean(x[1:i]). This option is not optimized and could be very slow for large windows.
Similar to endrule in runmed function which has the following options: “c("median", "keep", "constant")” .
alg an option allowing to choose different algorithms or implementations, if provided. Default is to use of code written in C. Option alg="R" will use slower code written in R. Useful for debugging and allows extensions in the future.
center moving window center used by runmad function defaults to running median (runmed function). Similar to center in mad function.
constant scale factor used by runmad, such that for Gaussian distribution X, mad(X) is the same as sd(X). Same as constant in mad function.
probs numeric vector of probabilities with values in [0,1] range used by runquantile. For example Probs=c(0,0.5,1) would be equivalent to running runmin, runmed and runmax. Same as probs in quantile function.
type an integer between 1 and 9 selecting one of the nine quantile algorithms, same as type in quantile function. Another even more readable description of nine ways to calculate quantiles can be found at http://mathworld.wolfram.com/Quantile.html.
y numeric vector of length n, which is partially filled output of one of the run functions. Function EndRule will fill the remaining beginning and end sections using method chosen by endrule argument.
Func Function name that EndRule will use in case of endrule="func".
... Additional parameters to Func that EndRule will use in case of endrule="func".

Details

Apart from the end values, the result of y = runFUN(x, k) is the same as “for(j=(1+k2):(n-k2)) y[j]=FUN(x[(j-k2):(j+k2)])”, where FUN stands for min, max, mean, mad or quantile functions.

The main incentive to write this set of functions was relative slowness of majority of moving window functions available in R and its packages. With exception of runmed, a running window median function, all functions listed in "see also" section are slower than very inefficient “apply(embed(x,k),1,FUN)” approach. Relative speeds of above functions are as follow:

Functions runquantile and runmad are using insertion sort to sort the moving window, but gain speed by remembering results of the previous sort. Since each time the window is moved, only one point changes, all but one points in the window are already sorted. Insertion sort can fix that in O(k) time.

Function runquantile when run in single probability mode automatically recognizes probabilities: 0, 1/2, and 1 as special cases and return output from functions: runmin, runmed and runmax respectively.

All run* functions are written in C, but runmin, runmax and runmean also have fast R code versions (see argument alg="R"). Those were included for debugging purposes, and as a fallback in hard-to-port situations. See examples.

Function EndRule applies one of the five methods (see endrule argument) to process end-points of the input array x.

In case of runmean(..., alg="exact") function a special algorithm is used (see references section) to ensure that round-off errors do not accumulate. As a result runmean is more accurate than filter(x, rep(1/k,k)) and runmean(..., alg="C") functions.

All of the functions in this section do not work with infinite numbers (NA,NaN,Inf,-Inf) except for runmean(..., alg="exact") which omits them.

Value

Functions runmin, runmax, runmean and runmad return a numeric vector of the same length as x. Function runquantile returns a matrix of size [n x length(probs)]. In addition x contains attribute k with (the 'oddified') k.

Note

Function runmean(..., alg="exact") is based by code by Vadim Ogranovich, which is based on Python code (see last reference), pointed out by Gabor Grothendieck.

Author(s)

Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com

References

See Also

Links related to each function:

Examples

  # test runmin, runmax and runmed
  k=15; n=200;
  x = rnorm(n,sd=30) + abs(seq(n)-n/4)
  col = c("black", "red", "green", "blue", "magenta", "cyan")
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(runmin(x,k), col=col[2])
  lines(runmed(x,k), col=col[3])
  lines(runmax(x,k), col=col[4])
  legend(0,.9*n, c("data", "runmin", "runmed", "runmax"), col=col, lty=1 )

  #test runmean and runquantile
  y=runquantile(x, k, probs=c(0, 0.5, 1, 0.25, 0.75), endrule="constant")
  plot(x, col=col[1], main = "Moving Window Quantile")
  lines(runmean(y[,1],k), col=col[2])
  lines(y[,2], col=col[3])
  lines(runmean(y[,3],k), col=col[4])
  lines(y[,4], col=col[5])
  lines(y[,5], col=col[6])
  lab = c("data", "runmean(runquantile(0))", "runquantile(0.5)", 
  "runmean(runquantile(1))", "runquantile(.25)", "runquantile(.75)")
  legend(0,0.9*n, lab, col=col, lty=1 )

  #test runmean and runquantile
  k =25
  m=runmed(x, k)
  y=runmad(x, k, center=m)
  plot(x, col=col[1], main = "Moving Window Analysis Functions")
  lines(m    , col=col[2])
  lines(m-y/2, col=col[3])
  lines(m+y/2, col=col[4])
  lab = c("data", "runmed", "runmed-runmad/2", "runmed+runmad/2")
  legend(0,1.8*n, lab, col=col, lty=1 )

  # numeric comparison between different algorithms
  numeric.test = function (n, k) {
    eps = .Machine$double.eps ^ 0.5
    x = rnorm(n,sd=30) + abs(seq(n)-n/4)
    # numeric comparison : runmean
    a = runmean(x,k)
    b = runmean(x,k, alg="R")
    d = runmean(x,k, alg="exact")
    e = filter(x, rep(1/k,k))
    stopifnot(all(abs(a-b)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-d)<eps, na.rm=TRUE));
    stopifnot(all(abs(a-e)<eps, na.rm=TRUE));
    # numeric comparison : runmin
    a = runmin(x,k, endrule="trim")
    b = runmin(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, min)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmax
    a = runmax(x,k, endrule="trim")
    b = runmax(x,k, endrule="trim", alg="R")
    c = apply(embed(x,k), 1, max)
    stopifnot(all(a==b, na.rm=TRUE));
    stopifnot(all(a==c, na.rm=TRUE));
    # numeric comparison : runmad
    a = runmad(x,k, endrule="trim")
    b = apply(embed(x,k), 1, mad)
    stopifnot(all(a==b, na.rm=TRUE));
    # numeric comparison : runquantile
    a = runquantile(x,k, c(0.3, 0.7), endrule="trim")
    b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7)))
    stopifnot(all(abs(a-b)<eps));
  }
  numeric.test(50, 3) # test different window size vs. vector ...
  numeric.test(50,15) # ... length combinations
  numeric.test(50,49)
  numeric.test(49,49)
  
  # speed comparison
  x=runif(100000); k=991;
  system.time(runmean(x,k))
  system.time(runmean(x,k, alg="R"))
  system.time(runmean(x,k, alg="exact"))
  system.time(filter(x, rep(1/k,k), sides=2)) #the fastest alternative I know
  k=91;
  system.time(runmad(x,k))
  system.time(apply(embed(x,k), 1, mad)) #the fastest alternative I know
  
  # numerical comparison of round-off error handling
  test.runmean = function (x, k) {
    a = k*runmean(x,k, alg="exact")  
    b = k*runmean(x,k, alg="C")  
    d = k*runmean(x,k, alg="R")
    e = k*filter(x, rep(1/k,k))
    f = k* c(NA, NA, apply(embed(x,k), 1, mean), NA, NA)
    x = cbind(x, a, b, d, e, f)
    colnames(x) = c("x","runmean(alg=exact)","runmean(alg=C)",
      "runmean(alg=R)","filter","apply")
    return(x)
  }
  a = rep( c(1, 10,    -10,    -1, 0, 0, 0), 3) # nice-behaving array
  b = rep( c(1, 10^20, -10^20, -1, 0, 0, 0), 3) # round-off error prone array
  d = rep( c(1, 10^20, 10^40, -10^40, -10^20, -1,  0), 3) 
  test.runmean(a, 5) #all runmean algorithms give the same result
  test.runmean(b, 5) #runmean(alg=R) gives wrong result
  test.runmean(d, 5) #only runmean(alg=exact) gives correct result

[Package caTools version 1.6 Index]