runmean, runmin, runmax & runquantile {caTools} | R Documentation |
A collection of functions to perform moving window (running, rolling window) analysis of vectors
runmean(x, k, alg=c("C", "R", "exact"), endrule=c("NA", "trim", "keep", "constant", "func")) runmin (x, k, alg=c("C", "R"), endrule=c("NA", "trim", "keep", "constant", "func")) runmax (x, k, alg=c("C", "R"), endrule=c("NA", "trim", "keep", "constant", "func")) runmad (x, k, center=runmed(x,k,endrule="keep"), constant=1.4826, endrule=c("NA", "trim", "keep", "constant", "func")) runquantile(x, k, probs, type=7, endrule=c("NA", "trim", "keep", "constant", "func")) EndRule(x, y, k, endrule=c("NA", "trim", "keep", "constant", "func"), Func, ...)
x |
numeric vector of length n |
k |
width of moving window; must be an odd integer between three and n |
endrule |
character string indicating how the values at the beginning
and the end, of the data, should be treated. Only first and last k2
values at both ends are affected, where k2 is the half-bandwidth
k2 = k %/% 2 .
endrule in runmed function which has the
following options: “c("median", "keep", "constant") ” .
|
alg |
an option allowing to choose different algorithms or
implementations, if provided. Default is to use of code written in C.
Option alg="R" will use slower code written in R. Useful for
debugging and allows extensions in the future. |
center |
moving window center used by runmad function defaults
to running median (runmed function). Similar to center
in mad function. |
constant |
scale factor used by runmad , such that for Gaussian
distribution X, mad (X) is the same as sd (X).
Same as constant in mad function. |
probs |
numeric vector of probabilities with values in [0,1] range
used by runquantile . For example Probs=c(0,0.5,1) would be
equivalent to running runmin , runmed and runmax .
Same as probs in quantile function. |
type |
an integer between 1 and 9 selecting one of the nine quantile
algorithms, same as type in quantile function.
Another even more readable description of nine ways to calculate quantiles
can be found at http://mathworld.wolfram.com/Quantile.html. |
y |
numeric vector of length n, which is partially filled output of
one of the run functions. Function EndRule will fill the
remaining beginning and end sections using method chosen by endrule
argument. |
Func |
Function name that EndRule will use in case of
endrule="func" . |
... |
Additional parameters to Func that EndRule will
use in case of endrule="func" . |
Apart from the end values, the result of y = runFUN(x, k) is the same as
“for(j=(1+k2):(n-k2)) y[j]=FUN(x[(j-k2):(j+k2)])
”, where FUN
stands for min, max, mean, mad or quantile functions.
The main incentive to write this set of functions was relative slowness of
majority of moving window functions available in R and its packages. With
exception of runmed
, a running window median function, all
functions listed in "see also" section are slower than very inefficient
“apply(embed(x,k),1,FUN)
” approach. Relative
speeds of above functions are as follow:
runmin
, runmax
, runmean
run at O(n)
runmean(..., alg="exact")
can have worst case speed of
O(n^2) for some small data vectors, but average case is still
close to O(n).
runquantile
and runmad
run at O(n*k)
runmed
- related R function runs at O(n*log(k))
Functions runquantile
and runmad
are using insertion sort to
sort the moving window, but gain speed by remembering results of the previous
sort. Since each time the window is moved, only one point changes, all but one
points in the window are already sorted. Insertion sort can fix that in O(k)
time.
Function runquantile
when run in single probability mode automatically
recognizes probabilities: 0, 1/2, and 1 as special cases and return output
from functions: runmin
, runmed
and runmax
respectively.
All run*
functions are written in C, but runmin
, runmax
and runmean
also have fast R code versions (see argument
alg="R"
). Those were included for debugging purposes, and as a fallback
in hard-to-port situations. See examples.
Function EndRule
applies one of the five methods (see endrule
argument) to process end-points of the input array x
.
In case of runmean(..., alg="exact")
function a special algorithm is
used (see references section) to ensure that round-off errors do not
accumulate. As a result runmean
is more accurate than
filter
(x, rep(1/k,k)) and runmean(..., alg="C")
functions.
All of the functions in this section do not work with infinite numbers
(NA
,NaN
,Inf
,-Inf
) except for
runmean(..., alg="exact")
which omits them.
Functions runmin
, runmax
, runmean
and runmad
return a numeric vector of the same length as x
.
Function runquantile
returns a matrix of size [n x
length(probs)]. In addition x
contains attr
ibute k
with (the 'oddified') k
.
Function runmean(..., alg="exact")
is based by code by Vadim Ogranovich,
which is based on Python code (see last reference), pointed out by Gabor
Grothendieck.
Jarek Tuszynski (SAIC) jaroslaw.w.tuszynski@saic.com
runmad
and runquantile
:
R. Sedgewick (1988): Algorithms. Addison-Wesley (page 99)
runmean
:
Shewchuk, Jonathan Adaptive Precision Floating-Point Arithmetic and Fast
Robust Geometric Predicates,
http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps
Links related to each function:
runmean
- mean
, kernapply
,
filter
, runsum.exact
, decompose
,
stl
,
rollMean
from fSeries library,
rollmean
from zoo library,
subsums
from magic library,
runmin
- min
, rollMin
from
fSeries library
runmax
- max
, rollMax
from
fSeries library, rollmax
from
zoo library
runquantile
- quantile
, runmed
,
smooth
, rollmedian
from
zoo library
runmad
- mad
, rollVar
from
fSeries library
apply
(embed(x,k), 1, FUN)
(fastest), rollFun
from fSeries (slow), running
from gtools
package (extremely slow for this purpose), rapply
from
zoo library, subsums
from
magic library can perform running window operations on data with any
dimensions.
EndRule
- smoothEnds(y,k)
function is similar to
EndRule(x,y,k,endrule="func", median)
# test runmin, runmax and runmed k=15; n=200; x = rnorm(n,sd=30) + abs(seq(n)-n/4) col = c("black", "red", "green", "blue", "magenta", "cyan") plot(x, col=col[1], main = "Moving Window Analysis Functions") lines(runmin(x,k), col=col[2]) lines(runmed(x,k), col=col[3]) lines(runmax(x,k), col=col[4]) legend(0,.9*n, c("data", "runmin", "runmed", "runmax"), col=col, lty=1 ) #test runmean and runquantile y=runquantile(x, k, probs=c(0, 0.5, 1, 0.25, 0.75), endrule="constant") plot(x, col=col[1], main = "Moving Window Quantile") lines(runmean(y[,1],k), col=col[2]) lines(y[,2], col=col[3]) lines(runmean(y[,3],k), col=col[4]) lines(y[,4], col=col[5]) lines(y[,5], col=col[6]) lab = c("data", "runmean(runquantile(0))", "runquantile(0.5)", "runmean(runquantile(1))", "runquantile(.25)", "runquantile(.75)") legend(0,0.9*n, lab, col=col, lty=1 ) #test runmean and runquantile k =25 m=runmed(x, k) y=runmad(x, k, center=m) plot(x, col=col[1], main = "Moving Window Analysis Functions") lines(m , col=col[2]) lines(m-y/2, col=col[3]) lines(m+y/2, col=col[4]) lab = c("data", "runmed", "runmed-runmad/2", "runmed+runmad/2") legend(0,1.8*n, lab, col=col, lty=1 ) # numeric comparison between different algorithms numeric.test = function (n, k) { eps = .Machine$double.eps ^ 0.5 x = rnorm(n,sd=30) + abs(seq(n)-n/4) # numeric comparison : runmean a = runmean(x,k) b = runmean(x,k, alg="R") d = runmean(x,k, alg="exact") e = filter(x, rep(1/k,k)) stopifnot(all(abs(a-b)<eps, na.rm=TRUE)); stopifnot(all(abs(a-d)<eps, na.rm=TRUE)); stopifnot(all(abs(a-e)<eps, na.rm=TRUE)); # numeric comparison : runmin a = runmin(x,k, endrule="trim") b = runmin(x,k, endrule="trim", alg="R") c = apply(embed(x,k), 1, min) stopifnot(all(a==b, na.rm=TRUE)); stopifnot(all(a==c, na.rm=TRUE)); # numeric comparison : runmax a = runmax(x,k, endrule="trim") b = runmax(x,k, endrule="trim", alg="R") c = apply(embed(x,k), 1, max) stopifnot(all(a==b, na.rm=TRUE)); stopifnot(all(a==c, na.rm=TRUE)); # numeric comparison : runmad a = runmad(x,k, endrule="trim") b = apply(embed(x,k), 1, mad) stopifnot(all(a==b, na.rm=TRUE)); # numeric comparison : runquantile a = runquantile(x,k, c(0.3, 0.7), endrule="trim") b = t(apply(embed(x,k), 1, quantile, probs = c(0.3, 0.7))) stopifnot(all(abs(a-b)<eps)); } numeric.test(50, 3) # test different window size vs. vector ... numeric.test(50,15) # ... length combinations numeric.test(50,49) numeric.test(49,49) # speed comparison x=runif(100000); k=991; system.time(runmean(x,k)) system.time(runmean(x,k, alg="R")) system.time(runmean(x,k, alg="exact")) system.time(filter(x, rep(1/k,k), sides=2)) #the fastest alternative I know k=91; system.time(runmad(x,k)) system.time(apply(embed(x,k), 1, mad)) #the fastest alternative I know # numerical comparison of round-off error handling test.runmean = function (x, k) { a = k*runmean(x,k, alg="exact") b = k*runmean(x,k, alg="C") d = k*runmean(x,k, alg="R") e = k*filter(x, rep(1/k,k)) f = k* c(NA, NA, apply(embed(x,k), 1, mean), NA, NA) x = cbind(x, a, b, d, e, f) colnames(x) = c("x","runmean(alg=exact)","runmean(alg=C)", "runmean(alg=R)","filter","apply") return(x) } a = rep( c(1, 10, -10, -1, 0, 0, 0), 3) # nice-behaving array b = rep( c(1, 10^20, -10^20, -1, 0, 0, 0), 3) # round-off error prone array d = rep( c(1, 10^20, 10^40, -10^40, -10^20, -1, 0), 3) test.runmean(a, 5) #all runmean algorithms give the same result test.runmean(b, 5) #runmean(alg=R) gives wrong result test.runmean(d, 5) #only runmean(alg=exact) gives correct result