vlmc {VLMC} | R Documentation |
Fit a Variable Length Markov Chain (VLMC) to a discrete time series,
in basically two steps:
First, a large Markov chain is generated containing the context states
of the time series (all of them if threshold.gen = 1). In
the second step, many states of the MC are collapsed by pruning
the corresponding context tree.
vlmc(dts,
     cutoff.prune = qchisq(alpha.c, df = max(.1, alpha.len - 1), lower.tail = FALSE)/2,
     alpha.c = 0.05,
     threshold.gen = 2,
     code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE,
     dump = 0, ctl.dump = c(width.ct = 1+log10(n), nmax.set = -1))

is.vlmc(x)

## S3 method for class 'vlmc':
print(x, digits = max(3, getOption("digits") - 3), ...)
dts |
a discrete "time series"; can be a numeric, character or factor. |
cutoff.prune |
non-negative number; the cutoff used for pruning;
defaults to half the upper α-quantile of a chi-squared distribution
with max(0.1, alpha.len - 1) degrees of freedom,
where α = alpha.c, the following argument. |
alpha.c |
number in (0,1) used to specify cutoff.prune in
the more intuitive chi^2 quantile scale; defaulting to 5%. |
threshold.gen |
integer >= 1 (usually left at 2). When
generating the initial large tree, only generate nodes with
count >= threshold.gen . |
code1char |
logical; if true (default), the data dts will
be ..........FIXME........... |
y |
logical; if true (default), the data dts will be
returned. This ensures that residuals
(residuals.vlmc) and "k-step ahead" predictions can
be computed from the result. |
debug |
logical; should debugging info be printed to stderr. |
quiet |
logical; if true, don't print some warnings. |
dump |
integer in 0:2 . If positive, the pruned tree is
dumped to stderr; if 2, the initial unpruned tree is dumped as well. |
ctl.dump |
integer of length 2, say ctl[1:2], controlling
the above dump when dump > 0. ctl[1] is the width
(number of characters) for the "counts", ctl[2] the maximal
number of set elements that are printed per node; when the latter is
not positive (the default), currently max(6, 15 - log10(n)) is used. |
x |
a fitted "vlmc" object. |
digits |
integer giving the number of significant digits for printing numbers. |
... |
potentially further arguments [Generic]. |
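As a quick check of the default for cutoff.prune, the formula from the usage can be evaluated directly in base R. This is only a sketch: the alphabet sizes below are chosen by hand for illustration, whereas vlmc() takes alpha.len from the data, and default_cutoff is a hypothetical helper that is not part of the VLMC package.

```r
# Default pruning cutoff as given in the usage:
#   qchisq(alpha.c, df = max(.1, alpha.len - 1), lower.tail = FALSE) / 2
# 'default_cutoff' is a hypothetical helper, not part of the VLMC package.
default_cutoff <- function(alpha.len, alpha.c = 0.05) {
  qchisq(alpha.c, df = max(0.1, alpha.len - 1), lower.tail = FALSE) / 2
}

# Alphabet sizes picked by hand for illustration
cutoffs <- sapply(c(binary = 2, four.letters = 4), default_cutoff)
round(cutoffs, 2)   # about 1.92 and 3.91
```

For a fixed alpha.c, a larger alphabet therefore gives a larger default cutoff.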
A "vlmc" object, basically a list with components
n |
length of data series when fit. |
threshold.gen, cutoff.prune |
the arguments (or their defaults). |
alpha.len |
the alphabet size. |
alpha |
the alphabet used, as one string. |
size |
a named integer vector of length (>=) 4, giving
characteristic sizes of the fitted VLMC. Its named components are
|
vlmc.vec |
integer vector, containing (an encoding of) the fitted VLMC tree. |
y |
if y = TRUE, the data dts, as
character, using the letters from alpha. |
call |
the call vlmc(..) used. |
Set cutoff.prune = 0 and threshold.gen = 1 to get a "perfect fit",
i.e. a VLMC which perfectly re-predicts the data (apart from the first
observation). Note that even with cutoff.prune = 0 some pruning may
happen, for all (terminal) nodes with delta = 0.
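A minimal sketch of the "perfect fit" setting described above, assuming the VLMC package is installed. It uses only the arguments and result components documented on this page, and the short binary series is made up for illustration.

```r
# Skip silently if the VLMC package is not available.
if (requireNamespace("VLMC", quietly = TRUE)) {
  dts <- c(1,0,0,0, 1,0,1,0, 1,0,0,0)   # made-up binary series
  # No pruning cutoff, and keep every context that occurs at least once
  fit <- VLMC::vlmc(dts, cutoff.prune = 0, threshold.gen = 1)
  # Sanity checks against the documented components
  stopifnot(VLMC::is.vlmc(fit), fit$n == length(dts))
  print(fit$size)   # characteristic sizes of the fitted tree
}
```

Comparing fit$size with the sizes from a default call vlmc(dts) indicates how much the default pruning reduces the tree.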
Martin Maechler
Bühlmann P. and Wyner A. (1998) Variable Length Markov Chains. Annals of Statistics 27, 480–513.
Mächler M. and Bühlmann P. (2004) Variable Length Markov Chains: Methodology, Computing, and Software. J. Computational and Graphical Statistics 2, 435–455.
Mächler M. (2004) VLMC — Implementation and R interface; working paper.
draw.vlmc, entropy, simulate.vlmc for "VLMC bootstrapping".
f1 <- c(1,0,0,0)
f2 <- rep(1:0,2)
(dt1 <- c(f1,f1,f2,f1,f2,f2,f1))

(vlmc.dt1 <- vlmc(dt1))
vlmc(dt1, dump = 1,
     ctl.dump = c(wid = 3, nmax = 20), debug = TRUE)
(vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1))

data(presidents)
dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA
table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level
vlmc.pres <- vlmc(dpres, debug = TRUE)
vlmc.pres

## alphabet and its length:
vlmc.pres$alpha
stopifnot(
  length(print(strsplit(vlmc.pres$alpha,NULL)[[1]])) == vlmc.pres$alpha.len
)