cspade {arulesSequences} | R Documentation |
Mining frequent sequential patterns with the cSPADE algorithm. This algorithm utilizes temporal joins along with efficient lattice search techniques and provides for timing constraints.
cspade(data, parameter = NULL, control = NULL, tmpdir = tempdir())
data | an object of class transactions with temporal information.
parameter | an object of class SPparameter or a named list with corresponding components.
control | an object of class SPcontrol or a named list with corresponding components.
tmpdir | a non-empty character vector giving the directory name where temporary files are written.
Interfaces the command-line tools for preprocessing and mining frequent sequences with the cSPADE algorithm by M. Zaki via a proper chain of system calls.
The temporal information is taken from components sequenceID (sequence or customer identifier) and eventID (event identifier) of slot transactionInfo. Both identifiers must be in (blockwise) ascending order.
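The ordering requirement can be checked, and if necessary restored, before the data are converted to transactions. The following is a minimal base R sketch, not part of the package: the data frame `events` and the helper `is_blockwise_ascending` are hypothetical, and the helper encodes one assumed reading of "(blockwise) ascending" (sequenceID never decreases, and eventID never decreases within a run of equal sequenceID).

```r
# Hypothetical event log; the column names mirror the transactionInfo
# components, the values are made up for illustration.
events <- data.frame(
  sequenceID = c(2, 1, 1, 2, 1),
  eventID    = c(20, 15, 10, 5, 20),
  item       = c("A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Assumed reading of "(blockwise) ascending": sequenceID never decreases,
# and eventID never decreases within a block of equal sequenceID.
is_blockwise_ascending <- function(sid, eid) {
  !is.unsorted(sid) &&
    all(tapply(eid, sid, function(x) !is.unsorted(x)))
}

# Sort by sequence first, then by event within each sequence.
sorted <- events[order(events$sequenceID, events$eventID), ]

is_blockwise_ascending(events$sequenceID, events$eventID)  # FALSE
is_blockwise_ascending(sorted$sequenceID, sorted$eventID)  # TRUE
```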
The amount of disk space used by temporary files is reported in verbose mode (see class SPcontrol).
The utility function read_baskets reads text files with temporal transaction data.
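As a rough illustration of the expected input, the sketch below writes a small text file and reads it back with read_baskets. The file contents are made up, and the layout (sequence identifier, event identifier, basket size, then the items, separated by spaces) is an assumption modeled on the `info` argument used in the examples for this function.

```r
library(arulesSequences)

# Made-up data: sequenceID, eventID, SIZE, followed by the items.
lines <- c(
  "1 10 2 C D",
  "1 15 3 A B C",
  "2 15 3 A B F",
  "2 20 1 E"
)

# Write to a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# The first three fields of each line become transactionInfo columns.
x <- read_baskets(con = path,
                  info = c("sequenceID", "eventID", "SIZE"))
transactionInfo(x)

unlink(path)
```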
Returns an object of class sequences.
Temporary files may not be deleted until the end of the R session if the call is interrupted.
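One way to limit leftover files is to point tmpdir at a private scratch directory and remove that directory when the call exits. The wrapper below is a hypothetical sketch, not part of the package; `cspade_scratch` is an illustrative name, and it assumes that cspade writes all of its temporary files below the directory given as tmpdir.

```r
library(arulesSequences)

# Hypothetical wrapper: run cspade() in a private scratch directory and
# remove that directory afterwards, also when the call fails or is
# interrupted (on.exit handlers run in both cases).
cspade_scratch <- function(data, ...) {
  scratch <- tempfile("cspade-scratch-")
  dir.create(scratch)
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)
  cspade(data, ..., tmpdir = scratch)
}

data(zaki)
s <- cspade_scratch(zaki, parameter = list(support = 0.4))
summary(s)
```

Removing a whole directory avoids having to know the names of the individual files that were written.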
sequenceID and eventID are coerced to factor if necessary.
Christian Buchta, Michael Hahsler
M. J. Zaki. (2001). SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning Journal, 42, 31–60.
Class transactions, sequences, SPparameter, SPcontrol, method ruleInduction, function read_baskets.
## use example data from paper
data(zaki)
## mine frequent sequences
s1 <- cspade(zaki, parameter = list(support = 0.4),
             control = list(verbose = TRUE))
summary(s1)
as(s1, "data.frame")

## use timing constraint
s2 <- cspade(zaki, parameter = list(support = 0.4, maxwin = 5))
as(s2, "data.frame")

## replace timestamps
t <- zaki
transactionInfo(t)$eventID <-
    unlist(tapply(seq(t), transactionInfo(t)$sequenceID,
                  function(x) x - min(x) + 1), use.names = FALSE)
as(t, "data.frame")
s0 <- cspade(t, parameter = list(support = 0.4))
s0
identical(as(s1, "data.frame"), as(s0, "data.frame"))

## Not run:
## use generated data
t <- read_baskets(con = system.file("misc", "test.txt",
                                    package = "arulesSequences"),
                  info = c("sequenceID","eventID","SIZE"))
summary(t)
## use low support
s3 <- cspade(t, parameter = list(support = 0.03),
             control = list(verbose = TRUE))
summary(s3)
## End(Not run)