multigapweightkernel {stringkernels} | R Documentation |
Compute gap-weight kernels for multiple match lengths at once and pack them into a precomputed kernel.
multigapweightkernel(items, maxlength, kernelarray = NULL, lambda = 0.75,
    normalized = TRUE, tokenizer = openNLP::tokenize, minlength = 1)

## S4 method for signature 'multigapweight'
getkernel(mgw, length, use_dummy = FALSE)
items |
List of input texts |
maxlength |
Maximum match length |
kernelarray |
Optionally supply an array of kernel values. |
lambda |
Gap length penalty factor |
normalized |
Normalize kernel values |
tokenizer |
String tokenizer function. By default, this uses openNLP's tokenize to split
the text into words, but users may specify their own function.
|
minlength |
Minimum match length |
mgw |
multigapweight object returned by multigapweightkernel
|
length |
The desired length parameter for the kernel |
use_dummy |
Set use_dummy = TRUE to create a kernel with dummy values (see precomputedkernel).
|
The dynamic programming algorithm used for the gap-weighted kernel computes the matching statistics for incrementally larger match lengths.
Computing the kernel value for match length n therefore takes almost as much time as computing the kernel values for all n' <= n.
This function computes kernel matrices for multiple lengths in one step.
The getkernel
method retrieves the matrix of the desired length and
creates a kernel object with the precomputed values.
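The point about incremental match lengths can be illustrated with a minimal base-R sketch of a standard gap-weighted subsequence dynamic program over token vectors. It is not the package's implementation: the function name gap_weight_all is hypothetical, the values are unnormalized, and tokenization is assumed to have happened already. Each pass over the table reuses the previous length's results, so the values for lengths 1 to maxlength fall out of a single run.

```r
# Hypothetical helper, not part of stringkernels: unnormalized gap-weighted
# subsequence kernel values between token vectors s and t for every match
# length from 1 to maxlength.
gap_weight_all <- function(s, t, maxlength, lambda = 0.75) {
  n <- length(s)
  m <- length(t)
  match <- outer(s, t, "==")          # match[i, j] is TRUE when s[i] == t[j]
  # dps[i, j]: weight of all matches of the current length that end
  # exactly at s[i] and t[j]; length 1 costs lambda^2 per matching pair.
  dps <- lambda^2 * match
  kern <- numeric(maxlength)
  kern[1] <- sum(dps)
  for (l in seq_len(maxlength - 1)) {
    # dp carries a zero border: dp[i + 1, j + 1] accumulates dps over all
    # end positions up to (i, j), decayed by lambda per skipped token.
    dp <- matrix(0, n + 1, m + 1)
    for (i in seq_len(n)) {
      for (j in seq_len(m)) {
        dp[i + 1, j + 1] <- dps[i, j] +
          lambda * dp[i, j + 1] + lambda * dp[i + 1, j] -
          lambda^2 * dp[i, j]
      }
    }
    # Extend every shorter match by one matching token pair.
    dps <- lambda^2 * match * dp[seq_len(n), seq_len(m), drop = FALSE]
    kern[l + 1] <- sum(dps)
  }
  kern
}

s <- c("crude", "oil", "prices")
t <- c("crude", "oil", "futures")
gap_weight_all(s, t, maxlength = 3, lambda = 0.5)
# lengths 1..3: 0.5, 0.0625, 0
```

The loop for length l + 1 needs the table for length l, which is why stopping at a single length n saves little compared with keeping every intermediate value.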
A multigapweight
object that contains the kernel value array (a kernel matrix with
an additional dimension for length) and the kernel parameters.
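As a rough picture of "a kernel matrix with an additional dimension for length", the base-R sketch below builds a three-dimensional array and extracts the slice for one length. The dimension order, the index arithmetic, the helper name slice_for_length and the placeholder values are all assumptions made for illustration; the actual multigapweight object may store its data differently.

```r
# Assumed layout: items x items x lengths (may differ from the real object).
minlength <- 2
maxlength <- 3
n_items <- 4
lengths <- minlength:maxlength

kernelarray <- array(
  0,
  dim = c(n_items, n_items, length(lengths)),
  dimnames = list(NULL, NULL, paste0("len", lengths))
)

# Placeholder values only: each slice is a diagonal matrix scaled by its length.
for (idx in seq_along(lengths)) {
  kernelarray[, , idx] <- diag(n_items) * lengths[idx]
}

# Hypothetical helper: pick the matrix that belongs to one match length.
slice_for_length <- function(arr, len, minlength) {
  idx <- len - minlength + 1          # length minlength sits in slice 1
  stopifnot(idx >= 1, idx <= dim(arr)[3])
  arr[, , idx]
}

k3 <- slice_for_length(kernelarray, 3, minlength)
dim(k3)
```

Under this assumed layout, retrieving the matrix for a given length is a plain array slice, which is the kind of lookup getkernel is described as performing.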
Martin Kober
martin.kober@gmail.com
library(tm)

## This is necessary to make tm's corpora usable with
## stringkernels' S4 classes.
setOldClass(c("VCorpus", "Corpus"))
setIs("Corpus", "list")

data(crude)
m = multigapweightkernel(crude, maxlength=3, minlength=2)
k2 = getkernel(m, 2)
k3 = getkernel(m, 3)
kernelMatrix(k2, crude[1:5])
kernelMatrix(k3, crude[1:5])