worddot {stringkernels} | R Documentation |
This function is analogous to kernlab's stringdot, using words instead of characters.
worddot(type = c("spectrum", "boundrange", "constant", "exponential"), length = 4, lambda = 1.1, normalized = TRUE, tokenizer = openNLP::tokenize)
type |
Type of kernel to be used. Four types are supported:
spectrum: Matches of exactly length n.
boundrange: Matches of all lengths up to n.
exponential: Matches of all lengths with exponentially decaying weighting lambda ^ (-n).
constant: Matches of all lengths with equal weighting.
|
length |
Length of the substrings (only for spectrum and boundrange kernels) |
lambda |
Weighting factor, must be > 1 (only for exponential kernel) |
normalized |
Normalize word kernel values (default: TRUE)
|
tokenizer |
String tokenizer function. By default, this uses openNLP's tokenize to split
the text into words, but users may specify their own function.
|
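As an illustration of the tokenizer argument, here is a minimal sketch of a user-supplied tokenizer written in base R. The name simple_tokenizer is hypothetical, and the sketch assumes that a tokenizer receives a single character string and returns a character vector of words; check this against the tokenizer interface of your installed version before relying on it.

```r
# Hypothetical custom tokenizer (not part of stringkernels).
# Assumption: a tokenizer takes one character string and returns
# a character vector of tokens.
simple_tokenizer <- function(text) {
  # Lower-case the text, then split on runs of non-alphanumeric characters.
  tokens <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  # Drop the empty token that appears when the text starts with a delimiter.
  tokens[nzchar(tokens)]
}

simple_tokenizer("The fat cat bit the dog")
# "the" "fat" "cat" "bit" "the" "dog"
```

Under the assumption above, such a function could be passed as worddot(type = "spectrum", length = 2, tokenizer = simple_tokenizer). Lower-casing is a design choice of this sketch: it makes "The" and "the" count as the same word.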
This function is identical to the stringdot function in kernlab, except that it uses words instead of characters as tokens.
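To make the idea of word-level matching concrete, the following base R sketch counts shared word n-grams and weights them by match length according to the four kernel types. It is a conceptual illustration only, not the package's or kernlab's implementation: the function names (tokenize_ws, ngram_counts, match_count, match_weight, word_kernel_sketch), the whitespace tokenizer, the argument name len (standing in for length), and the cosine-style normalization k(s,t) / sqrt(k(s,s) * k(t,t)) are all assumptions made for this example, and its values may differ from what worddot returns.

```r
# Whitespace tokenizer; a stand-in for the package's default tokenizer.
tokenize_ws <- function(text) strsplit(trimws(text), "\\s+")[[1]]

# Counts of all contiguous word n-grams in a token vector.
ngram_counts <- function(tokens, n) {
  if (length(tokens) < n) return(table(character(0)))
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  table(grams)
}

# Number of matching n-gram pairs: sum over shared n-grams of count products.
match_count <- function(a, b, n) {
  ca <- ngram_counts(a, n)
  cb <- ngram_counts(b, n)
  shared <- intersect(names(ca), names(cb))
  sum(as.numeric(ca[shared]) * as.numeric(cb[shared]))
}

# Weight given to matches of n words under each kernel type (assumed form).
match_weight <- function(n, type, len, lambda) {
  switch(type,
         spectrum    = as.numeric(n == len),   # exactly len words
         boundrange  = as.numeric(n <= len),   # up to len words
         constant    = 1,                      # all lengths, equal weight
         exponential = lambda^(-n))            # all lengths, decaying weight
}

word_kernel_sketch <- function(s, t, type = "spectrum", len = 4,
                               lambda = 1.1, normalized = TRUE) {
  raw <- function(a, b) {
    max_n <- min(length(a), length(b))
    if (max_n == 0) return(0)
    sum(vapply(seq_len(max_n),
               function(n) match_weight(n, type, len, lambda) * match_count(a, b, n),
               numeric(1)))
  }
  a <- tokenize_ws(s)
  b <- tokenize_ws(t)
  k <- raw(a, b)
  # Assumed normalization: scale so that a text compared with itself gives 1.
  if (normalized) k / sqrt(raw(a, a) * raw(b, b)) else k
}

word_kernel_sketch("the cat sat", "the cat ran", type = "spectrum",
                   len = 2, normalized = FALSE)   # 1 (shared bigram "the cat")
word_kernel_sketch("the cat sat", "the cat ran", type = "boundrange",
                   len = 2, normalized = FALSE)   # 3 (two unigrams + one bigram)
```

In this sketch the two sentences share the unigrams "the" and "cat" and the bigram "the cat", which is why the spectrum value for two-word matches is 1 and the boundrange value up to two words is 3.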
An S4 kernel object of class stringkernelEx.
Martin Kober
martin.kober@gmail.com
s = "The cat was chased by the fat dog"
t = "The fat cat bit the dog"
wdk = worddot(type="spectrum", length=2, normalized=FALSE)
wdk(s,t)
wdk(s,s)