worddot {stringkernels}    R Documentation

Word-based string kernels.

Description

Creates a string kernel that is analogous to kernlab's stringdot but matches words instead of characters.

Usage

worddot(type = c("spectrum", "boundrange", "constant", "exponential"), 
    length = 4, lambda = 1.1, normalized = TRUE, 
    tokenizer = openNLP::tokenize)

Arguments

type Type of kernel to be used. Four types are supported:
  spectrum Matches of exactly length n, where n is given by length.
  boundrange Matches of all lengths up to n, where n is given by length.
  exponential Matches of all lengths with exponentially decaying weighting: a match of length n is weighted by lambda ^ (-n).
  constant Matches of all lengths with equal weighting.
length Length of the matched substrings, counted in words (only for spectrum and boundrange kernels). Default: 4.
lambda Weighting factor, must be > 1 (only for exponential kernel). Default: 1.1.
normalized Normalize word kernel values (default: TRUE).
tokenizer String tokenizer function. By default, openNLP's tokenize is used to split the text into words, but users may specify their own function.
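
The four kernel types above differ only in which match lengths they count and how they weight them. The following base-R sketch illustrates that idea under stated assumptions; it is not taken from the stringkernels sources. The helper names (simple_tokenize, word_ngrams, match_count, word_kernel) are hypothetical, a plain whitespace split stands in for the openNLP tokenizer, and the normalization shown is the usual cosine form, which is assumed rather than confirmed for this package. Values returned by worddot itself may therefore differ.

```r
# Illustrative base-R sketch; every name below is hypothetical and
# none of this is taken from the stringkernels sources.

# Assumption: a plain whitespace tokenizer stands in for openNLP's.
simple_tokenize <- function(x) {
  tokens <- strsplit(x, "[[:space:]]+")[[1]]
  tokens[nzchar(tokens)]
}

# All contiguous runs of n words, each collapsed into one string.
word_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Matches of exactly n words: for every n-gram found in both texts,
# multiply its number of occurrences in a and in b, then sum.
match_count <- function(a, b, n) {
  ca <- table(word_ngrams(simple_tokenize(a), n))
  cb <- table(word_ngrams(simple_tokenize(b), n))
  shared <- intersect(names(ca), names(cb))
  sum(as.numeric(ca[shared]) * as.numeric(cb[shared]))
}

# Weighted sum of match counts over a set of lengths.
word_kernel <- function(a, b, lengths, weight = function(n) 1) {
  sum(vapply(lengths,
             function(n) weight(n) * match_count(a, b, n),
             numeric(1)))
}

a <- "the cat sat on the mat"
b <- "the cat lay on the mat"
all_lengths <- 1:6  # both sentences have six words

k_spectrum    <- word_kernel(a, b, lengths = 2)            # 3
k_boundrange  <- word_kernel(a, b, lengths = 1:2)          # 10
k_constant    <- word_kernel(a, b, lengths = all_lengths)  # 11
k_exponential <- word_kernel(a, b, lengths = all_lengths,
                             weight = function(n) 2^(-n))  # lambda = 2; 4.375

# Assumed normalization: k(a, b) / sqrt(k(a, a) * k(b, b)).
k_normalized <- k_spectrum /
  sqrt(word_kernel(a, a, lengths = 2) * word_kernel(b, b, lengths = 2))  # 0.6
```

In this sketch the two sentences share three word pairs ("the cat", "on the", "the mat"), which gives the spectrum value of 3 for length 2. Adding the seven single-word matches gives the boundrange value of 10.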

Details

This function is identical to the stringdot function in kernlab, except that it uses words instead of characters as tokens.
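
To make the difference between word tokens and character tokens concrete, the sketch below defines a small custom tokenizer in base R. It assumes that a tokenizer takes a single string and returns a character vector of words; the name simple_tokenizer is hypothetical and is not part of stringkernels or openNLP. If that assumption holds, such a function could be supplied through the tokenizer argument.

```r
# Hypothetical custom tokenizer: lower-case the text, replace punctuation
# with spaces, then split on whitespace.
# Assumption: a tokenizer maps one string to a character vector of words.
simple_tokenizer <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", " ", x)
  tokens <- strsplit(x, "[[:space:]]+")[[1]]
  tokens[nzchar(tokens)]  # drop empty strings from leading whitespace
}

word_tokens <- simple_tokenizer("The cat, chased by the dog!")
# "the" "cat" "chased" "by" "the" "dog"

# For comparison, character-level tokens of the kind a character-based
# string kernel would work on:
char_tokens <- strsplit("The cat", "")[[1]]
# "T" "h" "e" " " "c" "a" "t"
```

Lower-casing is a design choice of this sketch: it lets "The" and "the" count as the same word, which a case-sensitive tokenizer would treat as different tokens.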

Value

An S4 kernel object of class stringkernelEx.

Author(s)

Martin Kober
martin.kober@gmail.com

See Also

stringdot

Examples

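library(stringkernels)
s <- "The cat was chased by the fat dog"
t <- "The fat cat bit the dog"
# Unnormalized spectrum kernel over sequences of two words
wdk <- worddot(type = "spectrum", length = 2, normalized = FALSE)
wdk(s, t)  # kernel value between the two sentences
wdk(s, s)  # kernel value of s with itself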


[Package stringkernels version 0.8.8 Index]