Weka_tokenizers {RWeka} | R Documentation
R interfaces to Weka tokenizers.
AlphabeticTokenizer(x, control = NULL)
NGramTokenizer(x, control = NULL)
WordTokenizer(x, control = NULL)
x
a character vector with strings to be tokenized.
control
an object of class Weka_control, or a character vector of control options, or NULL (default). Available options can be obtained online using the Weka Option Wizard WOW, or from the Weka documentation.
AlphabeticTokenizer
is an alphabetic string tokenizer, where
tokens are to be formed only from contiguous alphabetic sequences.
NGramTokenizer
splits strings into n-grams with given
minimal and maximal numbers of grams.
WordTokenizer
is a simple word tokenizer.
A character vector with the tokenized strings.
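A minimal usage sketch of the three tokenizers, assuming RWeka is installed along with a working Java/Weka setup; the sample string and the full Weka class name passed to WOW are illustrative assumptions, not part of this page:

```r
library(RWeka)  # requires a working Java installation

txt <- "The quick brown fox, v2."

## Simple word tokenization
WordTokenizer(txt)

## Tokens formed only from contiguous alphabetic sequences
## (digits and punctuation are excluded)
AlphabeticTokenizer(txt)

## Bigrams: pass the minimal/maximal gram sizes via Weka_control
NGramTokenizer(txt, Weka_control(min = 2, max = 2))

## Query the available control options; the full Weka class
## name used here is an assumption about how WOW resolves it
WOW("weka/core/tokenizers/NGramTokenizer")
```

Each call returns a character vector of tokens, so the results can be fed directly into, e.g., `table()` for frequency counts.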