tokenize {openNLP}                                        R Documentation
Description:

     Tokenizes the input.

Usage:

     tokenize(s, language = "en", model = NULL)
Arguments:

       s: A character vector of texts to be tokenized.

language: A character string giving the language of 's'. This argument
          is only used to select a default model when 'model' is
          'NULL'. At the moment, the languages 'en' (English), 'es'
          (Spanish), 'de' (German) and 'th' (Thai) are supported,
          provided that the corresponding openNLP model language
          packages ('openNLPmodels.en', ...) are available.

   model: A tokenizer model, or 'NULL' (the default) to use the
          default model for 'language'.
Details:

     If 'model' is 'NULL', a default model for tokenization is loaded
     from the corresponding openNLP models language package.
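Because the default model depends on an optional language package, it can help to check for that package before calling tokenize(). The following is a minimal sketch, not part of openNLP: the helper has_default_model() is hypothetical, and it assumes that each default model ships in a package named "openNLPmodels.<language>", following the openNLPmodels.en pattern mentioned under 'language'.

```r
# Hypothetical helper (NOT part of openNLP): reports whether tokenize() could
# fall back to a default model for `language`. Assumes the model package is
# named "openNLPmodels.<language>", as openNLPmodels.en suggests.
has_default_model <- function(language = "en") {
  # Languages listed as supported on this help page
  supported <- c("en", "es", "de", "th")
  # Reject anything that is not a single supported language code
  if (!is.character(language) || length(language) != 1L ||
      !(language %in% supported)) {
    return(FALSE)
  }
  # requireNamespace() returns TRUE only if the package can be loaded
  requireNamespace(paste0("openNLPmodels.", language), quietly = TRUE)
}

has_default_model("en")  # TRUE only if openNLPmodels.en is installed
has_default_model("fr")  # FALSE: not a supported language
```

A TRUE result only means the package can be loaded; it does not by itself guarantee that the package contains a usable tokenizer model.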
Value:

     A character vector holding the tokenized 's'.
Author(s):

     Ingo Feinerer
References:

     OpenNLP <http://opennlp.sourceforge.net/>
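Examples:

     # English: tokenize a single sentence with the default English model
     s <- "This is a sentence."
     tokenize(s, language = "en")

     # Spanish ("What is your name? Castilian is the official Spanish
     # language of the State."), using the default Spanish model
     s <- "¿Como se llama usted? El castellano es la lengua española oficial del Estado."
     tokenize(s, language = "es")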