tokenize {openNLP}    R Documentation

Tokenizer

Description

Splits the input text into tokens, such as words and punctuation marks.
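
To give a rough idea of what splitting into tokens means, here is a naive rule-based tokenizer in base R. It is only an illustration. The helper naive_tokenize() is hypothetical and is not part of openNLP. It also does not reflect how tokenize() works, because tokenize() relies on a trained model (see Details). The helper splits on whitespace and treats every punctuation character as a separate token.

```r
# Hypothetical helper for illustration only; NOT part of openNLP and NOT how
# openNLP's model-based tokenizer works. It splits on whitespace and treats
# every punctuation character as a separate token.
naive_tokenize <- function(s) {
  # Surround each punctuation character with spaces so it becomes its own token
  spaced <- gsub("([[:punct:]])", " \\1 ", s)
  # Split each element of s on runs of whitespace and flatten into one vector
  tokens <- unlist(strsplit(spaced, "[[:space:]]+"))
  # Drop empty strings that leading whitespace would otherwise produce
  tokens[nzchar(tokens)]
}

naive_tokenize("This is a sentence.")
naive_tokenize("Don't stop.")
```

The first call yields "This", "is", "a", "sentence", ".". The second call splits the contraction into "Don", "'", "t". Crude rules like these handle such cases badly, which is one motivation for using a trained tokenization model instead.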

Usage

tokenize(s, language = "en", model = NULL)

Arguments

s A character vector to be tokenized.
language A character string giving the language of s. It is used only when model is NULL, to select a default model. Currently only "en" (English) and "es" (Spanish) are supported.
model A tokenizer model, or NULL (the default) to use the default model selected by language.

Details

If model is NULL, a default tokenization model for English or Spanish texts, chosen according to language, is loaded from openNLPmodels.

Value

A character vector holding the tokenized s.

Author(s)

Ingo Feinerer

References

OpenNLP. http://opennlp.sourceforge.net/

Examples

s <- "This is a sentence."
tokenize(s, language = "en")
s <- "¿Cómo se llama usted? El castellano es la lengua española oficial
del Estado."
tokenize(s, language = "es")

[Package openNLP version 0.0-6 Index]