wordStem {Rstem}    R Documentation
This function computes the stem of each word in the given vector. Stemming reduces a word to its base form, which makes it easier to compare related words such as "win" and "winning". See http://snowball.tartarus.org/ for more information about the concept of stemming and the algorithms used.
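To illustrate why stems make comparison easier, the following sketch groups words by their stem. So that it runs without Rstem installed, it uses a hypothetical naive_stem() helper, a deliberately crude suffix-stripping rule that is NOT Porter's algorithm. With Rstem loaded, wordStem could be passed in its place.

```r
# Hypothetical stand-in for wordStem(): a deliberately crude suffix stripper,
# NOT Porter's algorithm. It drops one trailing "ing", "ed" or "s", then
# undoubles a final consonant pair (e.g. "winn" -> "win").
naive_stem <- function(words) {
  w <- sub("(ing|ed|s)$", "", tolower(words), perl = TRUE)
  sub("([^aeiou])\\1$", "\\1", w, perl = TRUE)
}

# Group words that share a stem. Any function with wordStem()'s shape
# (character vector in, equal-length character vector out) can be passed.
group_by_stem <- function(words, stemmer = naive_stem) {
  split(words, stemmer(words))
}

group_by_stem(c("win", "winning", "wins", "stem", "stemmed"))
```

Like the real stemmer in the example below, this toy rule leaves "winner" unchanged, so it would not be grouped with "win".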
wordStem(words, language = character())
words
    a character vector of words whose stems are to be computed.
language
    the name of a recognized language for the package. This should be
    either a single string that is an element of the vector returned by
    getStemLanguages(), or a character vector of length 3 giving the
    names of the routines that create a Snowball SN_env environment,
    close it, and perform the stemming (in that order).
    See the example below.
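The two accepted forms of language can be summarized as a small classification rule. The sketch below uses a hypothetical resolve_language() helper that is not part of Rstem; its known argument stands in for the vector returned by getStemLanguages(), and the handling of an empty vector (the default in the usage line) is an assumption.

```r
# Hypothetical helper (not part of Rstem) that classifies a `language`
# value according to the two documented forms. `known` stands in for
# the vector that getStemLanguages() returns.
resolve_language <- function(language, known) {
  if (!is.character(language)) stop("`language` must be a character vector")
  if (length(language) == 0L) {
    # Assumption: an empty vector (the default) selects the package default.
    return(list(type = "default"))
  }
  if (length(language) == 1L) {
    if (!language %in% known) stop("unknown language: ", language)
    return(list(type = "builtin", name = language))
  }
  if (length(language) == 3L) {
    # Order matters: create the SN_env, close it, perform the stem.
    return(list(type = "routines",
                routines = setNames(language, c("create", "close", "stem"))))
  }
  stop("`language` must be one language name or three routine names")
}
```

For example, resolve_language(c("testDynCreate", "testDynClose", "testDynStem"), "english")$routines[["stem"]] yields "testDynStem".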
This uses Dr. Martin Porter's stemming algorithm and the interface generated by Snowball http://snowball.tartarus.org/.
A character vector of the same length as words, in which each element is the stem of the corresponding input word.
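Because the result is aligned element by element with the input, it can be attached directly to the original words. The sketch below uses a hypothetical stem_table() helper that checks this contract; a trivial stand-in stemmer is used so it runs without Rstem, and with the package loaded one would pass wordStem instead.

```r
# Hypothetical helper: pair each input word with its stem. It relies only on
# the documented contract that the result has one stem per input word, in order.
stem_table <- function(words, stemmer) {
  stems <- stemmer(words)
  stopifnot(is.character(stems), length(stems) == length(words))
  data.frame(word = words, stem = stems, stringsAsFactors = FALSE)
}

# A trivial stand-in stemmer so the sketch runs without Rstem.
stem_table(c("win", "winning"), function(w) sub("ning$", "", w))
```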
Duncan Temple Lang <duncan@wald.ucdavis.edu>
See http://snowball.tartarus.org/
# Simple example
# "win" "win" "winner"
wordStem(c("win", "winning", "winner"))

# Test the supplied vocabulary.
testWords = readLines(system.file("words", "english", "voc.txt", package = "Rstem"))
validate = readLines(system.file("words", "english", "output.txt", package = "Rstem"))

## Not run:
# Read the test words directly from the snowball site over the Web
testWords = readLines(url("http://snowball.tartarus.org/english/voc.txt"))
## End(Not run)

testOut = wordStem(testWords)
all(validate == testOut)

# Specify the language from one of the built-in languages.
testOut = wordStem(testWords, "english")
all(validate == testOut)

# To illustrate using the dynamic lookup of symbols that allows one
# to easily add new languages or create and close environment
# routines (for example, to manage pools if this were an efficiency
# issue!)
testOut = wordStem(testWords, c("testDynCreate", "testDynClose", "testDynStem"))
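The check all(validate == testOut) only reports whether every stem matches. When it returns FALSE, a helper along the following lines (hypothetical, not part of Rstem) could list the offending words instead, e.g. report_mismatches(testWords, validate, testOut).

```r
# Hypothetical helper: list the words whose computed stem differs from the
# reference output, instead of reducing the comparison to a single TRUE/FALSE.
report_mismatches <- function(input, expected, actual) {
  stopifnot(length(input) == length(expected), length(expected) == length(actual))
  bad <- which(expected != actual)
  data.frame(word = input[bad], expected = expected[bad], actual = actual[bad],
             stringsAsFactors = FALSE)
}
```

It returns a zero-row data frame when every stem matches the reference.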