wordStem {Rstem}    R Documentation

Get the common root/stem of words

Description

This function computes the stem of each word in the given character vector. Stemming reduces a word to its base form, which makes it easier to compare related forms such as win and winning. See http://snowball.tartarus.org/ for more information about the concept of stemming and the algorithms used.
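To make the idea of "reducing a word to its base form" concrete, here is a deliberately naive suffix stripper written in base R. It is an illustrative sketch only: naiveStem() is a hypothetical name, it is NOT Porter's algorithm, and it is not what wordStem() does internally. It only shows the general shape of suffix stripping.

```r
# Hypothetical, deliberately naive suffix stripper for illustration only.
# It is NOT Porter's algorithm and is not what wordStem() uses.
naiveStem <- function(words) {
  # 1. Try to strip an "ing" or "ed" suffix.
  stripped <- sub("(ing|ed)$", "", words)
  changed <- stripped != words
  # 2. Where no such suffix was found, strip a plural "s",
  #    but not the final "s" of "ss" (as in "kiss").
  stripped[!changed] <- sub("([^s])s$", "\\1", words[!changed])
  # 3. Where "ing"/"ed" was removed, undouble a final doubled consonant
  #    left behind, e.g. "winn" -> "win", "stopp" -> "stop".
  stripped[changed] <- sub("([bdgmnprt])\\1$", "\\1", stripped[changed])
  stripped
}

naiveStem(c("win", "winning", "winner", "wins", "stopped"))
# "win" "win" "winner" "win" "stop"
```

Real stemmers such as Porter's apply many ordered rules with conditions on the remaining stem, which is why wordStem() should be preferred over ad hoc rules like these.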

Usage

wordStem(words, language = character())

Arguments

words       a character vector of words whose stems are to be computed.
language    the name of a language recognized by the package. This should be either a single string that is an element of the vector returned by getStemLanguages(), or a character vector of length 3 giving the names of the native routines for creating a Snowball SN_env environment, closing it, and performing the stemming (in that order). See the examples below.
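The two accepted forms of language can be summarized in code. The sketch below is a hypothetical helper, stemLanguageForm(), that is not part of Rstem; it simply encodes the rules stated above. The available argument is an assumption standing in for the result of getStemLanguages(), so the helper can be exercised without the package.

```r
# Hypothetical helper (not part of Rstem): classify a 'language' value
# according to the forms described for wordStem()'s 'language' argument.
stemLanguageForm <- function(language, available) {
  if (is.character(language) && length(language) == 0) {
    "default"      # the default value, character()
  } else if (is.character(language) && length(language) == 1 &&
             language %in% available) {
    "builtin"      # a name from getStemLanguages()
  } else if (is.character(language) && length(language) == 3) {
    "routines"     # create, close and stem routine names, in that order
  } else {
    stop("'language' must be a built-in language name or ",
         "a character vector of 3 routine names")
  }
}

# With Rstem loaded, one would pass available = getStemLanguages().
stemLanguageForm("english", c("english", "german"))            # "builtin"
stemLanguageForm(c("testDynCreate", "testDynClose", "testDynStem"),
                 "english")                                     # "routines"
```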

Details

This uses Dr. Martin Porter's stemming algorithm and the C interface generated by Snowball (http://snowball.tartarus.org/).

Value

A character vector of the same length as words, whose i-th element is the stem of the i-th input word.
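Because the result is parallel to the input, it can be used directly to group words that share a stem. The sketch below assumes only that contract; groupByStem() is a hypothetical helper, and its stemmer argument (defaulting to wordStem) is an assumption added so that any function returning one stem per word can be plugged in.

```r
# Hypothetical helper: group words by their stem.
# 'stemmer' defaults to Rstem's wordStem(); it is only evaluated when called.
groupByStem <- function(words, stemmer = wordStem) {
  stems <- stemmer(words)
  # The result is expected to have one stem per input word.
  stopifnot(length(stems) == length(words))
  split(words, stems)   # named list: stem -> words sharing that stem
}

# With Rstem loaded:
#   groupByStem(c("win", "winning", "winner", "wins"))
# Without Rstem, a crude stand-in stemmer shows the idea:
groupByStem(c("win", "winning", "winner", "wins"),
            stemmer = function(w) sub("(ning|s)$", "", w))
```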

Author(s)

Duncan Temple Lang <duncan@wald.ucdavis.edu>

References

See http://snowball.tartarus.org/

Examples

   # Simple example; returns "win" "win" "winner"
   wordStem(c("win", "winning", "winner"))

   # Test against the vocabulary and expected stems supplied with the package.
   testWords = readLines(system.file("words", "english", "voc.txt", package = "Rstem"))
   validate = readLines(system.file("words", "english", "output.txt", package = "Rstem"))

## Not run:
   # Read the test words directly from the Snowball site over the Web
   testWords = readLines(url("http://snowball.tartarus.org/english/voc.txt"))
## End(Not run)

   testOut = wordStem(testWords)
   all(validate == testOut)

   # Specify the language explicitly as one of the built-in languages.
   testOut = wordStem(testWords, "english")
   all(validate == testOut)

   # Illustrate the dynamic lookup of symbols, which makes it easy to add
   # new languages or to supply your own routines for creating and closing
   # environments (for example, to manage a pool of environments if
   # efficiency were an issue).
   testOut = wordStem(testWords, c("testDynCreate", "testDynClose", "testDynStem"))
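The check all(validate == testOut) only reports TRUE or FALSE. When it fails, it helps to see which words disagree. The sketch below is a hypothetical helper, stemMismatches(), that is not part of Rstem; it assumes only that the three vectors are parallel, as in the examples above.

```r
# Hypothetical helper: list the words whose computed stem differs from
# the expected stem, rather than just testing all(expected == got).
stemMismatches <- function(words, expected, got) {
  stopifnot(length(words) == length(expected),
            length(got) == length(expected))
  bad <- which(expected != got)
  data.frame(word = words[bad], expected = expected[bad], got = got[bad],
             stringsAsFactors = FALSE)
}

# With the objects from the examples above:
#   stemMismatches(testWords, validate, testOut)   # zero rows if all match
```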

[Package Rstem version 0.2-0 Index]