TermDocMatrix {tm}                                        R Documentation

Term-document matrix

Description

Constructs a term-document matrix.

Usage

## S4 method for signature 'TextDocCol':
TermDocMatrix(object, weighting = "tf", stemming = FALSE,
              minWordLength = 3, minDocFreq = 1, stopwords = NULL,
              dictionary = NULL)

Arguments

object a text document collection
weighting the weighting mode for the term-document matrix. Possible settings are:
  • tf Term frequency
  • tf-idf Term frequency inverse document frequency
  • bin Binary frequency
  • logical Similar to binary frequency but with Boolean values
stemming if TRUE, words are stemmed before the term-document matrix is built.
minWordLength words shorter than this number of characters are excluded from the term-document matrix.
minDocFreq words that appear less often in documents than this number are excluded from the term-document matrix.
stopwords either a plain text file with all stopwords or a Boolean value. In the latter case, the default stopwords for the documents' language are used.
dictionary a character vector holding terms to be used as the columns for the term-document matrix. No other terms from object will be counted.
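
The effect of minWordLength, stopwords and dictionary can be sketched in base R without the package. The helper tokens_for, the toy documents and the tokenization rule (lower-casing and splitting on non-letters) are assumptions made for illustration; they are not the package's actual implementation, and minDocFreq is left out.

```r
# Toy documents (made up for illustration).
docs <- c(d1 = "Oil prices rise as the market reacts",
          d2 = "The oil market is up")

# Hypothetical helper: lower-case, split on non-letters, then filter.
tokens_for <- function(text, minWordLength = 3, stopwords = NULL,
                       dictionary = NULL) {
  words <- tolower(unlist(strsplit(text, "[^[:alpha:]]+")))
  # Drop words shorter than minWordLength characters.
  words <- words[nchar(words) >= minWordLength]
  # Drop stopwords, if a character vector of them was supplied.
  if (!is.null(stopwords)) words <- words[!words %in% stopwords]
  # Keep only dictionary terms, if a dictionary was supplied.
  if (!is.null(dictionary)) words <- words[words %in% dictionary]
  words
}

short_filtered <- tokens_for(docs[["d2"]])
stop_filtered  <- tokens_for(docs[["d2"]], stopwords = c("the"))
dict_filtered  <- tokens_for(docs[["d1"]], dictionary = c("oil", "market"))
```

With the default minWordLength of 3, "is" and "up" are dropped from d2; supplying a dictionary restricts the result to the listed terms only.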

Value

An S4 object of class TermDocMatrix containing a sparse term-document matrix. The following slots contain useful information:

Data The sparse Matrix
Weighting The weighting mode applied to the term-document matrix
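
Slots of an S4 object are read with the @ operator. The sketch below uses a hypothetical stand-in class, DemoTDM, that only borrows the two slot names listed above; it stores a small dense matrix, whereas the real Data slot holds a sparse Matrix.

```r
library(methods)

# Hypothetical stand-in class; only the slot names come from this page.
setClass("DemoTDM", representation(Data = "matrix", Weighting = "character"))

tdm_demo <- new("DemoTDM",
                Data = matrix(c(1, 0, 2, 1), nrow = 2,
                              dimnames = list(c("doc1", "doc2"),
                                              c("oil", "price"))),
                Weighting = "tf")

slotNames(tdm_demo)   # names of the available slots
tdm_demo@Weighting    # weighting mode stored alongside the matrix
dim(tdm_demo@Data)    # dimensions of the toy matrix
```

The same @Data and @Weighting accessors should apply to an object returned by TermDocMatrix, assuming the slots are as described above.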

Author(s)

Ingo Feinerer

Examples

data("crude")
(tdm <- TermDocMatrix(crude, weighting = "tf-idf", stopwords = TRUE))
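
To illustrate what the weighting modes mean, here is a minimal base R sketch on a made-up count matrix with terms as columns. The tf-idf line assumes a common textbook definition, tf * log2(N / df); this page does not state the exact formula the package uses, so treat it as an assumption.

```r
# Toy term-frequency counts: rows are documents, columns are terms.
tf <- matrix(c(2, 0, 1,
               0, 3, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("doc1", "doc2"),
                             c("oil", "price", "market")))

# "bin": 1 if the term occurs in the document, 0 otherwise.
bin_w <- (tf > 0) * 1

# "logical": the same information as TRUE/FALSE values.
logical_w <- tf > 0

# "tf-idf" under the assumed definition tf * log2(N / df).
n_docs   <- nrow(tf)
doc_freq <- colSums(tf > 0)            # number of documents containing each term
idf      <- log2(n_docs / doc_freq)
tfidf_w  <- sweep(tf, 2, idf, `*`)     # scale each term column by its idf
```

Under this definition a term that occurs in every document, such as "market" here, receives weight zero.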

[Package tm version 0.2-3.7 Index]