TermDocMatrix {tm} | R Documentation |
Constructs a term-document matrix.
## S4 method for signature 'TextDocCol':
TermDocMatrix(object, weighting = "tf", stemming = FALSE, minWordLength = 3, minDocFreq = 1, stopwords = NULL, dictionary = NULL)
object |
a text document collection |
weighting |
the weighting mode for the term-document
matrix. Possible settings are
|
stemming |
if set, stems words before making the term-document matrix. |
minWordLength |
words shorter than this number of characters are discarded from the term-document matrix. |
minDocFreq |
words that appear less often in documents than this number are discarded for the term-document matrix. |
stopwords |
either a plain text file listing all stopwords or a logical value. In the latter case, the default stopwords for the documents' language are used. |
dictionary |
a character vector holding terms to be used as the
columns for the term-document matrix. No other terms from
object will be counted. |
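To make the effect of minWordLength, stopwords and dictionary concrete, here is a minimal base R sketch of how such filters could act on tokenized documents. It is an illustration only, not tm's implementation: the helper toy_tdm, the toy corpus and the tokenization rule (lower-casing and splitting on non-letters) are all assumptions made for this example, and the stopwords are passed as a character vector rather than a file.

```r
# Toy corpus: a named character vector, one element per document (made-up data).
docs <- c(d1 = "Oil prices rose as crude oil supply fell",
          d2 = "The crude market and the oil market")

# Hypothetical helper, NOT part of tm: builds a documents-by-terms count matrix.
toy_tdm <- function(docs, minWordLength = 3, stopwords = NULL, dictionary = NULL) {
  # Lower-case, split on anything that is not a letter, drop empty strings.
  tokens <- lapply(strsplit(tolower(docs), "[^a-z]+"), function(x) x[nzchar(x)])
  # Discard words shorter than minWordLength characters.
  tokens <- lapply(tokens, function(x) x[nchar(x) >= minWordLength])
  # Discard stopwords, if a vector of them was supplied.
  if (!is.null(stopwords)) {
    tokens <- lapply(tokens, function(x) x[!x %in% stopwords])
  }
  # Columns are the dictionary terms if given, otherwise every surviving term.
  terms <- if (is.null(dictionary)) sort(unique(unlist(tokens))) else dictionary
  # Count each term per document; terms outside `terms` are not counted.
  counts <- lapply(tokens, function(x) as.vector(table(factor(x, levels = terms))))
  m <- do.call(rbind, counts)
  dimnames(m) <- list(names(docs), terms)
  m
}

# All surviving terms become columns.
toy_tdm(docs, stopwords = c("the", "and"))

# Only the dictionary terms become columns.
toy_tdm(docs, dictionary = c("oil", "crude"))
```

With a dictionary, the column set is fixed in advance, so matrices built from different collections share the same columns and can be compared directly.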
An S4 object of class TermDocMatrix
containing a sparse term-document
matrix. The following slots contain useful information:
Data |
The sparse Matrix |
Weighting |
The weighting mode applied to the term-document matrix |
Ingo Feinerer
data("crude")
(tdm <- TermDocMatrix(crude, weighting = "tf-idf", stopwords = TRUE))
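For readers unfamiliar with tf-idf weighting, the following base R sketch shows what such a weighting does to a matrix of raw counts. The counts are made up, and the formula tf * log2(N / df) is one common variant assumed here for illustration; this page does not state which exact formula TermDocMatrix applies.

```r
# Raw term frequencies: documents in rows, terms in columns (made-up counts).
tf <- matrix(c(2, 1, 0,
               1, 0, 3,
               1, 0, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("d1", "d2", "d3"), c("oil", "crude", "market")))

# Document frequency: number of documents in which each term occurs.
df <- colSums(tf > 0)

# Inverse document frequency; log2(N / df) is an assumed, commonly used form.
idf <- log2(nrow(tf) / df)

# Multiply every term column by its idf value.
tfidf <- sweep(tf, 2, idf, `*`)
tfidf
```

A term that occurs in every document, such as "oil" in this toy matrix, gets an idf of zero and therefore carries no weight, while terms concentrated in few documents are weighted up.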