Corpus {tm} | R Documentation |
Constructs a text document collection (corpus).
## S4 method for signature 'Source':
Corpus(object,
       readerControl = list(reader = object@DefaultReader, language = "en_US", load = TRUE),
       dbControl = list(useDb = FALSE, dbName = "", dbType = "DB1"),
       ...)
object |
A Source object. |
readerControl |
A list with the named components reader,
a reading function capable of handling the file format
found in object; language, giving the text's language
(preferably in ISO 639-1 format); and
load, a logical value indicating whether the documents
should be loaded into memory immediately (load = TRUE) or only when
needed (load = FALSE). Loading on demand reduces memory
use for large document collections. If object does not
support loading on demand, the corpus is loaded immediately,
i.e., this argument is overruled. |
dbControl |
A list with the named components useDb,
a logical value indicating whether database support should be activated;
dbName, the filename of the database that holds the objects
moved out of memory; and dbType, a valid database type as
supported by package filehash. With database support
activated, the tm package tries to keep as few resources
in memory as possible by storing them in the database. |
... |
Optional arguments for the reader. |
An S4 object of class Corpus, which extends the class list
and contains a collection of text documents.
Ingo Feinerer
txt <- system.file("texts", "txt", package = "tm")
## Not run: 
(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US", load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb", dbType = "DB1")))
## End(Not run)
reut21578 <- system.file("texts", "reut21578", package = "tm")
Corpus(DirSource(reut21578),
       readerControl = list(reader = readReut21578XML, language = "en_US", load = FALSE))
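The control arguments described above are plain named lists. This page does not say whether Corpus fills in components that a caller leaves out, so one cautious approach is to build complete lists before the call. The following is a minimal sketch in base R only. The helper fill_control is hypothetical (it is not part of tm), and the defaults are copied from the usage shown above. The reader component is left out of the defaults because its default value depends on the Source object.

```r
# Hypothetical helper (NOT part of tm): complete a partial control list
# from a list of defaults and reject component names that are not expected.
fill_control <- function(control, defaults) {
  stopifnot(is.list(control), is.list(defaults))
  unknown <- setdiff(names(control), names(defaults))
  if (length(unknown) > 0) {
    stop("unknown control component(s): ", paste(unknown, collapse = ", "))
  }
  # modifyList() overwrites entries of 'defaults' with those in 'control'
  modifyList(defaults, control)
}

# Defaults copied from the usage section of this page
# (reader omitted: its default depends on the Source object).
reader_defaults <- list(language = "en_US", load = TRUE)
db_defaults     <- list(useDb = FALSE, dbName = "", dbType = "DB1")

# Only state what differs from the defaults.
reader_control <- fill_control(list(load = FALSE), reader_defaults)
db_control     <- fill_control(list(useDb = TRUE, dbName = "oviddb"), db_defaults)

str(reader_control)
str(db_control)
```

The resulting lists could then be passed as readerControl (after adding a reader component such as readPlain) and dbControl. A misspelled component name such as usedb stops with an error instead of being silently carried along.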