readNewsgroup {tm}    R Documentation

Read In a Newsgroup Document

Description

Returns a function which reads in a newsgroup document as found in the UCI KDD newsgroup data set.

Usage

readNewsgroup(DateFormat = "%d %B %Y %H:%M:%S", ...)

Arguments

DateFormat a character string giving the format of the Date header in the newsgroup document, using strptime conversion specifications.
... arguments for the generator function.

Details

Formally, this function is a function generator: it returns a function (which reads in a newsgroup document) with a well-defined signature. Through lexical scoping, the returned function can still access the arguments passed to the generator, e.g., DateFormat, which specifies the format of the Date header in the newsgroup document.
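
The generator pattern described above can be sketched in base R. This is only an illustration, not the tm implementation: make_date_reader and the simplified list it returns are hypothetical, and the header layout ("Date: " at the start of a line) is an assumption.

```r
# Hypothetical generator, NOT the tm implementation: it only illustrates
# how a returned function sees DateFormat through lexical scoping.
make_date_reader <- function(DateFormat = "%d %B %Y %H:%M:%S", ...) {
  # DateFormat lives in the environment that encloses the returned function.
  function(elem, language, load, id) {
    # Assumption: the header line starts with "Date: ".
    date_line <- grep("^Date: ", elem$content, value = TRUE)[1]
    stamp <- strptime(sub("^Date: ", "", date_line),
                      format = DateFormat, tz = "GMT")
    # A plain list stands in for the real document class.
    list(id = id, language = language, date = stamp)
  }
}

# The reader keeps the format it was generated with.
reader <- make_date_reader(DateFormat = "%Y-%m-%d %H:%M:%S")

doc <- reader(
  list(content = c("From: someone@example.org",
                   "Date: 1993-04-21 14:20:30",
                   "",
                   "Body text."),
       uri = NULL),
  language = "en", load = TRUE, id = "doc-1"
)

format(doc$date, "%Y-%m-%d %H:%M:%S")
```

The returned function always has the fixed signature elem, language, load, id, while options such as DateFormat are supplied once, when the reader is generated.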

Value

A function with the signature elem, language, load, id:

elem A list with the two named elements content and uri. The first element must hold the document to be read in; the second must hold a call that extracts this document. The call is evaluated when load on demand is requested.
language A character vector giving the text's language.
load A logical value indicating whether the document corpus should be immediately loaded into memory.
id A character vector representing a unique identification string for the returned text document.


The function returned by the generator returns a NewsgroupDocument representing content.
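
A rough base R sketch of how an elem list with content and uri might support load on demand follows. The helpers read_sketch and load_sketch are hypothetical, and a plain list stands in for NewsgroupDocument; how tm actually stores and evaluates the call is not shown here.

```r
# Write a small sample document to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("Subject: test", "", "Hello."), path)

# elem as described above: the document itself plus an unevaluated
# call that can extract it again later.
elem <- list(
  content = readLines(path),
  uri     = call("readLines", path)
)

# Hypothetical reader: keeps the content only if load is TRUE.
read_sketch <- function(elem, language, load, id) {
  list(id = id, language = language, uri = elem$uri,
       content = if (load) elem$content else NULL)
}

# Hypothetical loader: evaluates the stored call on first request.
load_sketch <- function(doc) {
  if (is.null(doc$content)) doc$content <- eval(doc$uri)
  doc
}

lazy   <- read_sketch(elem, language = "en", load = FALSE, id = "doc-1")
loaded <- load_sketch(lazy)
```

With load = FALSE only the call is kept, so the text is read from disk when load_sketch is first applied.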

Author(s)

Ingo Feinerer

See Also

Use getReaders to list available reader functions.

See strptime for date format specifications.
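
As a quick check of the default DateFormat, the following sketch parses a made-up date string with strptime. The sample string and the GMT time zone are assumptions; the LC_TIME locale is set to "C" so that the month name is matched in English.

```r
# Month names are locale dependent; use the C locale for English names.
invisible(Sys.setlocale("LC_TIME", "C"))

default_format <- "%d %B %Y %H:%M:%S"

# A made-up Date header value that matches the default format.
parsed <- strptime("21 April 1993 14:20:30",
                   format = default_format, tz = "GMT")
iso <- format(parsed, "%Y-%m-%d %H:%M:%S")

# A string that does not match the format yields NA instead of an error.
mismatch <- strptime("1993-04-21", format = default_format, tz = "GMT")
is.na(mismatch)
```

If the Date headers in a collection use a different layout, passing a matching DateFormat to readNewsgroup avoids such NA values.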


[Package tm version 0.3-4.1 Index]