readDOC {tm}    R Documentation

Read In an MS Word Document

Description

Returns a function which reads in a Microsoft Word document extracting its text.

Usage

readDOC(AntiwordOptions = "", ...)

Arguments

AntiwordOptions Options passed to antiword.
... Arguments for the generator function.

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, but which can access the arguments passed to the generator (e.g., options for antiword) via lexical scoping.
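The generator pattern can be illustrated in base R. This is a minimal, hypothetical sketch: `makeReader` is not part of tm, it does not call antiword, and the option string is an arbitrary placeholder. It only shows how the returned reader keeps the fixed signature elem, language, load, id while still seeing `AntiwordOptions` through lexical scoping.

```r
# Hypothetical generator illustrating the pattern; NOT tm's implementation.
makeReader <- function(AntiwordOptions = "", ...) {
  # The returned closure captures AntiwordOptions from this environment,
  # so the reader's own signature does not need to include it.
  function(elem, language, load, id) {
    # 'load' is accepted to match the signature but unused in this sketch.
    list(options  = AntiwordOptions,
         content  = elem$content,
         language = language,
         id       = id)
  }
}

reader <- makeReader(AntiwordOptions = "example-options")  # placeholder string
doc <- reader(elem = list(content = "Hello", uri = NULL),
              language = "en", load = TRUE, id = "doc1")
doc$options  # the value captured when the generator was called
```

Because the options are bound when the generator is called, every document read by the same `reader` uses the same settings.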

Note that this MS Word reader requires the antiword tool to be installed and accessible on your system.
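Before using the reader, it may help to check that antiword can be found. The sketch below assumes the executable is located via the search path; `hasAntiword` is a hypothetical helper, not part of tm, and relies only on base R's `Sys.which`.

```r
# Hypothetical helper: TRUE if an 'antiword' executable is found on the PATH.
# Sys.which() returns "" for programs it cannot locate.
hasAntiword <- function() unname(nzchar(Sys.which("antiword")))

if (!hasAntiword()) {
  message("antiword not found; readDOC will not be able to extract text.")
}
```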

Value

A function with the signature elem, language, load, id:

elem A list with the two named elements content and uri. The first element must hold the document to be read in; the second element must hold a call to extract this document. The call is evaluated when the document is requested under load on demand.
language A character vector giving the text's language.
load A logical value indicating whether the document corpus should be immediately loaded into memory.
id A character vector representing a unique identification string for the returned text document.


The function returns a PlainTextDocument representing the text in content.

Author(s)

Ingo Feinerer

See Also

Use getReaders to list available reader functions.


[Package tm version 0.3-4.1 Index]