readPDF {tm}    R Documentation
Returns a function that reads in a Portable Document Format (PDF) document, extracting both its text and its metadata.
readPDF(...)
...
    Arguments for the generator function.
Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and that returned function can access the arguments passed to the generator via lexical scoping. This is especially useful for reader functions for complex data structures that need many configuration options.
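The generator pattern can be illustrated in base R without tm. The sketch below is hypothetical: `makeUpperReader` and its `prefix` option are invented for illustration and are not part of tm. It only shows how a configuration argument given to the generator stays visible to the returned reader through its enclosing environment, while the reader keeps the fixed `elem, language, load, id` signature.

```r
# Hypothetical generator (not part of tm): configuration is supplied once,
# while the returned reader keeps a fixed signature.
makeUpperReader <- function(prefix = "") {
  # `prefix` is not a parameter of the returned function; the reader finds it
  # in the environment created by this call to makeUpperReader (lexical scoping).
  function(elem, language, load, id) {
    text <- paste0(prefix, toupper(elem$content))
    list(content = text, language = language, id = id)
  }
}

reader <- makeUpperReader(prefix = ">> ")
doc <- reader(elem = list(content = "hello", uri = NULL),
              language = "en", load = TRUE, id = "doc1")
doc$content  # ">> HELLO"
```

Callers that expect a reader only ever see the four-argument signature; all extra options live in the generator call.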
Note that this PDF reader needs both the tools pdftotext and pdfinfo installed and accessible on your system.
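One way to check this requirement from within R before calling the reader is base R's `Sys.which()`, which returns an empty string for any command it cannot find on the search path. This is a minimal sketch; it only checks that the executables are on the `PATH`, not that they are working versions.

```r
# Sys.which() returns a named character vector; "" means "not found on PATH".
tools <- Sys.which(c("pdftotext", "pdfinfo"))
absent <- names(tools)[tools == ""]

if (length(absent) > 0) {
  message("Not found on PATH: ", paste(absent, collapse = ", "))
} else {
  message("pdftotext and pdfinfo are available")
}
```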
A function with the signature elem, language, load, id:
elem
    A list with the two named elements content and uri. The first element must hold the document to be read in; the second element must hold a call to extract this document. The call is evaluated upon a request for load on demand.
language
    A character vector giving the text's language.
load
    A logical value indicating whether the document corpus should be immediately loaded into memory.
id
    A character vector representing a unique identification string for the returned text document.
The function returns a PlainTextDocument representing the text and metadata in content.
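The uri element holds an unevaluated call rather than the data itself. The base R sketch below illustrates that idea only; it does not reproduce tm's internal loading logic. A call object is stored with `quote()` and evaluated with `eval()` only when the text is actually needed. The temporary file and its contents are made up for the example.

```r
# Create a small file to stand in for a document on disk (illustrative only).
f <- tempfile(fileext = ".txt")
writeLines("deferred text", f)

# Store a call instead of the text: nothing is read at this point.
elem <- list(content = NULL, uri = quote(readLines(f)))
is.call(elem$uri)  # TRUE

# Evaluate the stored call only when the content is requested.
content <- eval(elem$uri)
content  # "deferred text"
```

At the top level, `substitute(file(f))` (as used with readPDF) likewise yields the unevaluated call `file(f)`.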
Ingo Feinerer
Use getReaders to list available reader functions.
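library(tm)
f <- system.file("texts", "pdf", "pdfarchiving.pdf", package = "tm")
## Calling the generator returns the reader function itself
readPDF()
## Apply the returned reader; uri holds the unevaluated call file(f)
pdf <- readPDF()(elem = list(uri = substitute(file(f))),
                 language = "en_US", load = TRUE, id = "id1")
meta(pdf)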