readTabular {tm}    R Documentation

Read In a Text Document

Description

Returns a function that reads in a text document from a tabular data structure (such as a data frame or a list matrix), using knowledge of its internal structure and of any available metadata as specified by so-called mappings.

Usage

readTabular(mappings, ...)

Arguments

mappings a named list of character strings. The constructed reader maps each string (a column name in the tabular source) to the slot or metadatum named by the corresponding list entry. Valid names are .Data, which accesses the document's content; any valid slot name; and any other string, which is mapped to a LocalMetaData entry.
... arguments for the generator function.
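The three kinds of mapping names can be illustrated with a small base R sketch that classifies each entry of the mapping list used in the Examples section. The vector assumedSlots is a hypothetical list of TextDocument slot names written down for illustration; it is not read from tm, so check it against your installed version.

```r
# Mapping list from the Examples section
m <- list(.Data = "contents", Heading = "title",
          Author = "authors", Topic = "topics")

# Hypothetical slot names, assumed for illustration only (not queried from tm)
assumedSlots <- c("Author", "DateTimeStamp", "Description",
                  "ID", "Origin", "Heading", "Language")

# .Data -> document content; known slot name -> slot; anything else -> LocalMetaData
target <- ifelse(names(m) == ".Data", "content",
                 ifelse(names(m) %in% assumedSlots, "slot", "LocalMetaData"))
names(target) <- names(m)
print(target)
```

Under these assumptions Topic is not a slot name, so its column would end up in LocalMetaData.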

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and that function can still access the arguments passed to the generator (e.g., the mappings) via lexical scoping.
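The function-generator pattern can be sketched in base R. makeTabularReader below is a hypothetical stand-in, not tm's implementation. It returns a list instead of a PlainTextDocument and ignores load. It shows how the inner reader keeps a fixed signature while finding mappings in the generator's environment.

```r
# Hypothetical generator illustrating the closure pattern (not tm code)
makeTabularReader <- function(mappings, ...) {
  # The returned reader has a fixed signature; 'mappings' is not a
  # parameter of it but is found through lexical scoping.
  function(elem, load, language, id) {
    row <- elem$content
    # Look up each mapped column in the one-row data frame
    fields <- lapply(mappings, function(col) as.character(row[[col]]))
    list(content  = fields[[".Data"]],
         meta     = fields[names(fields) != ".Data"],
         language = language,
         id       = id)
  }
}

df <- data.frame(contents = c("content 1", "content 2"),
                 title    = c("title 1", "title 2"),
                 stringsAsFactors = FALSE)
reader <- makeTabularReader(list(.Data = "contents", Heading = "title"))
doc <- reader(list(content = df[1, , drop = FALSE], uri = NULL),
              load = TRUE, language = "en", id = "id1")
str(doc)
```

Calling environment(reader)$mappings returns the list given to the generator. This is the captured state the reader relies on.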

Value

A function with the signature elem, load, language, id:

elem A list with the two named elements content and uri. The first element must hold the document to be read in; the second must hold a call that extracts this document. The call is evaluated when the document is requested under load on demand.
load A logical value indicating whether the document corpus should be immediately loaded into memory.
language A character vector giving the text's language.
id A character vector representing a unique identification string for the returned text document.


The function returns a PlainTextDocument representing content.
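The content/uri pairing in elem can be illustrated with base R alone. In this sketch the uri is assumed to be an unevaluated call built with quote(). The element is constructed by hand here, and tm's own sources may build uri differently.

```r
# Hand-built element for row 2 of a data frame (assumed layout)
df <- data.frame(contents = c("content 1", "content 2"),
                 stringsAsFactors = FALSE)
elem <- list(content = df[2, , drop = FALSE],
             uri     = quote(df[2, , drop = FALSE]))

# With load = FALSE a reader could keep only 'uri' and evaluate the
# stored call later, when the document is actually requested.
reloaded <- eval(elem$uri)
identical(reloaded, elem$content)
```

Because df is still present in the calling environment, evaluating the stored call reproduces the same row, so the comparison prints TRUE.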

Author(s)

Ingo Feinerer

See Also

Vignette 'Extensions: How to Handle Custom File Formats'.

Use getReaders to list available reader functions.

Examples

## Each row holds one document's text plus three metadata columns
df <- data.frame(contents = c("content 1", "content 2", "content 3"),
                 title    = c("title 1"  , "title 2"  , "title 3"  ),
                 authors  = c("author 1" , "author 2" , "author 3" ),
                 topics   = c("topic 1"  , "topic 2"  , "topic 3"  ),
                 stringsAsFactors = FALSE)
## Map 'contents' to the document body and the remaining columns
## to the Heading, Author and Topic entries
m <- list(.Data = "contents", Heading = "title",
          Author = "authors", Topic = "topics")
myReader <- readTabular(mappings = m)
ds <- DataframeSource(df)
## Advance the source to its first row and extract that element
elem <- getElem(stepNext(ds))
(result <- myReader(elem, load = TRUE, language = "en", id = "id1"))
meta(result)

[Package tm version 0.3-4.1 Index]