corpus_sample {lsa}    R Documentation

Corpus Sample (Matrices)

Description

Generate a random sample of a document collection.

Usage

corpus_sample(filelist, samplesize, index.return=FALSE)

Arguments

filelist a vector containing (relative or absolute) filenames.
samplesize the desired number of files to be returned.
index.return if set to TRUE, the position of the sample files in filelist will be returned.

Details

Draws a random sample of samplesize files from the specified filelist.
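
The sampling described above can be sketched in base R. This is a minimal sketch, not the lsa implementation: sample_files is a hypothetical stand-in for corpus_sample, and it assumes files are drawn uniformly at random without replacement.

```r
# Hypothetical stand-in for corpus_sample(); not the lsa implementation.
sample_files <- function(filelist, samplesize, index.return = FALSE) {
  # Refuse to draw more files than are available.
  if (samplesize > length(filelist)) stop("sample size exceeds number of files")
  # Draw positions uniformly at random, without replacement (assumption).
  ix <- sample.int(length(filelist), samplesize)
  if (index.return) list(x = filelist[ix], ix = ix) else filelist[ix]
}

set.seed(1)  # fix the seed so the draw is reproducible
files <- c("D1", "D2", "D3")
s <- sample_files(files, 2, index.return = TRUE)
```

Setting the seed before the call makes the draw repeatable, which can help when a sampled corpus has to be rebuilt later.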

Value

If index.return is FALSE, the random sample is returned as a vector of filenames. If index.return is set to TRUE, a list with the following components is returned:

x the filenames of the sample files.
ix the position of the sample files in the original filelist.
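
One possible use of the returned positions is to split a file list into the sampled files and the remainder. The snippet below is only a sketch: the list s is built by hand to mimic the structure returned when index.return is TRUE, and the filenames are made up for illustration.

```r
# Illustrative data only: a file list and a hand-built value shaped like
# the returned list (x = filenames, ix = positions in filelist).
filelist <- c("D1", "D2", "D3", "D4")
s <- list(x = filelist[c(3, 1)], ix = c(3L, 1L))

# ix maps each sampled filename back to its position in filelist.
stopifnot(identical(filelist[s$ix], s$x))

# Negative indexing yields the files that were not sampled.
rest <- filelist[-s$ix]
```

Here rest holds the files left out of the sample, for example to use as a hold-out set.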

Author(s)

Fridolin Wild fridolin.wild@wu-wien.ac.at

See Also

textmatrix

Examples


# create some files
td = tempfile()
dir.create(td)
write( c("dog", "cat", "mouse"), file=paste(td, "D1", sep="/") )
write( c("hamster", "mouse", "sushi"), file=paste(td, "D2", sep="/") )
write( c("dog", "monster", "monster"), file=paste(td, "D3", sep="/") )

s = corpus_sample(dir(td, full.names=TRUE), 2, index.return=TRUE)
textmatrix(s$x)

# clean up (recursive=TRUE is needed because td is a directory)
unlink(td, recursive=TRUE)


[Package lsa version 0.59 Index]