Interface to the boilerpipe Java library by Christian Kohlschütter (http://code.google.com/p/boilerpipe/)



Documentation for package ‘boilerpipeR’ version 1.0

Help Pages

boilerpipeR-package        Extract the main content from HTML files.
ArticleExtractor           A full-text extractor tuned towards news articles.
ArticleSentencesExtractor  A full-text extractor tuned towards extracting sentences from news articles.
boilerpipe                 Extract the main content from HTML files.
CanolaExtractor            A full-text extractor trained on krdwrd Canola.
content                    WordPress-generated web page (retrieved from the Quantivity blog <URL: http://quantivity.wordpress.com>). The content is stored as character data, ready for extraction.
DefaultExtractor           A quite generic full-text extractor.
Extractor                  Generic extraction function which calls the boilerpipe extractors.
KeepEverythingExtractor    Marks everything as content.
LargestContentExtractor    A full-text extractor which extracts the largest text component of a page.
NumWordsRulesExtractor     A quite generic full-text extractor based solely on the number of words per block (the current, the previous, and the next block).