stopwords {lsa}        R Documentation

Stop word lists in German, English, Dutch, and French

Description

These data sets contain lists of very common words that should be ignored when building a document-term matrix. The stop word lists can be loaded by calling data(stopwords_en), data(stopwords_de), data(stopwords_nl), or data(stopwords_fr). The objects stopwords_de, stopwords_en, stopwords_nl, and stopwords_fr must already exist, that is, they must have been loaded with data(), before being passed to textmatrix().

The French stop word list was compiled by Haykel Demnati by merging the list from rank.nl (www.rank.nl/stopwors/french.html), the list from the CLEF team at the University of Neuchatel (http://members.unine.ch/jacques.savoy/clef/frenchST.txt), and the list prepared by Jean Véronis (http://sites.univ-provence.fr/veronis/data/antidico.txt).

Usage

   data(stopwords_de)
   data(stopwords_en)
   data(stopwords_nl)
   data(stopwords_fr)
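A minimal sketch of the workflow from the Description, assuming the lsa package is installed: the stop word list is loaded with data() first and then passed to textmatrix() through its stopwords argument. The two-file corpus written to a temporary directory is hypothetical and exists only for illustration.

```r
library(lsa)

# Load the English stop word list; stopwords_en must exist before textmatrix() is called
data(stopwords_en)

# Hypothetical toy corpus: two small text files in a temporary directory
td <- tempfile()
dir.create(td)
writeLines("the cat sat on the mat", file.path(td, "d1.txt"))
writeLines("the dog chased the cat", file.path(td, "d2.txt"))

# Build the document-term matrix, dropping every term found in stopwords_en
tm <- textmatrix(td, stopwords = stopwords_en)
print(tm)

# Remove the temporary files; tm stays in memory
unlink(td, recursive = TRUE)
```

Function words such as 'the' no longer appear as rows of the matrix. For corpora in other languages, pass stopwords_de, stopwords_nl, or stopwords_fr instead.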

Format

Each data set is a vector of stop words (e.g. 'he', 'she', 'a'): stopwords_en contains 424 English, stopwords_de 370 German, stopwords_nl 260 Dutch, and stopwords_fr 890 French stop words.
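A short sketch for inspecting the loaded vectors, assuming lsa is installed. The counts it prints may differ between package versions, so no particular values are claimed here.

```r
library(lsa)

# Load all four stop word lists into the workspace
data(stopwords_en)
data(stopwords_de)
data(stopwords_nl)
data(stopwords_fr)

# Show the structure and first entries of the English list
str(stopwords_en)
head(stopwords_en)

# Count the entries in each list
sapply(list(en = stopwords_en, de = stopwords_de,
            nl = stopwords_nl, fr = stopwords_fr), length)

# Check whether the example words from this section are in the English list
c("he", "she", "a") %in% stopwords_en
```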

Author(s)

Fridolin Wild fridolin.wild@wu-wien.ac.at, Marco Kalz marco.kalz@ou.nl (for Dutch), Haykel Demnati Haykel.Demnati@isg.rnu.tn (for French)


[Package lsa version 0.63-1 Index]