stopwords {lsa} | R Documentation |
These data sets contain lists of very common words that should be ignored when
building a document-term matrix. The stop word lists can be loaded by
calling data(stopwords_en), data(stopwords_de), data(stopwords_nl), or
data(stopwords_fr). The objects stopwords_de, stopwords_en, stopwords_nl, and
stopwords_fr must already exist in the workspace before being handed over to
textmatrix().
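The loading step described above can be sketched as follows. This is a minimal illustration, not an official example from the package: the two tiny documents and the temporary directory are made up for demonstration, and it assumes the lsa package is installed and that textmatrix() accepts the list through its stopwords argument.

```r
library(lsa)

# Load the English stop word list; this creates the object stopwords_en
data(stopwords_en)

# Write two tiny documents to a temporary directory (illustrative content only)
td <- tempfile()
dir.create(td)
writeLines("the dog and the cat", file.path(td, "D1"))
writeLines("a cat chased the mouse", file.path(td, "D2"))

# Build the document-term matrix, ignoring the English stop words
tm <- textmatrix(td, stopwords = stopwords_en)

# Terms are the row names; stop words such as "the" should be absent
rownames(tm)

# Clean up the temporary files
unlink(td, recursive = TRUE)
```

Calling data() first matters because textmatrix() expects an existing character vector, not the name of a data set.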
The French stop word list was compiled by Haykel Demnati by merging the list from rank.nl (www.rank.nl/stopwors/french.html), the one from the CLEF team at the University of Neuchatel (http://members.unine.ch/jacques.savoy/clef/frenchST.txt), and the one prepared by Jean Véronis (http://sites.univ-provence.fr/veronis/data/antidico.txt).
data(stopwords_de)
data(stopwords_en)
data(stopwords_nl)
data(stopwords_fr)
A vector containing 424 English, 370 German, 260 Dutch, or 890 French stop words (e.g. 'he', 'she', 'a').
Fridolin Wild fridolin.wild@wu-wien.ac.at, Marco Kalz marco.kalz@ou.nl (for Dutch), Haykel Demnati Haykel.Demnati@isg.rnu.tn (for French)