ECIMCI_profiles {textcat} | R Documentation |
N-gram profile db for 26 languages based on the European Corpus Initiative Multilingual Corpus I.
ECIMCI_profiles
This profile db was built by Johannes Rauch from the ECI/MCI corpus, using the default options employed by package textcat, with all text documents encoded in UTF-8.
The category ids used for the db are the respective IETF language tags (see language in package tau), using the ISO 639-2 Part B language subtags and, for Serbian, the script employed (i.e., "scc-Cyrl" and "scc-Latn" for Serbian written in Cyrillic and Latin script, respectively; all other languages in the profile are always written in Latin script).
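
As a minimal sketch of how the db and its category ids might be used, the following lists the ids and categorizes two sample strings against ECIMCI_profiles instead of the default profile db. It assumes package textcat is installed; the sample sentences and the variable names ids, samples and result are illustrative only and do not come from the corpus.

```r
# Assumes package textcat is installed, e.g. via install.packages("textcat").
library(textcat)

# The category ids of the profile db are its names (IETF language tags).
ids <- names(ECIMCI_profiles)
print(ids)

# Hypothetical sample sentences, used here for illustration only.
samples <- c("This is an English sentence.",
             "Das ist ein deutscher Satz.")

# Categorize against the ECI/MCI profiles rather than the default db.
result <- textcat(samples, p = ECIMCI_profiles)
print(result)
```

The returned values are category ids from the db, so Serbian text would be reported as "scc-Cyrl" or "scc-Latn" depending on the script. Results for very short inputs can be unreliable, since n-gram profiles need a reasonable amount of text.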