BNCcomparison {corpora} | R Documentation
This data set compares the frequencies of 60 selected nouns in the written and spoken parts of the British National Corpus, World Edition (BNC). Nouns were chosen from three frequency bands, namely the 20 most frequent nouns in the corpus, 20 nouns with approximately 1000 occurrences, and 20 nouns with approximately 100 occurrences.
See Aston & Burnard (1998) for more information about the BNC, or go to http://www.natcorp.ox.ac.uk/.
data(BNCcomparison)
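A data set with 61 rows and the following columns:
noun: the noun
written: the frequency of the noun in the written part of the BNC
spoken: the frequency of the noun in the spoken part of the BNC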
In addition to the 60 nouns, the data set contains a row labelled OTHER, which represents the total frequency of all other nouns in the BNC. This value is needed to calculate the sample sizes of the written and spoken parts for frequency comparison tests.
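The following is a minimal sketch of how the OTHER row might be used in a frequency comparison. It assumes the columns are named noun, written and spoken as listed above. The helper compare_noun is hypothetical and not part of the corpora package, and prop.test from base R is only one possible choice of test.

```r
# Hypothetical helper (not part of the corpora package): compares the relative
# frequency of one noun between the written and spoken parts of the corpus.
compare_noun <- function(freq, target) {
  # Sample sizes are the column totals over all rows, including the OTHER row
  n_written <- sum(as.numeric(freq$written))
  n_spoken  <- sum(as.numeric(freq$spoken))

  # Select the single row for the noun of interest
  row <- freq[freq$noun == target, ]
  stopifnot(nrow(row) == 1)

  # Two-sample test for equality of proportions (stats package, base R)
  prop.test(x = c(row$written, row$spoken), n = c(n_written, n_spoken))
}

# Usage with the real data, only if the corpora package is installed
if (requireNamespace("corpora", quietly = TRUE)) {
  data(BNCcomparison, package = "corpora")
  first_noun <- as.character(BNCcomparison$noun[1])
  print(compare_noun(BNCcomparison, first_noun))
}
```

Without the OTHER row, the column totals would cover only the 60 selected nouns and would therefore understate both sample sizes.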
Stefan Evert (http://purl.org/stefan.evert)
Aston, Guy and Burnard, Lou (1998). The BNC Handbook. Edinburgh University Press, Edinburgh. See also the BNC homepage at http://www.natcorp.ox.ac.uk/.