read.spectrum {UCS}    R Documentation
Description:

Reads a word frequency spectrum from a .spc file in lexstats format (see Baayen, 2001). The spectrum is returned as an integer vector, possibly including zeroes, whose m-th element gives the number V_m of types with frequency m (i.e. types that occur exactly m times in the sample). The function also computes the sample size N and the vocabulary size V.
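As an illustration of how N and V follow from the spectrum, here is a minimal base-R sketch on a made-up spectrum. It assumes the standard identities N = sum of m * V_m over all m and V = sum of V_m; it is not taken from the UCS source.

```r
# Made-up example spectrum (not from any real .spc file):
# element m holds V_m, the number of types that occur exactly m times.
spc <- c(5L, 2L, 0L, 1L)  # 5 types seen once, 2 seen twice, none 3 times, 1 seen 4 times

m <- seq_along(spc)   # frequency classes 1, 2, ..., length(spc)
N <- sum(m * spc)     # sample size: each of the V_m types contributes m tokens
V <- sum(spc)         # vocabulary size: total number of distinct types

cat("N =", N, " V =", V, "\n")
```

Here N = 1*5 + 2*2 + 3*0 + 4*1 = 13 tokens and V = 5 + 2 + 0 + 1 = 8 types; the zero for m = 3 contributes nothing to either sum.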
Usage:

read.spectrum(file, m.max=Inf, expected=FALSE)
Arguments:

file: a character string giving the name of a frequency spectrum file in lexstats format (usually with the extension .spc).

m.max: maximum length of the frequency spectrum, i.e. frequency classes m > m.max are discarded. Setting m.max is advisable when the data contain high-frequency types, which make the spectrum vector very long and sparse (mostly zeroes). For most applications, only the first 10 to 100 classes are of interest.

expected: if TRUE, reads the expected class sizes (in the EVm column) rather than the observed ones (in the Vm column). This is only possible when the .spc file was generated by an LNRE model.
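The effect of m.max can be pictured with a small helper. truncate_spectrum is a hypothetical name used only for illustration, not part of UCS: a single type with frequency 1000 forces a spectrum vector of length 1000 that is almost entirely zeroes, and truncation keeps only the first m.max classes.

```r
# Hypothetical helper, not part of UCS: keep only frequency classes m <= m.max.
truncate_spectrum <- function(spc, m.max = Inf) {
  spc[seq_len(floor(min(length(spc), m.max)))]
}

# Made-up spectrum with one very frequent type (frequency 1000):
spc <- integer(1000)
spc[1:3] <- c(50L, 20L, 8L)
spc[1000] <- 1L

sum(spc == 0)                               # 996 of the 1000 elements are zero
short <- truncate_spectrum(spc, m.max = 10)
length(short)                               # 10
```

With the default m.max = Inf the vector is returned unchanged, matching the documented default of read.spectrum.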
Value:

A list with the following components:

spc: an integer vector containing the class sizes V_m

N: the sample size computed from the spectrum

V: the vocabulary size computed from the spectrum
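To show how the pieces fit together, here is a hypothetical base-R sketch that builds a list of the same shape (spc, N, V) from a parsed spectrum table. It is NOT the UCS implementation, and the file layout is an assumption: beyond the Vm and EVm column names mentioned above, this sketch assumes a tab-delimited table with a header row, a column m, and possibly omitted empty classes. The function name spectrum_from_table is illustrative only.

```r
# Hypothetical re-implementation for illustration only; NOT the UCS code.
# Assumes a tab-delimited table whose header names the columns "m" and "Vm"
# (plus "EVm" when written by an LNRE model), possibly omitting empty classes.
spectrum_from_table <- function(tbl, m.max = Inf, expected = FALSE) {
  col <- if (expected) "EVm" else "Vm"
  if (!col %in% names(tbl)) stop("column '", col, "' not found")
  tbl <- tbl[tbl$m <= m.max, , drop = FALSE]   # drop classes m > m.max
  spc <- numeric(max(c(0, tbl$m)))             # dense vector, zero-filled
  spc[tbl$m] <- tbl[[col]]                     # numeric, so EVm values fit too
  m <- seq_along(spc)
  list(spc = spc, N = sum(m * spc), V = sum(spc))
}

# Made-up file contents; read.delim(text = ...) parses them like a file.
txt <- "m\tVm\tEVm\n1\t5\t4.8\n2\t2\t2.1\n4\t1\t0.9\n"
tbl <- read.delim(text = txt)

obs <- spectrum_from_table(tbl)
obs$spc                                         # 5 2 0 1
ev <- spectrum_from_table(tbl, expected = TRUE)
```

Two design choices differ from, or are not documented for, read.spectrum: spc is numeric rather than integer so that non-integer expected sizes fit, and N and V are computed after truncation by m.max, which may or may not match what read.spectrum does.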
References:

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.