R: Read Frequency Spectrum File (lexstats)

read.spectrum {UCS}

R Documentation

Read Frequency Spectrum File (lexstats)

Description

Reads a word frequency spectrum from a .spc file in lexstats format (see Baayen, 2001). The spectrum is returned as an integer vector, possibly including zeroes, whose m-th element gives the number of types V_m with frequency rank m, i.e. the number of types that occur exactly m times in the sample. The sample size N and the vocabulary size V are also computed and returned alongside the spectrum.
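To make the relationship between the spectrum vector, N and V concrete, here is a minimal sketch in plain R. It does not call `read.spectrum`; the vector `spc` and its values are made up for illustration, and the formulas are the standard definitions of vocabulary size and sample size in terms of V_m.

```r
# Toy spectrum (made-up values): spc[m] is V_m, the number of types
# that occur exactly m times in the sample.
spc <- c(5L, 2L, 0L, 1L)   # V_1 = 5, V_2 = 2, V_3 = 0, V_4 = 1

m <- seq_along(spc)        # frequency ranks 1, 2, 3, 4

V <- sum(spc)              # vocabulary size: total number of types
N <- sum(m * spc)          # sample size: total number of tokens
```

The zero at position 3 shows why the vector may contain zeroes: no type in this toy sample occurs exactly three times, but the slot is kept so that the index always equals m.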

Usage

read.spectrum(file, m.max=Inf, expected=FALSE)

Arguments

`file`	a character string giving the name of a frequency spectrum file in `lexstats` format (usually with the extension `.spc`)
`m.max`	maximum length of the frequency spectrum, i.e. frequency ranks m > m.max are discarded. Setting `m.max` is advisable when the sample contains high-frequency types, because the full spectrum is then long and sparse (mostly zeroes). For most applications, only the first 10 to 100 ranks are of interest.
`expected`	if `TRUE`, reads the expected class sizes (in the `EVm` column) rather than the observed ones (in the `Vm` column). This is only possible when the `.spc` file was generated by an LNRE model.
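The effect of `m.max` on a sparse spectrum can be sketched as follows. This is not the UCS implementation: `read_spc_sketch` is a hypothetical helper, and the file layout it expects (a whitespace-separated table with a header row and columns `m` and `Vm`) is an assumption made for the example. The sketch computes N and V before truncating; whether `read.spectrum` does the same is not stated on this page.

```r
# Hypothetical helper, NOT part of UCS. Assumes a whitespace-separated
# table with a header row and integer columns `m` and `Vm`.
read_spc_sketch <- function(file, m.max = Inf) {
  tbl <- read.table(file, header = TRUE)
  spc <- integer(max(tbl$m))          # dense vector, zero-filled
  spc[tbl$m] <- as.integer(tbl$Vm)    # place each V_m at index m
  N <- sum(seq_along(spc) * spc)      # assumption: computed before truncation
  V <- sum(spc)
  if (length(spc) > m.max) spc <- spc[seq_len(m.max)]
  list(spc = spc, N = N, V = V)
}

# Toy file with made-up counts: a gap at m = 3 and one
# high-frequency type at m = 50.
tmp <- tempfile(fileext = ".spc")
writeLines(c("m Vm", "1 5", "2 2", "4 1", "50 1"), tmp)

full  <- read_spc_sketch(tmp)               # 50 elements, mostly zeroes
short <- read_spc_sketch(tmp, m.max = 10)   # only ranks 1 to 10 are kept
unlink(tmp)
```

A single type with frequency 50 forces the full vector to 50 elements, 46 of which are zero; with `m.max = 10` only the first ten ranks remain.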

Value

A list with the following components:

`spc`	an integer vector containing the class sizes V_m
`N`	the sample size computed from the spectrum
`V`	the vocabulary size computed from the spectrum

References

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.