read.spectrum {UCS}        R Documentation

Read Frequency Spectrum File (lexstats)

Description

Read a word frequency spectrum from a .spc file in lexstats format (see Baayen, 2001). The spectrum is returned as an integer vector, possibly including zeroes, whose m-th element gives the number of types V_m in frequency class m, i.e. the number of types that occur exactly m times. The sample size N and the vocabulary size V are computed as well.

Usage

read.spectrum(file, m.max=Inf, expected=FALSE)

Arguments

file a character string giving the name of a frequency spectrum file in lexstats format (usually with the extension .spc)
m.max maximum length of the frequency spectrum, i.e. frequency classes m > m.max are discarded. Setting m.max is advisable when the data contain high-frequency types, because the full spectrum is then long and sparse. For most applications, only the first 10 to 100 classes are of interest.
expected if TRUE, reads the expected class sizes (in the EVm column) rather than the observed ones (in the Vm column). This is only possible when the .spc file was generated by an LNRE model.
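
The effect of m.max and expected can be illustrated with a small base R sketch. It is not the UCS implementation: the helper sketch.spectrum is hypothetical, and the file layout (a whitespace-separated table with a header line naming the columns m, Vm and EVm) is an assumption based on the column names mentioned above.

```r
# Write a tiny, made-up spectrum table to a temporary file.
# The layout (header line, columns m / Vm / EVm) is an assumption.
spc.file <- tempfile(fileext = ".spc")
writeLines(c("m\tVm\tEVm",
             "1\t5\t4.8",
             "2\t2\t2.1",
             "4\t1\t0.9"), spc.file)

tab <- read.table(spc.file, header = TRUE)

# Hypothetical helper mimicking the documented arguments.
sketch.spectrum <- function(tab, m.max = Inf, expected = FALSE) {
  values <- if (expected) tab$EVm else tab$Vm
  keep <- tab$m <= m.max               # discard classes m > m.max
  if (!any(keep)) return(numeric(0))
  spc <- numeric(max(tab$m[keep]))     # classes missing from the file stay 0
  spc[tab$m[keep]] <- values[keep]
  spc
}

obs.full  <- sketch.spectrum(tab)                          # 5 2 0 1
obs.short <- sketch.spectrum(tab, m.max = 2)               # 5 2
exp.short <- sketch.spectrum(tab, m.max = 2, expected = TRUE)  # 4.8 2.1
```

Note how the class m = 3, which is absent from the table, appears as a zero in the dense vector. This is why a spectrum containing high-frequency types becomes long and sparse unless m.max is set.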

Value

A list with the following components:
spc an integer vector containing the class sizes V_m
N the sample size computed from the spectrum
V the vocabulary size computed from the spectrum
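
The relation between the spectrum and the two summary values follows from the standard definitions: V is the sum of all class sizes V_m, and N is the sum of m * V_m. The sketch below uses a made-up spectrum vector. Whether read.spectrum computes N and V before or after truncation by m.max is not stated here, so the example assumes a complete spectrum.

```r
# Made-up spectrum: 5 types occur once, 2 twice, 0 three times, 1 four times.
spc <- c(5L, 2L, 0L, 1L)
m <- seq_along(spc)      # frequency classes 1, 2, 3, 4

V <- sum(spc)            # vocabulary size: 5 + 2 + 0 + 1 = 8
N <- sum(m * spc)        # sample size: 1*5 + 2*2 + 3*0 + 4*1 = 13
```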

References

Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.

See Also

spectrum.plot, zm, fzm


[Package UCS version 0.5 Index]