UCS {UCS}    R Documentation
UCS/R consists of a set of R libraries related to the visualisation of cooccurrence data and the evaluation of association measures. The current functionality includes evaluation graphs for association measures (in terms of precision and recall), measures of inter-annotator agreement, and two population models for word frequency distributions.
source("/path/to/UCS/System/R/lib/ucs.R")
ucs.library()
UCS/R is initialised by sourcing the file ‘ucs.R’ in the ‘lib/’ subdirectory of the UCS/R directory tree. This makes the UCS/R documentation available in the R process and provides the ucs.library command, which is used to load individual UCS/R modules. Enter ucs.library() now to display a list of available modules (see the ucs.library manpage for details).
Currently, the following modules are available. The listing below also indicates the most important manpages for each module. Throughout the documentation, it is assumed that you are familiar with the UCS/Perl naming conventions and data set file format.
sfunc: Special Mathematical Functions
Convenience interfaces to the Gamma function (Cgamma), the incomplete (and regularized) Gamma function and its inverse (Igamma, Rgamma), the Beta function (Cbeta), the incomplete (and regularized) Beta function and its inverse (Ibeta, Rbeta), and binomial confidence intervals (binom.conf.interval). All these functions are computed from the distribution functions pgamma and pbeta (and the corresponding quantile functions qgamma and qbeta) in the standard library of R.
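The relationships described above can be illustrated with base R alone. The sketch below is not the UCS/R code: the function names (inc.gamma, reg.gamma, inc.beta, clopper.pearson and so on) are hypothetical, and their argument order may differ from Igamma, Rgamma, Ibeta, Rbeta and binom.conf.interval. It relies only on the facts that pgamma and pbeta return the regularized incomplete Gamma and Beta functions and that qgamma and qbeta invert them. The binomial interval shown is the exact Clopper-Pearson interval; that binom.conf.interval uses the same construction is an assumption.

```r
## Sketch only: hypothetical names, not the UCS/R sfunc API.

# Lower incomplete Gamma function gamma(a, x) and its regularized form P(a, x)
inc.gamma <- function(a, x) pgamma(x, shape = a) * gamma(a)
reg.gamma <- function(a, x) pgamma(x, shape = a)
# Inverse of the regularized Gamma function: solves P(a, x) = p for x
reg.gamma.inv <- function(a, p) qgamma(p, shape = a)

# Incomplete Beta function B(x; a, b) and its regularized form I_x(a, b)
inc.beta <- function(x, a, b) pbeta(x, a, b) * beta(a, b)
reg.beta <- function(x, a, b) pbeta(x, a, b)
# Inverse of the regularized Beta function: solves I_x(a, b) = p for x
reg.beta.inv <- function(p, a, b) qbeta(p, a, b)

# Exact (Clopper-Pearson) confidence interval for a binomial proportion k/n,
# obtained from quantiles of the Beta distribution (assumed interval type)
clopper.pearson <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (k == 0) 0 else qbeta(alpha / 2, k, n - k + 1)
  upper <- if (k == n) 1 else qbeta(1 - alpha / 2, k + 1, n - k)
  c(lower = lower, upper = upper)
}

clopper.pearson(7, 20)
```

The last line prints the interval for 7 successes in 20 trials. It agrees with binom.test(7, 20)$conf.int, which uses the same Clopper-Pearson construction.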
base: Basic Functions for Loading and Managing UCS data sets
This module provides functions for loading UCS data set files (read.ds.gz), listing annotated association measures (ds.find.am, am.key2var), ranking by association scores (order.by.am, add.ranks), and computing precision/recall tables for the evaluation of association measures (precision.recall). The module also includes a listing of all built-in association measures in the UCS/Perl system, including add-on packages (builtin.ams).
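The core of a precision/recall evaluation is short: rank all candidates by their association scores, then count how many true positives each n-best list contains. A minimal base-R sketch follows. The helper pr.table is hypothetical and stands in for precision.recall, whose actual arguments and output columns may differ; the column names and the tie-breaking by input order are assumptions of this sketch.

```r
## Sketch only: pr.table() is a hypothetical helper, not UCS's precision.recall().
pr.table <- function(score, is.tp, n.best = seq_along(score)) {
  ord <- order(score, decreasing = TRUE)  # rank by association score; ties keep input order
  tp <- cumsum(is.tp[ord])                # true positives among the first k candidates
  data.frame(n.best    = n.best,
             TP        = tp[n.best],
             precision = tp[n.best] / n.best,
             recall    = tp[n.best] / sum(is.tp))
}

# Toy example: four candidates, two of them annotated as true positives
score <- c(5, 3, 9, 1)
is.tp <- c(TRUE, FALSE, TRUE, FALSE)
pr.table(score, is.tp)
```

Precision is the proportion of true positives in each n-best list, and recall is the proportion of all true positives found so far, so recall reaches 1 once the list includes every annotated candidate.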
plots: Evaluation Graphs for Association Measures
This module plots precision-, recall-, and precision-by-recall graphs for the empirical evaluation of association measures (all combined in a single function, evaluation.plot). The graphs are highly configurable, either locally in each function call or by setting global defaults (ucs.par). The evaluation.plot function supports confidence intervals, significance tests for result differences, and evaluation based on random samples (see Evert, 2004, Ch. 5). A simple text-mode version of the precision/recall-based evaluation is provided by the evaluation.table function in the base module.
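To give an idea of what such graphs contain, the sketch below uses base R graphics to draw a precision graph (with the baseline precision of a random ranking as a dashed line) and a precision-by-recall graph, writing both to a PDF file. It is not evaluation.plot. The function name pr.graphs, the layout and the synthetic scores are all made up for illustration.

```r
## Sketch only: pr.graphs() is hypothetical; it draws the kind of graph
## described above, not evaluation.plot's interface or appearance.
pr.graphs <- function(score, is.tp, file) {
  ord <- order(score, decreasing = TRUE)
  tp <- cumsum(is.tp[ord])
  n <- seq_along(tp)
  precision <- 100 * tp / n
  recall <- 100 * tp / sum(is.tp)

  pdf(file, width = 9, height = 4.5)
  on.exit(dev.off())
  par(mfrow = c(1, 2))
  # Precision as a function of n-best list size
  plot(n, precision, type = "l", ylim = c(0, 100),
       xlab = "n-best list", ylab = "precision (%)")
  abline(h = 100 * mean(is.tp), lty = "dashed")  # baseline: precision of a random ranking
  # Precision-by-recall graph
  plot(recall, precision, type = "l", xlim = c(0, 100), ylim = c(0, 100),
       xlab = "recall (%)", ylab = "precision (%)")
  invisible(data.frame(n, precision, recall))
}

# Synthetic demo data (not real evaluation results)
set.seed(42)
score <- rnorm(500)
is.tp <- runif(500) < plogis(2 * score)
out.file <- file.path(tempdir(), "pr-graphs.pdf")
res <- pr.graphs(score, is.tp, out.file)
```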
iaa: Measures of Inter-Annotator Agreement
Computes Cohen's kappa statistic with its standard deviation (Fleiss, Cohen & Everitt, 1969) or a confidence interval for the proportion of true agreement (Krenn, Evert & Zinsmeister, 2004) from a 2-by-2 contingency table (see iaa.kappa and iaa.pta).
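For reference, the kappa computation itself is compact. The sketch below (hypothetical name kappa.2x2, not iaa.kappa) computes Cohen's kappa from a 2-by-2 table of counts, together with its large-sample standard error in the form usually quoted from Fleiss, Cohen & Everitt (1969). iaa.kappa may differ in its arguments, output format and details, and the proportion-of-true-agreement interval of iaa.pta is not reproduced here.

```r
## Sketch only: kappa.2x2() is hypothetical, not the UCS/R iaa.kappa().
kappa.2x2 <- function(tab) {
  stopifnot(is.matrix(tab), all(dim(tab) == c(2, 2)))
  n <- sum(tab)
  p <- tab / n                              # cell proportions p_ij
  p.row <- rowSums(p); p.col <- colSums(p)  # marginals p_i. and p_.j
  p.o <- sum(diag(p))                       # observed agreement
  p.e <- sum(p.row * p.col)                 # agreement expected by chance
  k <- (p.o - p.e) / (1 - p.e)
  # Large-sample (non-null) standard error, Fleiss, Cohen & Everitt (1969)
  A <- sum(diag(p) * (1 - (p.row + p.col) * (1 - k))^2)
  off <- row(p) != col(p)
  B <- (1 - k)^2 * sum((p * outer(p.col, p.row, "+")^2)[off])  # p_ij * (p_.i + p_j.)^2
  C.term <- (k - p.e * (1 - k))^2
  se <- sqrt(A + B - C.term) / ((1 - p.e) * sqrt(n))
  c(kappa = k, sd = se)
}

# Rows: annotator 1 (yes/no), columns: annotator 2 (yes/no); toy counts
tab <- matrix(c(20, 5, 10, 15), nrow = 2)
kappa.2x2(tab)
```

For this toy table, observed agreement is 0.7 and chance agreement is 0.5, so kappa is (0.7 - 0.5) / (1 - 0.5) = 0.4.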
gam: Generalised association measures (GAMs)
This module implements extensions of several association measures to continuous functions on a real-valued coordinate space (generalised association measures, GAMs). For details and terminology, please refer to Evert (2004, Sec. 3.3). The functions in this module compute GAM scores and iso-surfaces in standard or ebo-coordinates, and can add jitter to a given data set. New GAMs can easily be added with the register.gam function. Relevant help pages are builtin.gams, gam.score, gam.iso, gamma.nbest, add.jitter, add.gams, add.ebo, and gam.helpers.
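Many association measures are already written in terms of the observed frequency o and the expected frequency e = f1 * f2 / N, which makes their extension to real-valued coordinates natural. The sketch below evaluates three such measures (MI, z-score, t-score, in their standard forms) at non-integer (o, e) and adds jitter to integer frequencies. The function names and the uniform jitter of width +/- 0.5 are assumptions for illustration. They are not gam.score or add.jitter, and ebo-coordinates are not covered.

```r
## Sketch only: hypothetical helpers, not the UCS/R gam module.

# Expected cooccurrence frequency under independence (marginals f1, f2, sample size N)
expected.freq <- function(f1, f2, N) f1 * f2 / N

# Three association measures written as functions of real-valued (o, e)
gam.MI      <- function(o, e) log2(o / e)
gam.z.score <- function(o, e) (o - e) / sqrt(e)
gam.t.score <- function(o, e) (o - e) / sqrt(o)

# Uniform jitter of +/- amount, so that integer frequencies no longer fall on a grid
jitter.freq <- function(x, amount = 0.5) x + runif(length(x), -amount, amount)

set.seed(1)
o <- c(1, 2, 3, 10)
e <- expected.freq(f1 = c(10, 20, 30, 50), f2 = c(5, 5, 10, 40), N = 1000)
o.jit <- jitter.freq(o)
cbind(o.jit, e, MI = gam.MI(o.jit, e), z = gam.z.score(o.jit, e), t = gam.t.score(o.jit, e))
```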
eo: Visualise GAMs in the (e,o) plane
This module implements 2-D visualisation of data sets and GAMs by plotting point clouds and iso-lines in the (e,o) plane (see Evert, 2004, Sec. 3.3). The recommended starting point is the documentation of the eo.setup function, which initialises a new (e,o) plot. Other relevant help pages are eo.par, eo.points, eo.iso, eo.iso.diff, eo.legend and eo.mark.
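Iso-lines are easy to draw for measures that can be solved for o in closed form. The sketch below uses hypothetical helpers, not eo.setup, eo.points or eo.iso. It plots a synthetic point cloud on logarithmic axes (an assumption of this sketch) and overlays the iso-lines z-score = 3, t-score = 2 and MI = 3, obtained by solving each measure's equation for o.

```r
## Sketch only: hypothetical helpers, not the UCS/R eo module.

# Iso-lines: for a given level, solve the measure's equation for o as a function of e
iso.z.score <- function(e, z) e + z * sqrt(e)                   # (o - e) / sqrt(e) = z
iso.t.score <- function(e, t) ((t + sqrt(t^2 + 4 * e)) / 2)^2   # (o - e) / sqrt(o) = t
iso.MI      <- function(e, mi) e * 2^mi                          # log2(o / e) = mi

eo.sketch <- function(o, e, file) {
  pdf(file)
  on.exit(dev.off())
  plot(e, o, log = "xy", pch = 20, col = "grey40",
       xlab = "expected frequency e", ylab = "observed frequency o")
  e.grid <- 10^seq(log10(min(e)), log10(max(e)), length.out = 200)
  lines(e.grid, iso.z.score(e.grid, 3), lty = 1)
  lines(e.grid, iso.t.score(e.grid, 2), lty = 2)
  lines(e.grid, iso.MI(e.grid, 3), lty = 3)
  legend("topleft", lty = 1:3, legend = c("z-score = 3", "t-score = 2", "MI = 3"))
}

# Synthetic demo data: o >= 1, as for pairs that occur at least once
set.seed(7)
e <- 10^runif(300, -2, 1.5)
o <- 1 + rpois(300, e)
out.file <- file.path(tempdir(), "eo-sketch.pdf")
eo.sketch(o, e, out.file)
```

For the t-score, the equation becomes quadratic in sqrt(o), and the positive root gives the iso-line.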
lexstats: Utilities for lexical statistics
This module contains miscellaneous utility functions for word frequency distributions, including an interface to file formats used by the lexstats software (Baayen, 2001), a range of common plots, and goodness-of-fit evaluation for LNRE population models (cf. the zm and fzm modules below). Currently, the most useful functions in this module are read.spectrum, spectrum.plot, and lnre.goodness.of.fit.
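A frequency spectrum lists, for each frequency m, the number V_m of types that occur exactly m times. Judging by their names, read.spectrum and spectrum.plot handle such spectra. A minimal base-R sketch computing a spectrum from a vector of tokens follows. freq.spectrum is a hypothetical name, and the returned data frame is not the lexstats file format.

```r
## Sketch only: freq.spectrum() is hypothetical, not part of the lexstats module.
freq.spectrum <- function(tokens) {
  f <- table(tokens)              # frequency f(w) of each type w
  spc <- table(as.vector(f))      # V_m: number of types with frequency m
  data.frame(m = as.integer(names(spc)), Vm = as.vector(spc))
}

tokens <- c("a", "b", "a", "c", "d")
spc <- freq.spectrum(tokens)
spc
# Sample size N = sum of m * V_m, vocabulary size V = sum of V_m
c(N = sum(spc$m * spc$Vm), V = sum(spc$Vm))
```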
zm: The Zipf-Mandelbrot (ZM) Population Model
This module implements a simple population model for word frequency distributions (Baayen, 2001) based on the Zipf-Mandelbrot law. See Evert (2004a) for details. Relevant help pages are zm, EV, EVm, VV, VVm, write.lexstats, and lnre.goodness.of.fit.
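As a rough illustration of what EV and EVm compute, the sketch below evaluates the expected vocabulary size E[V(N)] and the expected spectrum elements E[V_m(N)] for a ZM population in closed form. It uses the lower incomplete Gamma function, computed from pgamma as in the sfunc module. The sketch assumes a ZM type density g(pi) = C * pi^(-alpha-1) for 0 < pi <= B with 0 < alpha < 1, normalised so that the type probabilities sum to one, and Poisson sampling. Please check these assumptions against Evert (2004a) and the zm help page. All function names are hypothetical, and the closed form is cross-checked by numerical integration over (0, B].

```r
## Sketch only: hypothetical names; assumed ZM type density
## g(pi) = C * pi^(-alpha - 1) on (0, B], 0 < alpha < 1, with Poisson sampling.

# Normalising constant: the integral of pi * g(pi) over (0, B] equals 1
zm.C <- function(alpha, B) (1 - alpha) / B^(1 - alpha)

# Lower incomplete Gamma function gamma(a, x)
lower.inc.gamma <- function(a, x) pgamma(x, shape = a) * gamma(a)

# E[V(N)] = (C / alpha) * (N^alpha * gamma(1 - alpha, N * B) - (1 - exp(-N * B)) * B^(-alpha))
zm.EV <- function(N, alpha, B) {
  C <- zm.C(alpha, B)
  (C / alpha) * (N^alpha * lower.inc.gamma(1 - alpha, N * B) -
                   (1 - exp(-N * B)) * B^(-alpha))
}

# E[V_m(N)] = C / m! * N^alpha * gamma(m - alpha, N * B)   (types occurring m times)
zm.EVm <- function(N, m, alpha, B) {
  C <- zm.C(alpha, B)
  C / factorial(m) * N^alpha * lower.inc.gamma(m - alpha, N * B)
}

# Cross-check: E[V(N)] is the integral of (1 - exp(-N * pi)) * g(pi) over (0, B]
zm.EV.numeric <- function(N, alpha, B) {
  C <- zm.C(alpha, B)
  integrand <- function(p) -expm1(-N * p) * C * p^(-alpha - 1)
  integrate(integrand, lower = 0, upper = B)$value
}

zm.EV(1e4, alpha = 0.5, B = 0.1)
zm.EV.numeric(1e4, alpha = 0.5, B = 0.1)
zm.EVm(1e4, m = 1:3, alpha = 0.5, B = 0.1)
```

The closed forms follow from integrating by parts and substituting t = N * pi, which turns the remaining integral into an incomplete Gamma function.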
fzm: The Finite Zipf-Mandelbrot (fZM) Population Model
This module implements the finite Zipf-Mandelbrot model, an extension of the ZM model (Evert, 2004a). Relevant help pages are fzm, EV, EVm, VV, VVm, write.lexstats, and lnre.goodness.of.fit.
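Under similar assumptions, the finite model can be pictured as a ZM density with an added lower cutoff A > 0 on the type probabilities. The assumed density is g(pi) = C * pi^(-alpha-1) for A <= pi <= B, which gives a finite population size S = (C / alpha) * (A^(-alpha) - B^(-alpha)). The sketch below uses hypothetical names and is not the fzm module. It computes S and evaluates E[V(N)] by numerical integration on a log scale. As N grows, E[V(N)] approaches S.

```r
## Sketch only: hypothetical names; assumed fZM type density
## g(pi) = C * pi^(-alpha - 1) on [A, B], 0 < alpha < 1, 0 < A < B.

# Normalising constant: the integral of pi * g(pi) over [A, B] equals 1
fzm.C <- function(alpha, A, B) (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))

# Population size S: the integral of g(pi) over [A, B]
fzm.S <- function(alpha, A, B) fzm.C(alpha, A, B) / alpha * (A^(-alpha) - B^(-alpha))

# E[V(N)] = integral of (1 - exp(-N * pi)) * g(pi), computed on the log scale t = log(pi)
fzm.EV <- function(N, alpha, A, B) {
  C <- fzm.C(alpha, A, B)
  integrand <- function(t) -expm1(-N * exp(t)) * C * exp(-alpha * t)
  integrate(integrand, lower = log(A), upper = log(B))$value
}

alpha <- 0.5; A <- 1e-6; B <- 0.1
fzm.S(alpha, A, B)
sapply(c(1e4, 1e6, 1e8, 1e10), fzm.EV, alpha = alpha, A = A, B = B)
```

Integrating over t = log(pi) makes the integrand smooth across the wide range of type probabilities between A and B.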
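The command help(package=UCS) will give you a full index of available UCS/R help pages. Use help.search() to search the help system (by default it matches topic aliases, concepts and titles rather than the full text of the pages).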
The correct source path for the file ‘ucs.R’ can be set automatically with the UCS/Perl tool ucs-config. Simply insert the statement
source("ucs.R")
on a separate line in your R script file (say, ‘my-script.R’) and run the shell command
ucs-config my-script.R
Baayen, R. Harald (2001). Word Frequency Distributions. Kluwer, Dordrecht.
Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. PhD Thesis, IMS, University of Stuttgart.
Evert, Stefan (2004a). A simple LNRE model for random character sequences. In Proceedings of JADT 2004, Louvain-la-Neuve, Belgium, pages 411–422.
Fleiss, Joseph L.; Cohen, Jacob; Everitt, B. S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72(5), 323–327.
Krenn, Brigitte; Evert, Stefan; Zinsmeister, Heike (2004). Determining intercoder agreement for a collocation identification task. In preparation.
See also: ucs.library, the UCS/R tutorial (‘tutorial.R’ in the ‘script/’ subdirectory), and the UCS/Perl documentation.