<<

NAME

UCS - Core library

SYNOPSIS

  use UCS;

  $UCS::Version;                                # UCS version
  $UCS::Copyright;                              # UCS copyright string
  $UCS::BaseDir;                                # base directory of UCS system
  $UCS::PerlDir;                                # base directory of UCS/Perl
  
  UCS::Die("Msg line 1", "Msg line 2", ...);    # really die (even in Tk loop)
  UCS::Warn("Msg line 1", "Msg line 2", ...);   # warning message (may be caught by Tk)
  UCS::Status("Message");                       # display status message in Tk window
  UCS::Splash();                                # splash screen (may be shown during start-up)
  $UCS::Verbose = 0;                            # suppress warnings 
  @unique_values = UCS::Unique(@list);          # remove duplicates from list

  @vars = (@UCS::CoreVars, @UCS::DerivedVars);  # standard variable names (core and derived)
  @matches = UCS::Match($pattern, @names);      # match variable names
  $ok = UCS::ValidKey($key);                    # valid identifier, e.g. as AM key
  $ok = UCS::ValidName($name);                  # whether variable name is valid
  $type = UCS::VarType($name);                  # "BOOL", "INT", "DOUBLE", "STRING"
  ($spec, $key) = UCS::SplitName($name);        # split am.*, r.*, or user-defined variable name

  @registered_AMs = UCS::AM_Keys();             # keys for built-in AMs (when loaded)
  if (UCS::AM($key)) {
    $full_name = UCS::AM_Name($key);            # long descriptive name
    $description = UCS::AM_Description($key);   # optional multi-line text
    $exp = UCS::AM_Expression($key);            # AM equation as compiled UCS expression
    $score = $exp->eval({f=>$f, f1=>$f1, ...}); # use UCS::Expression methods to evaluate AM
  }
  $score = UCS::Eval_AM($key, $arghash);        # convenient but slow

  UCS::Load_AM_Package("HTest", ...);           # load built-in AM packages

  $ok = UCS::Register_AM                        # register new association measure
    "tscore",                                   # AM key (-> variables am.tscore and r.tscore)
    "t-score measure (Church et al. 1991)",     # long descriptive name
    '(%O11% - %E11%) / sqrt(%O11%)',            # UCS expression (will be compiled into UCS::Expression)
    $multiline_text;                            # optional multi-line description of AM

DESCRIPTION

This UCS core library maintains a list of built-in AMs and Perl subroutines for computing their scores from a candidate's signatures. Utility functions perform syntax checks for field names, determine field types from the naming conventions, and match patterns containing UCS wildcards against field names.

CONFIGURATION VARIABLES

$UCS::Version;

The currently installed UCS version.

$UCS::Copyright;

A copyright string for the UCS system. It is displayed by some UCS/Perl scripts.

$UCS::BaseDir;

The base directory of the UCS System installation. Compiled UCS programs and links to Perl scripts are installed in $UCS::BaseDir/bin/, while the components of UCS/R can be found in $UCS::BaseDir/R/.

$UCS::PerlDir;

The base directory of the UCS/Perl installation. The UCS Perl modules are installed in $UCS::PerlDir/lib/ and its subdirectories, Perl scripts in $UCS::PerlDir/bin/.

GENERAL FUNCTIONS

UCS::Die($message, ...);

"Safe" replacement for Perl's built-in die function, which will even exit properly from a Perl/Tk loop. One or more lines of error messages are printed on STDERR (or shown in some other suitable manner).

UCS::Warn($message, ...);

By default, prints one or more lines of warning/error messages on STDERR like UCS::Die, but does not exit the script. The purpose of this replacement for the built-in warn function is to allow warnings to be caught and displayed in a Perl/Tk user interface. Warnings might also be redirected to a log file.

UCS::Status($message);

Displays a status message in a Perl/Tk interface. By default, $message is appended to any previous messages. When $message ends in a newline character (\n), the next call to UCS::Status will replace the current message; when it ends in a carriage return (\r), the next call will overwrite the current message from the start. (This is the usual effect of printing such control characters, and it is simulated in Perl/Tk interfaces.)

UCS::Splash();

Displays a UCS splash screen with UCS version information and copyright, e.g. during the start-up phase of a larger UCS/Perl script.

$UCS::Verbose = 0;

The variable $UCS::Verbose controls whether status messages and warnings are printed on STDOUT and STDERR, respectively. Verbose output is enabled by default, and can be suppressed by setting $UCS::Verbose to 0.

@unique_values = UCS::Unique(@list);

Removes duplicate values from @list and returns the remaining elements in the original order. Useful to avoid repetitions of variable names etc.
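
The behaviour described above can be sketched in plain Perl with a %seen hash. This is only an illustration under the assumption that the first occurrence of each value is the one that is kept; unique_sketch is a hypothetical name and not the code used by UCS::Unique.

```perl
use strict;
use warnings;

# Hypothetical stand-in for UCS::Unique: keep the first occurrence of each
# value and preserve the original order of the list.
sub unique_sketch {
  my %seen;
  return grep { !$seen{$_}++ } @_;
}

my @vars = unique_sketch(qw(f f1 am.tscore f N am.tscore));
print "@vars\n";   # prints "f f1 am.tscore N"
```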

MANIPULATING VARIABLE NAMES

@std_vars = (@UCS::CoreVars, @UCS::DerivedVars);

Names of core and derived variables.

$ok = UCS::ValidKey($key);

Returns true iff $key is a valid UCS identifier, which may be used as an AM key or in the name of a user-defined variable.

$ok = UCS::ValidName($name);

Returns true iff $name is a valid UCS variable name, i.e. either a standard variable (core or derived), an association score or ranking, or a user-defined variable. See ucsfile for details on the UCS naming conventions.

$type = UCS::VarType($name);

Determines the data type of a variable from its name $name, according to the UCS naming conventions. Possible data types are BOOL (Boolean, 0/1), INT (signed integer), DOUBLE (double-precision floating-point), and STRING (string value).

($spec, $key) = UCS::SplitName($name);

Splits the variable name $name of an association score, ranking, or user-defined variable into the specifier $spec and the key $key. $spec will be one of am, r, b, f, n, or x. If $name is invalid or the name of a standard variable, (undef, $name) is returned.
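
As a rough illustration of this return convention, the split can be sketched with a single regular expression. The function split_name_sketch is hypothetical, and it assumes that the key is simply everything after the first "."; the real function may validate the key more strictly (for instance along the lines of UCS::ValidKey).

```perl
use strict;
use warnings;

# Hypothetical stand-in for UCS::SplitName, for illustration only.
sub split_name_sketch {
  my ($name) = @_;
  # Assumption: the key is everything after the first "." and is not
  # validated any further here.
  if ($name =~ /\A(am|r|b|f|n|x)\.(.+)\z/) {
    return ($1, $2);
  }
  return (undef, $name);   # standard variable or invalid name
}

for my $name (qw(am.tscore r.tscore f1)) {
  my ($spec, $key) = split_name_sketch($name);
  printf "%-10s spec=%s key=%s\n",
    $name, (defined $spec ? $spec : "undef"), $key;
}
```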

@matches = UCS::Match($pattern, @names);

Extracts the strings from @names that match the UCS wildcard pattern $pattern. The pattern may contain the literal characters A-Z a-z 0-9 . and the wildcards ?, *, and %.

  ?  ...  arbitrary character
  *  ...  arbitrary substring without "."
  %  ...  arbitrary string

Thus, the pattern % selects all field names, * selects the names of core and derived fields, am.% all AM scores, etc. See ucsexp for more examples.
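
One way to picture the wildcard semantics is as a translation into an anchored Perl regular expression. The sketch below is not the implementation used by UCS::Match; wildcard_to_regex and match_sketch are hypothetical helpers, and it assumes that ? may stand for any single character.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of UCS): translate a UCS wildcard pattern
# into an anchored Perl regular expression.
sub wildcard_to_regex {
  my ($pattern) = @_;
  my %wildcard = (
    '?' => '.',       # assumption: any single character
    '*' => '[^.]*',   # any substring that contains no "."
    '%' => '.*',      # any string
  );
  my $regex = join '',
    map { exists $wildcard{$_} ? $wildcard{$_} : quotemeta($_) }
    split //, $pattern;
  return qr/\A$regex\z/;
}

# Hypothetical stand-in for UCS::Match built on the translation above.
sub match_sketch {
  my ($pattern, @names) = @_;
  my $regex = wildcard_to_regex($pattern);
  return grep { $_ =~ $regex } @names;
}

my @names = qw(id f f1 am.tscore am.chi.squared r.tscore);
print join(" ", match_sketch('am.%', @names)), "\n";  # am.tscore am.chi.squared
print join(" ", match_sketch('*',    @names)), "\n";  # id f f1
print join(" ", match_sketch('f?',   @names)), "\n";  # f1
```

The literal "." is escaped with quotemeta so that only the three wildcards carry special meaning.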

ASSOCIATION MEASURE REGISTRY

This registry maintains a list of association measures, which are automatically available to all UCS/Perl scripts. Association measures are identified by their key, which must be a valid UCS identifier. Association scores for a measure with the key fisher, for instance, will be stored in the variable am.fisher, and the corresponding rankings in the variable r.fisher. A wide range of predefined association measures can be imported from the UCS::AM module and several add-on packages (see the UCS::AM manpage).

@registered_AMs = UCS::AM_Keys();

The UCS::AM_Keys function returns the keys of all currently registered association measures as an unordered list. (Note that no association measures are defined unless UCS::AM and/or the add-on packages have been imported.)

$ok = UCS::AM($key);

Returns true if an association measure is registered under $key.

$full_name = UCS::AM_Name($key);

Returns a long and descriptive name for the association measure identified by $key. This name should be suitable for presentation to the user in a selection dialogue.

$description = UCS::AM_Description($key);

Returns the optional longer description of the association measure identified by $key. $description is a single string but will usually contain line breaks (\n), which may need to be removed for automatic justification (e.g. in a Perl/Tk interface).

$exp = UCS::AM_Expression($key);

Returns the equation of the association measure $key, compiled into a UCS::Expression object. Call the eval or evalloop method of $exp to compute association scores (see UCS::Expression). The source code of this expression can be retrieved with the string method (which is especially useful for built-in association measures).

$score = UCS::Eval_AM($key, $arghash);

The UCS::Eval_AM function is a convenient shorthand for retrieving the compiled expression and evaluating it yourself. It is equivalent to:

  $exp = UCS::AM_Expression($key);
  $score = $exp->eval($arghash);

It incurs considerable overhead when association scores are calculated for multiple pair types (because of the repeated lookup of $key in the AM registry), and should be avoided in tight loops. (See UCS::Expression for some comments on efficiency.)

@packages = UCS::Load_AM_Package($name, ...);

Loads one or more of the built-in AM packages as specified by the function arguments. $name must match the last part of the corresponding module name, e.g. 'HTest' to load the UCS::AM::HTest package. $name is case-insensitive and may be abbreviated to a unique prefix. The special name 'ALL' (or 'all') loads all available add-on packages, while the empty string '' loads the basic measures from UCS::AM. UCS::Load_AM_Package returns a list containing the full names of all loaded packages (with duplicates removed). If there is no match for $name, an empty list is returned.

$ok = UCS::Register_AM($key, $name, $equation [, $description]);

The UCS::Register_AM function is used to register a new association measure, or overwrite an existing one with a new definition. $key is the identification key of the new measure, $name a descriptive name, $equation the measure's equation in the form of an (uncompiled) UCS expression, and $description an optional multi-line description. $equation may also be an object of class UCS::Expression (which is cloned rather than re-compiled), enabling the use of advanced features such as parametric expressions.

The function call returns true if the new measure has been successfully registered. A false return value indicates that compilation of $equation into a UCS::Expression object failed. The UCS::Register_AM function will die if $key is not a valid UCS identifier.

The example below shows the code used to register the t-score measure (Church et al. 1991), which has been widely used in English lexicography.

  $ok = UCS::Register_AM  "tscore",
    "t-score measure (Church et al. 1991)",
    '(%O11% - %E11%) / sqrt(%O11%)',
    "The t-score measure applies Student's t-test to ...";
  die "Syntax error in UCS expression for t-score measure"
    unless $ok;
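
For readers who want to check the arithmetic of the equation independently of UCS, the same formula can be written as a plain Perl subroutine. This is a sketch only: tscore_sketch is a hypothetical name, and it assumes that O11 is the observed cooccurrence frequency f and that E11 = f1 * f2 / N is its expected value under independence.

```perl
use strict;
use warnings;

# Plain-Perl version of the t-score equation (O11 - E11) / sqrt(O11).
# Assumption: O11 is the observed cooccurrence frequency and
# E11 = f1 * f2 / N is its expected value under independence.
sub tscore_sketch {
  my ($f, $f1, $f2, $N) = @_;
  my $O11 = $f;
  my $E11 = $f1 * $f2 / $N;
  return ($O11 - $E11) / sqrt($O11);   # requires O11 > 0
}

printf "%.2f\n", tscore_sketch(25, 100, 50, 10000);   # prints "4.90"
```

With f = 25, f1 = 100, f2 = 50 and N = 10000, the expected frequency is 0.5, so the score is (25 - 0.5) / 5 = 4.9.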

SEE ALSO

Type ucsdoc ucsintro for an introduction to UCS/Perl and an overview of its components (in the MODULES and PROGRAMS sections).

COPYRIGHT

Copyright 2003 Stefan Evert.

This software is provided AS IS and the author makes no warranty as to its use and performance. You may use the software, redistribute and modify it under the same terms as Perl itself.

<<