The following lists the available binaries and their usage information.
If command line arguments come from multiple sources, they are applied in this order, with later values overriding earlier ones: CMDARGS, environment variable CG3_DEFAULT, arguments passed on the command line, CMDARGS-OVERRIDE, environment variable CG3_OVERRIDE.
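The precedence order can be modelled as successive dictionary updates. The Python sketch below is a simplified illustration only, not how vislcg3 itself parses options; the helper name `resolve_options`, the dictionary representation, and the sample option values are assumptions made for the example.

```python
# Sources in the order they are applied; later sources override earlier ones.
PRECEDENCE = [
    "CMDARGS",
    "CG3_DEFAULT",
    "command line",
    "CMDARGS-OVERRIDE",
    "CG3_OVERRIDE",
]


def resolve_options(sources):
    """Merge option dicts keyed by source name, following PRECEDENCE.

    `sources` maps a source name to a dict of option -> value.
    Sources that are absent are treated as empty. (Hypothetical helper.)
    """
    effective = {}
    for name in PRECEDENCE:
        effective.update(sources.get(name, {}))
    return effective


# Sample values are invented for illustration.
example = resolve_options({
    "CMDARGS": {"--num-windows": "5"},
    "CG3_DEFAULT": {"--num-windows": "3", "--soft-limit": "200"},
    "command line": {"--soft-limit": "250"},
    "CG3_OVERRIDE": {"--soft-limit": "100"},
})
print(example)  # {'--num-windows': '3', '--soft-limit': '100'}
```

In this model the value passed on the command line for --soft-limit loses to CG3_OVERRIDE, while the CMDARGS value for --num-windows loses to CG3_DEFAULT.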
vislcg3 is the primary binary. It can run rules, compile grammars, and so on.
Usage: vislcg3 [OPTIONS]
Environment variable:
CG3_DEFAULT: Sets default cmdline options, which the actual passed options will override.
CG3_OVERRIDE: Sets forced cmdline options, which will override any passed option.
Options:
-h, --help shows this help
-?, --? shows this help
-V, --version prints copyright and version information
--min-binary-revision prints the minimum usable binary grammar revision
-g, --grammar specifies the grammar file to use for disambiguation
--grammar-out writes the compiled grammar in textual form to a file
--grammar-bin writes the compiled grammar in binary form to a file
--grammar-only only compiles the grammar; implies --verbose
--ordered (will in future allow full ordered matching)
-u, --unsafe allows the removal of all readings in a cohort, even the last one
-s, --sections number or ranges of sections to run; defaults to all sections
--rules number or ranges of rules to run; defaults to all rules
--rule a name or number of a single rule to run
--nrules a regex for which rule names to parse/run; defaults to all rules
--nrules-v a regex for which rule names not to parse/run
-d, --debug enables debug output (very noisy)
--debug-rules number or ranges of rules to debug
-v, --verbose increases verbosity
--quiet squelches warnings (same as -v 0)
-2, --vislcg-compat enables compatibility mode for older CG-2 and vislcg grammars
-I, --stdin file to read input from instead of stdin
-O, --stdout file to print output to instead of stdout
-E, --stderr file to print errors to instead of stderr
--no-mappings disables all MAP, ADD, and REPLACE rules
--no-corrections disables all SUBSTITUTE and APPEND rules
--no-before-sections disables all rules in BEFORE-SECTIONS parts
--no-sections disables all rules in SECTION parts
--no-after-sections disables all rules in AFTER-SECTIONS parts
-t, --trace prints debug output alongside normal output; optionally stops execution
--trace-name-only if a rule is named, omit the line number; implies --trace
--trace-no-removed does not print removed readings; implies --trace
--trace-encl traces which enclosure pass is currently happening; implies --trace
--deleted read deleted readings as such, instead of as text
--dry-run make no actual changes to the input
--single-run runs each section only once; same as --max-runs 1
--max-runs runs each section max N times; defaults to unlimited (0)
--profile gathers profiling statistics and code coverage into a SQLite database
-p, --prefix sets the mapping prefix; defaults to @
--unicode-tags outputs Unicode code points for things like ->
--unique-tags outputs unique tags only once per reading
--num-windows number of windows to keep in before/ahead buffers; defaults to 2
--always-span forces scanning tests to always span across window boundaries
--soft-limit number of cohorts after which the SOFT-DELIMITERS kick in; defaults to 300
--hard-limit number of cohorts after which the window is forcefully cut; defaults to 500
-T, --text-delimit additional delimit based on non-CG text, ensuring it isn't attached to a cohort; defaults to /(^|\n)</s/r
-D, --dep-delimit delimit windows based on dependency instead of DELIMITERS; defaults to 10
--dep-absolute outputs absolute cohort numbers rather than relative ones
--dep-original outputs the original input dependency tag even if it is no longer valid
--dep-allow-loops allows the creation of circular dependencies
--dep-no-crossing prevents the creation of dependencies that would result in crossing branches
--no-magic-readings prevents running rules on magic readings
-o, --no-pass-origin prevents scanning tests from passing the point of origin
--split-mappings keep mapped readings separate in output
-e, --show-end-tags allows the <<< tags to appear in output
--show-unused-sets prints a list of unused sets and their line numbers; implies --grammar-only
--show-tags prints a list of unique used tags; implies --grammar-only
--show-tag-hashes prints a list of tags and their hashes as they are parsed during the run
--show-set-hashes prints a list of sets and their hashes; implies --grammar-only
--dump-ast prints the grammar parse tree; implies --grammar-only
-B, --no-break inhibits any extra whitespace in output
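As an illustration of how the options above combine, the following Python sketch builds two vislcg3 argument lists: one that compiles a grammar to binary form and one that applies a grammar to a file. It only constructs the lists and does not run anything; the helper names and the file names (`grammar.cg3`, `grammar.cg3b`) are hypothetical, and only options from the list above are used.

```python
import shlex


def compile_command(grammar, binary_out):
    """Argument list that compiles `grammar` and writes it in binary form."""
    return [
        "vislcg3",
        "--grammar", grammar,
        "--grammar-only",            # compile only, do not process input
        "--grammar-bin", binary_out,
    ]


def apply_command(grammar, input_file=None, output_file=None, trace=False):
    """Argument list that applies `grammar`, optionally with file I/O and tracing."""
    argv = ["vislcg3", "--grammar", grammar]
    if input_file is not None:
        argv += ["--stdin", input_file]    # read input from a file instead of stdin
    if output_file is not None:
        argv += ["--stdout", output_file]  # write output to a file instead of stdout
    if trace:
        argv.append("--trace")
    return argv


print(shlex.join(compile_command("grammar.cg3", "grammar.cg3b")))
print(shlex.join(apply_command("grammar.cg3b", "input.txt", trace=True)))
```

To execute a command, the list could be handed to `subprocess.run`, assuming vislcg3 is installed and on the PATH.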
cg-conv converts between stream formats. It can currently convert from any of CG, Niceline CG, Apertium, HFST/XFST, and plain text formats, turning them into CG, Niceline CG, Apertium, or plain text formats. By default it tries to auto-detect the input format and converts it to CG. It is currently only meant for use in a pipe.
Usage: cg-conv [OPTIONS]
Environment variable:
CG3_CONV_DEFAULT: Sets default cmdline options, which the actual passed options will override.
CG3_CONV_OVERRIDE: Sets forced cmdline options, which will override any passed option.
Options:
-h, --help shows this help
-?, --? shows this help
-p, --prefix sets the mapping prefix; defaults to @
-u, --in-auto auto-detect input format (default)
-c, --in-cg sets input format to CG
-n, --in-niceline sets input format to Niceline CG
-a, --in-apertium sets input format to Apertium
-f, --in-fst sets input format to HFST/XFST
-x, --in-plain sets input format to plain text
--add-tags adds minimal analysis to readings (implies -x)
-C, --out-cg sets output format to CG (default)
-A, --out-apertium sets output format to Apertium
-F, --out-fst sets output format to HFST/XFST
-M, --out-matxin sets output format to Matxin
-N, --out-niceline sets output format to Niceline CG
-X, --out-plain sets output format to plain text
-W, --wfactor FST weight factor (defaults to 1.0)
--wtag FST weight tag prefix (defaults to W)
-S, --sub-delim FST sub-reading delimiters (defaults to #)
-r, --rtl sets sub-reading direction to RTL (default)
-l, --ltr sets sub-reading direction to LTR
-o, --ordered tag order matters mode
-D, --parse-dep parse dependency (defaults to treating as normal tags)
--unicode-tags outputs Unicode code points for things like ->
--deleted read deleted readings as such, instead of as text
-B, --no-break inhibits any extra whitespace in output
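The input and output format switches can be paired mechanically. The Python sketch below maps format names to the long options listed above and builds a cg-conv argument list; the helper name `conv_command` and the short format keys are assumptions for the example, and nothing is executed.

```python
# Long options taken from the option list above; the dictionary keys are
# hypothetical short names chosen for this sketch.
IN_FLAGS = {
    "auto": "--in-auto",
    "cg": "--in-cg",
    "niceline": "--in-niceline",
    "apertium": "--in-apertium",
    "fst": "--in-fst",
    "plain": "--in-plain",
}
OUT_FLAGS = {
    "cg": "--out-cg",
    "apertium": "--out-apertium",
    "fst": "--out-fst",
    "matxin": "--out-matxin",
    "niceline": "--out-niceline",
    "plain": "--out-plain",
}


def conv_command(src="auto", dst="cg"):
    """Build a cg-conv argument list; defaults mirror auto-detect to CG."""
    if src not in IN_FLAGS:
        raise ValueError(f"unsupported input format: {src}")
    if dst not in OUT_FLAGS:
        raise ValueError(f"unsupported output format: {dst}")
    return ["cg-conv", IN_FLAGS[src], OUT_FLAGS[dst]]


print(conv_command())                        # ['cg-conv', '--in-auto', '--out-cg']
print(conv_command("apertium", "niceline"))  # ['cg-conv', '--in-apertium', '--out-niceline']
```

Because cg-conv is meant for use in a pipe, a caller would typically feed text on standard input, for example with `subprocess.run(conv_command(), input=text, capture_output=True, text=True)`, assuming the binary is installed.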
cg-comp is a lighter tool that only compiles grammars to their binary form. It requires grammars to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.
USAGE: cg-comp grammar_file output_file
cg-proc is a grammar applicator which can handle the Apertium stream format. It works with binary grammars only, hence the need for cg-comp. It requires the input stream to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.
USAGE: cg-proc [-t] [-s] [-d] [-g] [-r rule] grammar_file [input_file [output_file]]
Options:
-d: morphological disambiguation (default behaviour)
-s: specify number of sections to process
-f: set the format of the I/O stream to NUM, where `0' is VISL format, `1' is Apertium format and `2' is Matxin (default: 1)
-r: run only the named rule
-t: print debug output on stderr
-w: enforce surface case on lemma/baseform (to work with -w option of lt-proc)
-n: do not print out the word form of each cohort
-g: do not surround lexical units in ^$
-1: only output the first analysis if ambiguity remains
-z: flush output on the null character
-v: version
-h: show this help
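Since cg-proc only accepts binary grammars, a typical workflow compiles with cg-comp first and then runs cg-proc. The Python sketch below builds both argument lists following the usage lines above; the helper names and file names are hypothetical, and the commands are not executed.

```python
def comp_command(grammar_file, output_file):
    """cg-comp grammar_file output_file"""
    return ["cg-comp", grammar_file, output_file]


def proc_command(binary_grammar, input_file=None, output_file=None,
                 rule=None, trace=False):
    """cg-proc [-t] [-r rule] grammar_file [input_file [output_file]]"""
    argv = ["cg-proc"]
    if trace:
        argv.append("-t")         # print debug output on stderr
    if rule is not None:
        argv += ["-r", rule]      # run only the named rule
    argv.append(binary_grammar)
    if input_file is not None:
        argv.append(input_file)
        if output_file is not None:
            argv.append(output_file)
    elif output_file is not None:
        # The usage line only allows output_file after input_file.
        raise ValueError("output_file requires input_file")
    return argv


print(comp_command("grammar.cg3", "grammar.bin"))
print(proc_command("grammar.bin", "input.txt", "output.txt", trace=True))
```

Leaving out both file arguments yields a command that reads standard input and writes standard output, which suits use inside a pipeline.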
cg-strictify will parse a grammar and output a candidate STRICT-TAGS line that you can edit and then put into your grammar. Optionally, it can also output the whole grammar and strip superfluous LISTs along the way.
Usage: cg-strictify [OPTIONS] <grammar>
Options:
-?, --help outputs this help
-g, --grammar the grammar to parse; defaults to first non-option argument
-o, --output outputs the whole grammar with STRICT-TAGS
--strip removes superfluous LISTs from the output grammar; implies -o
--secondary adds secondary tags (<...>) to strict list
--regex adds regular expression tags (/../r, <..>r, etc) to strict list
--icase adds case-insensitive tags to strict list
--baseforms adds baseform tags ("...") to strict list
--wordforms adds wordform tags ("<...>") to strict list
--all same as --strip --secondary --regex --icase --baseforms --wordforms
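The implications among these options (--all expanding to six options, and --strip implying -o) can be written out explicitly. The Python sketch below models only what the list above documents; it is not cg-strictify's own option parsing, and the helper name `expand_flags` is hypothetical.

```python
# What --all stands for, per the option list above.
ALL_EXPANSION = [
    "--strip", "--secondary", "--regex",
    "--icase", "--baseforms", "--wordforms",
]


def expand_flags(flags):
    """Expand --all and add --output when --strip is present."""
    expanded = []
    for flag in flags:
        items = ALL_EXPANSION if flag == "--all" else [flag]
        for item in items:
            if item not in expanded:   # keep first occurrence, drop duplicates
                expanded.append(item)
    # --strip implies -o / --output
    if "--strip" in expanded and "--output" not in expanded and "-o" not in expanded:
        expanded.append("--output")
    return expanded


print(expand_flags(["--all"]))
print(expand_flags(["--regex", "--icase"]))
```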