A list of binaries available and their usage information.
If command line arguments come from multiple sources, they are applied in this order, with later values overriding earlier ones: CMDARGS, environment variable CG3_DEFAULT, arguments passed on the command line, CMDARGS-OVERRIDE, environment variable CG3_OVERRIDE.
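The ordering above can be pictured as a chain of sources where each later source overwrites values set by an earlier one. The following is a minimal sketch of that idea only, not how vislcg3 actually parses its options; the function name merge_options and all option values are hypothetical and chosen purely for illustration.

```python
def merge_options(*sources):
    """Apply option sources in order; later sources override earlier ones."""
    merged = {}
    for source in sources:
        merged.update(source)
    return merged


# Hypothetical values for each source, listed in the documented order.
cmdargs = {"verbose": 1, "num-windows": 4}    # CMDARGS
env_default = {"verbose": 2}                  # CG3_DEFAULT
command_line = {"verbose": 0, "trace": True}  # passed on the command line
cmdargs_override = {"num-windows": 2}         # CMDARGS-OVERRIDE
env_override = {"trace": False}               # CG3_OVERRIDE

effective = merge_options(
    cmdargs, env_default, command_line, cmdargs_override, env_override
)
print(effective)
```

In this sketch, the command line beats CG3_DEFAULT for verbose, while CMDARGS-OVERRIDE and CG3_OVERRIDE win for num-windows and trace regardless of what was passed on the command line.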
vislcg3 is the primary binary. It can run rules, compile grammars, and so on.
Usage: vislcg3 [OPTIONS]

Environment variable:
 CG3_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help                 shows this help
 -?, --?                    shows this help
 -V, --version              prints copyright and version information
     --min-binary-revision  prints the minimum usable binary grammar revision
 -g, --grammar              specifies the grammar file to use for disambiguation
     --grammar-out          writes the compiled grammar in textual form to a file
     --grammar-bin          writes the compiled grammar in binary form to a file
     --grammar-only         only compiles the grammar; implies --verbose
     --ordered              (will in future allow full ordered matching)
 -u, --unsafe               allows the removal of all readings in a cohort, even the last one
 -s, --sections             number or ranges of sections to run; defaults to all sections
     --rules                number or ranges of rules to run; defaults to all rules
     --rule                 a name or number of a single rule to run
     --nrules               a regex for which rule names to parse/run; defaults to all rules
     --nrules-v             a regex for which rule names not to parse/run
 -d, --debug                enables debug output (very noisy)
 -v, --verbose              increases verbosity
     --quiet                squelches warnings (same as -v 0)
 -2, --vislcg-compat        enables compatibility mode for older CG-2 and vislcg grammars
 -I, --stdin                file to read input from instead of stdin
 -O, --stdout               file to print output to instead of stdout
 -E, --stderr               file to print errors to instead of stderr
     --no-mappings          disables all MAP, ADD, and REPLACE rules
     --no-corrections       disables all SUBSTITUTE and APPEND rules
     --no-before-sections   disables all rules in BEFORE-SECTIONS parts
     --no-sections          disables all rules in SECTION parts
     --no-after-sections    disables all rules in AFTER-SECTIONS parts
 -t, --trace                prints debug output alongside normal output; optionally stops execution
     --trace-name-only      if a rule is named, omit the line number; implies --trace
     --trace-no-removed     does not print removed readings; implies --trace
     --trace-encl           traces which enclosure pass is currently happening; implies --trace
     --deleted              read deleted readings as such, instead of as text
     --dry-run              make no actual changes to the input
     --single-run           runs each section only once; same as --max-runs 1
     --max-runs             runs each section max N times; defaults to unlimited (0)
     --profile              gathers profiling statistics and code coverage into a SQLite database
 -p, --prefix               sets the mapping prefix; defaults to @
     --unicode-tags         outputs Unicode code points for things like ->
     --unique-tags          outputs unique tags only once per reading
     --num-windows          number of windows to keep in before/ahead buffers; defaults to 2
     --always-span          forces scanning tests to always span across window boundaries
     --soft-limit           number of cohorts after which the SOFT-DELIMITERS kick in; defaults to 300
     --hard-limit           number of cohorts after which the window is forcefully cut; defaults to 500
 -T, --text-delimit         additional delimit based on non-CG text, ensuring it isn't attached to a cohort; defaults to /(^|\n)</s/r
 -D, --dep-delimit          delimit windows based on dependency instead of DELIMITERS; defaults to 10
     --dep-absolute         outputs absolute cohort numbers rather than relative ones
     --dep-original         outputs the original input dependency tag even if it is no longer valid
     --dep-allow-loops      allows the creation of circular dependencies
     --dep-no-crossing      prevents the creation of dependencies that would result in crossing branches
     --no-magic-readings    prevents running rules on magic readings
 -o, --no-pass-origin       prevents scanning tests from passing the point of origin
     --split-mappings       keep mapped readings separate in output
 -e, --show-end-tags        allows the <<< tags to appear in output
     --show-unused-sets     prints a list of unused sets and their line numbers; implies --grammar-only
     --show-tags            prints a list of unique used tags; implies --grammar-only
     --show-tag-hashes      prints a list of tags and their hashes as they are parsed during the run
     --show-set-hashes      prints a list of sets and their hashes; implies --grammar-only
     --dump-ast             prints the grammar parse tree; implies --grammar-only
 -B, --no-break             inhibits any extra whitespace in output
cg-conv converts between stream formats. It can currently convert from any of CG, Niceline CG, Apertium, HFST/XFST, and plain text formats, turning them into CG, Niceline CG, Apertium, or plain text formats. By default it tries to auto-detect the input format and convert that to CG. Currently only meant for use in a pipe.
Usage: cg-conv [OPTIONS]

Environment variable:
 CG3_CONV_DEFAULT: Sets default cmdline options, which the actual passed options will override.
 CG3_CONV_OVERRIDE: Sets forced cmdline options, which will override any passed option.

Options:
 -h, --help          shows this help
 -?, --?             shows this help
 -p, --prefix        sets the mapping prefix; defaults to @
 -u, --in-auto       auto-detect input format (default)
 -c, --in-cg         sets input format to CG
 -n, --in-niceline   sets input format to Niceline CG
 -a, --in-apertium   sets input format to Apertium
 -f, --in-fst        sets input format to HFST/XFST
 -x, --in-plain      sets input format to plain text
     --add-tags      adds minimal analysis to readings (implies -x)
 -C, --out-cg        sets output format to CG (default)
 -A, --out-apertium  sets output format to Apertium
 -F, --out-fst       sets output format to HFST/XFST
 -M, --out-matxin    sets output format to Matxin
 -N, --out-niceline  sets output format to Niceline CG
 -X, --out-plain     sets output format to plain text
 -W, --wfactor       FST weight factor (defaults to 1.0)
     --wtag          FST weight tag prefix (defaults to W)
 -S, --sub-delim     FST sub-reading delimiters (defaults to #)
 -r, --rtl           sets sub-reading direction to RTL (default)
 -l, --ltr           sets sub-reading direction to LTR
 -o, --ordered       tag order matters mode
 -D, --parse-dep     parse dependency (defaults to treating as normal tags)
     --unicode-tags  outputs Unicode code points for things like ->
     --deleted       read deleted readings as such, instead of as text
 -B, --no-break      inhibits any extra whitespace in output
cg-comp is a lighter tool that only compiles grammars to their binary form. It requires grammars to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.
USAGE: cg-comp grammar_file output_file
cg-proc is a grammar applicator which can handle the Apertium stream format. It works with binary grammars only, hence the need for cg-comp. It requires the input stream to be in Unicode (UTF-8) encoding. Made for the Apertium toolchain.
USAGE: cg-proc [-t] [-s] [-d] [-g] [-r rule] grammar_file [input_file [output_file]]

Options:
 -d: morphological disambiguation (default behaviour)
 -s: specify number of sections to process
 -f: set the format of the I/O stream to NUM, where `0' is VISL format, `1' is Apertium format and `2' is Matxin (default: 1)
 -r: run only the named rule
 -t: print debug output on stderr
 -w: enforce surface case on lemma/baseform (to work with -w option of lt-proc)
 -n: do not print out the word form of each cohort
 -g: do not surround lexical units in ^$
 -1: only output the first analysis if ambiguity remains
 -z: flush output on the null character
 -v: version
 -h: show this help
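Since cg-proc only accepts binary grammars, a typical workflow first compiles with cg-comp and then applies the result with cg-proc. The sketch below only assembles the two argument lists according to the usage lines above; it does not run the tools. The helper names compile_command and apply_command, and all file names, are hypothetical.

```python
def compile_command(grammar_src, grammar_bin):
    # Mirrors: cg-comp grammar_file output_file
    return ["cg-comp", grammar_src, grammar_bin]


def apply_command(grammar_bin, input_file=None, output_file=None, trace=False):
    # Mirrors: cg-proc [-t] grammar_file [input_file [output_file]]
    cmd = ["cg-proc"]
    if trace:
        cmd.append("-t")  # print debug output on stderr
    cmd.append(grammar_bin)
    if input_file is not None:
        cmd.append(input_file)
        # An output file is only meaningful once an input file is given.
        if output_file is not None:
            cmd.append(output_file)
    return cmd


# Hypothetical file names, for illustration only.
comp = compile_command("grammar.cg3", "grammar.bin")
proc = apply_command("grammar.bin", "input.txt", "output.txt", trace=True)
print(comp)
print(proc)
```

On a system where both tools are installed, each list could be handed to subprocess.run(..., check=True) in turn, compiling first and applying second.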
cg-strictify will parse a grammar and output a candidate STRICT-TAGS line that you can edit and then put into your grammar. Optionally, it can also output the whole grammar and strip superfluous LISTs along the way.
Usage: cg-strictify [OPTIONS] <grammar>

Options:
 -?, --help       outputs this help
 -g, --grammar    the grammar to parse; defaults to first non-option argument
 -o, --output     outputs the whole grammar with STRICT-TAGS
     --strip      removes superfluous LISTs from the output grammar; implies -o
     --secondary  adds secondary tags (<...>) to strict list
     --regex      adds regular expression tags (/../r, <..>r, etc) to strict list
     --icase      adds case-insensitive tags to strict list
     --baseforms  adds baseform tags ("...") to strict list
     --wordforms  adds wordform tags ("<...>") to strict list
     --all        same as --strip --secondary --regex --icase --baseforms --wordforms