Chapter 8. Grammar

Table of Contents

REOPEN-MAPPINGS
CMDARGS, CMDARGS-OVERRIDE
OPTIONS
safe-setparent
addcohort-attach
no-inline-sets
no-inline-templates
strict-wordforms
strict-baseforms
strict-secondary
strict-regex
strict-icase
self-no-barrier
INCLUDE
Sections
BEFORE-SECTIONS
SECTION
AFTER-SECTIONS
NULL-SECTION
Ordering of sections in grammar
--sections with ranges

REOPEN-MAPPINGS

A list of mapping tags that ADD, MAP, and REPLACE should still be able to operate on, even though those tags were already present on readings in the input stream.

      REOPEN-MAPPINGS = @a @b @c ;
    

CMDARGS, CMDARGS-OVERRIDE

You can set default command-line arguments with CMDARGS += ... ; statements. Currently arguments can only be added (hence +=), but removal and assignment could be implemented if needed.

Similarly, CMDARGS-OVERRIDE += ... ; sets command-line arguments that override the ones actually passed on the command line.

The order of argument sources is well defined; in particular, arguments actually passed on the command line take precedence over CMDARGS, and CMDARGS-OVERRIDE takes precedence over both.

      CMDARGS += --num-windows 5 ;
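
A CMDARGS-OVERRIDE statement takes the same form. In this minimal sketch, --num-windows is reused from the example above purely for illustration; going by the description above, the override value of 2 would apply even if a different --num-windows were passed on the command line.

      # Default, used unless the command line says otherwise:
      CMDARGS += --num-windows 5 ;
      # Forced, taking precedence over any --num-windows on the command line:
      CMDARGS-OVERRIDE += --num-windows 2 ;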
    

OPTIONS

You can affect how the grammar is parsed with OPTIONS += ... ; statements. Currently options can only be added (hence +=), but removal and assignment could be implemented if needed.

      OPTIONS += no-inline-templates ;
    

safe-setparent

Adds the rule flag SAFE to all SETPARENT rules, meaning they won't run on cohorts that already have a parent. This can be countered per rule with the flag UNSAFE.
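A minimal sketch of how the option and the per-rule UNSAFE flag might combine. The tags N, V, and Adj are hypothetical, and the flag is assumed to go directly after the rule keyword, as with other rule flags.

      OPTIONS += safe-setparent ;
      # Implicitly SAFE: skipped for cohorts that already have a parent.
      SETPARENT (N) TO (1* (V)) ;
      # UNSAFE restores the old behavior for this one rule.
      SETPARENT UNSAFE (Adj) TO (1* (N)) ;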

addcohort-attach

Causes ADDCOHORT to set the dependency parent of the newly added cohort to the target cohort.
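For example, in the following sketch (word form, baseform, and tags are hypothetical) the new cohort is inserted after a verb cohort, and with addcohort-attach that verb cohort also becomes the new cohort's dependency parent:

      OPTIONS += addcohort-attach ;
      # The new cohort gets the (V) target cohort as its parent.
      ADDCOHORT ("<there>" "there" Adv) AFTER (V) ;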

no-inline-sets

Disallows the use of inline sets in most places. They're still allowed in places that CG-2 did not consider sets, such as MAP, ADD, REPLACE, and ADDCOHORT tag lists, and in the context of a SET definition. Also, the special set (*) remains valid.
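A sketch of what remains legal under no-inline-sets, using hypothetical tags and set names: rule targets and contexts name sets defined with LIST or SET, while MAP tag lists and inline sets inside a SET definition are still accepted.

      OPTIONS += no-inline-sets ;
      LIST Noun = N ;
      LIST Determiner = Det ;
      # Allowed: an inline set inside a SET definition.
      SET Nominal = Noun OR (Pron) ;
      # Allowed: named sets as target and context.
      SELECT Noun IF (-1 Determiner) ;
      # Allowed: a MAP tag list is not a set.
      MAP (@subj) TARGET Nominal ;
      # Rejected under this option: inline sets such as (N) and (Det).
      # SELECT (N) IF (-1 (Det)) ;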

no-inline-templates

Disallows the use of inline templates in most places. They're still allowed in the context of a TEMPLATE definition.

strict-wordforms

Instructs STRICT-TAGS to forbid all wordform tags ("<…>") by default.

strict-baseforms

Instructs STRICT-TAGS to forbid all baseform tags ("…") by default.

strict-secondary

Instructs STRICT-TAGS to forbid all secondary tags (<…>) by default.

strict-regex

Instructs STRICT-TAGS to forbid all regular expression tags (/…/r and others) by default.

strict-icase

Instructs STRICT-TAGS to forbid all case-insensitive tags by default.
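A sketch combining STRICT-TAGS with some of these options. The whitelisted tags are hypothetical, and each option is added with its own OPTIONS += statement:

      OPTIONS += strict-wordforms ;
      OPTIONS += strict-baseforms ;
      OPTIONS += strict-regex ;
      # Whitelisted tags (hypothetical). With the options above,
      # wordform, baseform, and regex tags are forbidden by default as well.
      STRICT-TAGS += N V Adj Det ;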

self-no-barrier

Inverts the behavior of barriers on self tests. By default, a barrier stops the scan if the self cohort matches it. This can be toggled on a per-context basis with the modifier N; with self-no-barrier enabled, the behaviors of S and SN are swapped.
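A sketch of the per-context toggle described above, with hypothetical tags. It assumes the usual CG-3 position syntax, where S marks a context that may also match the target cohort itself:

      # Default: the BARRIER is also tested against the self cohort.
      SELECT (N) IF (-1*S (Det) BARRIER (V)) ;
      # Modifier N toggles that behavior for this context only.
      SELECT (N) IF (-1*SN (Det) BARRIER (V)) ;
      # With OPTIONS += self-no-barrier ; the two forms above swap behaviors.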

INCLUDE

INCLUDE loads and parses another grammar file as if it had been pasted in on the line of the INCLUDE statement, except that line numbers start again from 1. Rules from different files can therefore end up with the same line number if they happen to occupy the same line in their respective files. The grammar will still work as you expect, but --trace output won't show you which file a rule comes from.

        INCLUDE other-file-name ;
      

The file name should not be quoted, and the line must end with a semicolon. On POSIX platforms the path will be shell-expanded if it contains any of ~ $ *. A relative path is looked up relative to the file performing the include. Be careful not to create circular includes, as they will loop forever.
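A few sketches of the path forms described above; all file names, and the GRAMMAR_DIR environment variable, are hypothetical:

      # Looked up relative to the file containing this line.
      INCLUDE sets/common-sets.cg3 ;
      # On POSIX platforms, ~ and $ trigger shell expansion.
      INCLUDE ~/grammars/shared-sets.cg3 ;
      INCLUDE $GRAMMAR_DIR/shared-sets.cg3 ;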

If you use the option STATIC, only the passive parts of the grammar are loaded. This is useful if you have an existing grammar that you don't want to split, but still want to reuse the sets from it in other grammars. This is transitive: all grammars loaded from a static grammar will also be static, even if not explicitly loaded as such.

        INCLUDE STATIC other-file-name ;
      

Sections

CG-2 has three separate grammar sections: SETS, MAPPINGS, and CONSTRAINTS. VISLCG added to these with the CORRECTIONS section. Each of these can only contain certain definitions, such as LIST, MAP, or SELECT. As I understand it, this was due to the original CG parser being written in a language that needed such a format. In any case, I did not see the logic or usability in such a strict format. VISL CG-3 has a single section header, SECTION, which can contain any of the set or rule definitions. Sections can also be given a name for easier identification and anchor behavior, but that is optional. The older section headings are still valid and will work as expected, though.

By allowing any set or rule definition anywhere you could write a grammar such as:

        DELIMITERS = "<$.>" ;
        LIST ThisIsASet = "<sometag>" "<othertag>" ;

        SECTION
        LIST ThisIsAlsoASet = atag btag ctag ;
        SET Hubba = ThisIsASet - (ctag) ;
        SELECT ThisIsASet IF (-1 (dtag)) ;

        SECTION with-name;
        LIST AnotherSet =  "<youknowthedrill>" ;
        MAP (@bingo) TARGET AnotherSet ;
      

Notice that the first LIST ThisIsASet is outside a section. This is because sets are considered global regardless of where they are declared, and can therefore be declared anywhere, even before the DELIMITERS declaration should you so desire. A side effect of this is that set names must be unique across the entire grammar, but as this is also the behavior of CG-2 and VISLCG, it should be neither a surprise nor a problem.

Rules are applied in the order they are declared. In the above example, the SELECT rule would run first, followed by the MAP rule.

Sections may optionally have rule options (flags), which will be inherited by all rules within that section. Each new section resets this list. So that the section name can be told apart from the rule options, the list of rule options must come after a colon (:) with a space before it.
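For instance, in the following sketch (section name and tags are hypothetical) every rule in the section inherits the SAFE flag, so each SETPARENT rule is treated as if it had been written SETPARENT SAFE:

      SECTION attach-nouns : SAFE ;
      SETPARENT (N) TO (1* (V)) ;
      SETPARENT (Adj) TO (1* (N)) ;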

BEFORE-SECTIONS

See BEFORE-SECTIONS. Takes the place of what previously were the MAPPINGS and CORRECTIONS blocks, but may contain any rule type.

SECTION

See SECTION. Takes the place of what previously were the CONSTRAINTS blocks, but may contain any rule type.

AFTER-SECTIONS

See AFTER-SECTIONS. May contain any rule type, and is run once after all other sections. This is new in CG-3.

NULL-SECTION

See NULL-SECTION. May contain any rule type, but is not actually run. This is new in CG-3.
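A minimal sketch of a grammar using all four section types, with hypothetical tags and rules; per the descriptions above, the rule under NULL-SECTION is parsed but never applied:

      BEFORE-SECTIONS
      MAP (@subj) TARGET (N) IF (1 (V)) ;

      SECTION
      SELECT (N) IF (-1 (Det)) ;

      AFTER-SECTIONS
      REMOVE (V) IF (-1 (Det)) ;

      NULL-SECTION
      # Parked rule: parsed, but not run.
      SELECT (V) IF (-1 (Pron)) ;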

Ordering of sections in grammar

The order and arrangement of BEFORE-SECTIONS and AFTER-SECTIONS in the grammar has no impact on the order normal SECTIONs are applied in.

An order of

          SECTION
          SECTION
          BEFORE-SECTIONS
          SECTION
          NULL-SECTION
          AFTER-SECTIONS
          SECTION
          BEFORE-SECTIONS
          SECTION
        

is equivalent to

          BEFORE-SECTIONS
          SECTION
          SECTION
          SECTION
          SECTION
          SECTION
          AFTER-SECTIONS
          NULL-SECTION
        

--sections with ranges

In VISL CG-3, the --sections flag can specify ranges of sections to run, and can even be used to skip sections. If only a single number N is given, it behaves as if you had written 1-N.

While it is possible to specify a range such as 1,4-6,3 where the selection of sections is not ascending, the actual application order will be 1, 1:4, 1:4:5, 1:4:5:6, 1:3:4:5:6; that is, the final step will run section 3 in between sections 1 and 4. This is because the ordering of rules is strictly enforced to be ascending. If you wish to customize the order of rules, you will currently have to use JUMP or EXECUTE.

        --sections 6
        --sections 3-6
        --sections 2-5,7-9,13-15