Chapter 16. Sets

Defining Sets

LIST

Defines a new set based on a list of tags, or appends to an existing set. Composite tags in () require that all tags match. LIST cannot perform set operations: all elements of a LIST definition are parsed as literal tags, not as other sets.

        LIST setname = tag othertag (mtag htag) ltag ;

        LIST setname += even more tags ;

If the named set for += is of SET-type, then the new tags will be in a set OR'ed onto the existing one. See set manipulation.

Avoid cluttering your grammar with LIST N = N; definitions by using LIST-TAGS or STRICT-TAGS instead.

SET

Defines a new set based on operations between existing sets. To include literal tags or composite tags in operations, define an inline set with ().

        SET setname = someset + someotherset - (tag) ;

Set Operators

Union: OR and |

Equivalent to the mathematical set union ∪ operator.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: a b c d e f
        # Practically a reading must match either set
        SET r = a OR b ;
        SET r = a | b ;

Except: -

Equivalent to the SQL Except operator.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: a b !c !d !e !f
        # Practically a reading must match the first set and must not match the second set
        SET r = a - b ;

Difference: \

Equivalent to the mathematical set difference (relative complement) ∖ operator. The symbol is a normal backslash.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: a b
        SET r = a \ b ;
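
The contrast between Except (-) and Difference (\) can be sketched in Python. This is a simplified model and not how CG-3 is implemented: it assumes a LIST is a plain set of tags and that a reading matches a LIST when it carries at least one of those tags. The helper names and the sample reading are hypothetical.

```python
# Simplified model (an assumption, not CG-3's implementation):
# a LIST is a set of plain tags, and a reading matches a LIST
# when it carries at least one of those tags.
a = {"a", "b", "c", "d"}
b = {"c", "d", "e", "f"}


def matches(reading, tag_set):
    # True if the reading carries at least one tag from tag_set.
    return bool(reading & tag_set)


def match_except(reading, first, second):
    # Model of `first - second`: must match first, must not match second.
    return matches(reading, first) and not matches(reading, second)


def match_difference(reading, first, second):
    # Model of `first \ second`: match against the tag-level difference.
    return matches(reading, first - second)


reading = {"a", "c"}  # hypothetical reading carrying tags a and c

except_result = match_except(reading, a, b)          # c is in b, so Except blocks
difference_result = match_difference(reading, a, b)  # a is in {a, b}, so it matches
```

Under this model the same reading is rejected by a - b but accepted by a \ b, because Except tests the whole reading against the second set while Difference only removes tags from the first set.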

Symmetric Difference: ∆

Equivalent to the mathematical set symmetric difference ∆ operator. The symbol is the Unicode code point U+2206.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: a b e f
        SET r = a ∆ b ;

Intersection: ∩

Equivalent to the mathematical set intersection ∩ operator. The symbol is the Unicode code point U+2229.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: c d
        SET r = a ∩ b ;

Cartesian Product: +

Equivalent to the mathematical set Cartesian product × operator.

        LIST a = a b c d ;
        LIST b = c d e f ;

        # Logically yields a set containing tags: (a c) (b c) c (d c) (a d) (b d) d (a e)
        #                                         (b e) (c e) (d e) (a f) (b f) (c f) (d f)
        # Practically a reading must match both sets
        SET r = a + b ;
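
The composite tags listed in the comment above can be reproduced with a short Python sketch. It assumes, based on that listing, that tag order inside a composite tag is irrelevant and that pairing a tag with itself collapses to the single tag; the `show` helper is hypothetical and only used for display.

```python
from itertools import product

a = ["a", "b", "c", "d"]
b = ["c", "d", "e", "f"]

# Each pair becomes a frozenset, so (c d) and (d c) are the same composite
# and (c c) collapses to the single tag c. This is an assumption inferred
# from the listing in the manual, not a statement about CG-3 internals.
composites = {frozenset(pair) for pair in product(a, b)}


def show(composite):
    # Render a single tag bare and a composite tag in parentheses.
    tags = sorted(composite)
    return tags[0] if len(tags) == 1 else "(" + " ".join(tags) + ")"


rendered = sorted(show(c) for c in composites)
count = len(composites)  # 16 ordered pairs, one duplicate removed
```

The 16 ordered pairs reduce to 15 distinct entries, which agrees with the number of entries in the listing.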

Fail-Fast: ^

On its own, this is equivalent to the Except operator -. When followed by other sets, however, it becomes a blocker. In A - B OR C + D, either A - B or C + D may suffice for a match. In A ^ B OR C + D, if B matches then it blocks the rest and fails the entire set match without considering C or D.
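
The difference between the two expressions can be sketched in Python. This is a minimal model under stated assumptions: sets are plain tag sets, a reading matches a set when it carries at least one of its tags, and + means the reading must match both sets. The function names and sample tags are hypothetical.

```python
def matches(reading, tag_set):
    # True if the reading carries at least one tag from tag_set.
    return bool(reading & tag_set)


def except_then_or(reading, A, B, C, D):
    # Model of `A - B OR C + D`: either branch may satisfy the match.
    left = matches(reading, A) and not matches(reading, B)
    right = matches(reading, C) and matches(reading, D)
    return left or right


def fail_fast_then_or(reading, A, B, C, D):
    # Model of `A ^ B OR C + D`: a match on B fails the whole expression
    # before C and D are looked at.
    if matches(reading, B):
        return False
    return matches(reading, A) or (matches(reading, C) and matches(reading, D))


A, B, C, D = {"n"}, {"prop"}, {"v"}, {"fin"}  # hypothetical tag sets
reading = {"prop", "v", "fin"}                # hypothetical reading

with_except = except_then_or(reading, A, B, C, D)
with_fail_fast = fail_fast_then_or(reading, A, B, C, D)
```

For this reading the Except form still matches through C + D, while the Fail-Fast form is blocked because B matches.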

Magic Sets

(*)

A set containing the (*) tag becomes a magic "any" set and will always match. This saves having to declare a dummy set containing all imaginable tags. Useful for testing whether a cohort exists at a position, without needing details about it. Can also be used to match everything except a few tags with the set operator -.

        (*-1 (*) LINK 1* SomeSet)
        SELECT (*) - NotTheseTags ;

_S_DELIMITERS_

The magic set _S_DELIMITERS_ is created from the DELIMITERS definition. This saves having to declare and maintain a separate set for matching delimiters in tests.

        SET SomeSet = OtherSet OR _S_DELIMITERS_ ;

_S_SOFT_DELIMITERS_

The magic set _S_SOFT_DELIMITERS_ is created from the SOFT-DELIMITERS definition.

        (**1 _S_SOFT_DELIMITERS_ BARRIER BoogieSet)

Magic Set _TARGET_

A magic set containing the single tag (_TARGET_). This set and tag will only match when the currently active cohort is the target of the rule.

Magic Set _MARK_

A magic set containing the single tag (_MARK_). This set and tag will only match when the currently active cohort is the mark set with X or, if no such mark is set, when it is the target of the rule.

Magic Set _ATTACHTO_

A magic set containing the single tag (_ATTACHTO_). This set and tag will only match when the currently active cohort is the mark set with A.

Magic Set _SAME_BASIC_

A magic set containing the single tag (_SAME_BASIC_). This set and tag will only match when the currently active reading has the same basic tags (non-mapping tags) as the target reading.

Set Manipulation

Undefining Sets

UNDEF-SETS lets you undefine/unlink sets so later definitions can reuse the name. This does not delete a set, nor can it alter past uses of a set. Prior uses of a set remain linked to the old set.

        LIST ADV = ADV ;
        LIST VFIN = (V FIN) ;

        UNDEF-SETS = VFIN ADV ;
        SET ADV = A OR D OR V ;
        LIST VFIN = VFIN ;

Appending to Sets

LIST with += lets you append tags to an existing LIST or SET. This does not alter past uses of a set. Prior uses of a set remain linked to the old definition.

For LIST-type sets, this creates a new set that is a combination of all tags from the existing set plus all the new tags.

For SET-type sets, the new tags are OR'ed onto the existing set. This can lead to surprising behavior if the existing set is complex.

        LIST VFIN = (V FIN) ;

        LIST VFIN += VFIN ;

Unification

Tag Unification

Each time a rule is run on a reading, the tag that first satisfied the set must be the same as all subsequent matches of the same set in tests.

A set is marked as a tag unification set by prefixing $$ to the name when used in a rule. You can only prefix existing sets. Inline sets in the form of $$(tag tags) will not work. $$Set + $$OtherSet will work, but it creates 2 separate unification sets.

The regex tags <.*>r ".*"r "<.*>"r are special and will unify to the same exact tag of that type. This is useful for e.g. mandating that the baseform must be exactly the same in all places.

For example

          LIST ROLE = <human> <anim> <inanim> (<bench> <table>) ;
          SELECT $$ROLE (-1 KC) (-2C $$ROLE) ;

which would logically be the same as

          SELECT (<human>) (-1 KC) (-2C (<human>)) ;
          SELECT (<anim>) (-1 KC) (-2C (<anim>)) ;
          SELECT (<inanim>) (-1 KC) (-2C (<inanim>)) ;
          SELECT (<bench> <table>) (-1 KC) (-2C (<bench> <table>)) ;

Caveat: The exploded form is not identical to the unified form. Unification rules are run as normal rules, meaning once per reading. The exploded form would be run in order as separate rules per reading. There may be side effects due to that.
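
The logical effect of $$ROLE can be sketched in Python by comparing it with the same set used without unification. This models only the logical equivalence to the exploded rules, not the run-order difference described in the caveat. It ignores the (-1 KC) test and the C flag, and assumes each LIST entry is a set of tags that a reading must fully contain. All function names are hypothetical.

```python
# Each entry of ROLE is a plain or composite tag, modelled as a frozenset.
ROLE = [
    frozenset({"<human>"}),
    frozenset({"<anim>"}),
    frozenset({"<inanim>"}),
    frozenset({"<bench>", "<table>"}),
]


def satisfied(entry, reading):
    # A composite entry requires all of its tags to be present in the reading.
    return entry <= reading


def plain_match(target, context):
    # ROLE without $$: target and context may each match a different entry.
    return any(satisfied(e, target) for e in ROLE) and any(
        satisfied(e, context) for e in ROLE
    )


def unified_match(target, context):
    # $$ROLE: the same entry must be satisfied by both target and context.
    return any(satisfied(e, target) and satisfied(e, context) for e in ROLE)


target = {"N", "<human>"}   # hypothetical readings
context = {"N", "<anim>"}

plain = plain_match(target, context)
unified = unified_match(target, context)
```

Here the plain form accepts the pair because both readings carry some ROLE tag, while the unified form rejects it because the tags differ.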

Caveat 2: The behavior of this next rule is undefined:

          SELECT (tag) IF (0 $$UNISET) (-2* $$UNISET) (1** $$UNISET) ;

Since the order of tests is dynamic, the unification of $$UNISET will be initialized with essentially random data, and as such cannot be guaranteed to unify properly. Well-defined behavior can be enforced in various ways:

          # Put $$UNISET in the target
          SELECT (tag) + $$UNISET IF (-2* $$UNISET) (1** $$UNISET) ;

          # Only refer to $$UNISET in a single linked chain of tests
          SELECT (tag) IF (0 $$UNISET LINK -2* $$UNISET LINK 1** $$UNISET) ;

          # Use rule option KEEPORDER
          SELECT KEEPORDER (tag) IF (0 $$UNISET) (-2* $$UNISET) (1** $$UNISET) ;

Having the unifier in the target is usually the best way to enforce behavior.

Top-Level Set Unification

Each time a rule is run on a reading, the top-level set that first satisfied the match must be the same as all subsequent matches of the same set in tests.

A set is marked as a top-level set unification set by prefixing && to the name when used in a rule. You can only prefix existing sets. Inline sets in the form of &&(tag tags) will not work. &&Set + &&OtherSet will work, but it creates 2 separate unification sets.

For example

          LIST SEM-HUM = <human> <person> <sapien> ;
          LIST SEM-ANIM = <animal> <beast> <draconic> ;
          LIST SEM-INSECT = <insect> <buzzers> ;
          SET SEM-SMARTBUG = SEM-INSECT + (<sapien>) ;
          SET SAME-SEM = SEM-HUM OR SEM-ANIM + SEM-SMARTBUG ; # During unification, OR and + are ignored
          SELECT &&SAME-SEM (-1 KC) (-2C &&SAME-SEM) ;

which would logically be the same as

          SELECT SEM-HUM (-1 KC) (-2C SEM-HUM) ;
          SELECT SEM-ANIM (-1 KC) (-2C SEM-ANIM) ;
          SELECT SEM-SMARTBUG (-1 KC) (-2C SEM-SMARTBUG) ;

Note that the unification only happens on the first level of sets, hence named top-level unification. Note also that the set operators in the prefixed set are ignored during unification.
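
A minimal Python sketch of the logical effect of &&SAME-SEM, under simplifying assumptions: each top-level set is modelled as a predicate over a reading's tags, the (-1 KC) test and the C flag are ignored, and the operators inside SAME-SEM are dropped as described above. The function names are hypothetical.

```python
SEM_HUM = {"<human>", "<person>", "<sapien>"}
SEM_ANIM = {"<animal>", "<beast>", "<draconic>"}
SEM_INSECT = {"<insect>", "<buzzers>"}


def sem_hum(reading):
    return bool(reading & SEM_HUM)


def sem_anim(reading):
    return bool(reading & SEM_ANIM)


def sem_smartbug(reading):
    # SEM-INSECT + (<sapien>): the reading must match both parts.
    return bool(reading & SEM_INSECT) and "<sapien>" in reading


# The top-level sets of SAME-SEM; the operators between them are ignored.
SAME_SEM = [sem_hum, sem_anim, sem_smartbug]


def top_level_unified(target, context):
    # The same top-level set must match both the target and the context.
    return any(s(target) and s(context) for s in SAME_SEM)


same_set = top_level_unified({"<human>"}, {"<person>"})
different_set = top_level_unified({"<human>"}, {"<beast>"})
```

Unlike tag unification, the pair <human> and <person> unifies here, because both tags belong to the same top-level set SEM-HUM even though the tags themselves differ.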

You can use the same set for different unified matches by prefixing the set name with a number and colon. E.g., &&SAME-SEM is a different match than &&1:SAME-SEM.

The same caveats as for Tag Unification apply.
