Sub-readings introduce a bit of hierarchy into readings, letting a reading have a hidden reading attached to it,
which in turn may have another hidden reading, and so on. See the test/T_SubReading_Apertium and test/T_SubReading_CG tests for usage examples.
The Apertium stream format supports sub-readings via the + delimiter for readings. E.g.
^word/aux3<tag>+aux2<tag>+aux1<tag>+main<tag>$
is a cohort with 1 reading which has a three-level-deep sub-reading. Which part is the primary reading and which parts are sub-readings depends on the grammar's SUBREADINGS setting:
SUBREADINGS = RTL ; # Default, right-to-left
SUBREADINGS = LTR ; # Alternate, left-to-right
In default RTL mode, the above reading has the primary reading "main" with sub-reading "aux1" with sub-reading "aux2" and finally sub-reading "aux3".
In LTR mode, the above reading has the primary reading "aux3" with sub-reading "aux2" with sub-reading "aux1" and finally sub-reading "main".
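The effect of the two modes can be illustrated with a minimal Python sketch. This is not CG-3's own parser: the function name order_subreadings is hypothetical, and the naive split on + assumes the reading contains no escaped + characters.

```python
def order_subreadings(reading: str, mode: str = "RTL") -> list[str]:
    """Split one Apertium reading on '+' and return its parts primary-first.

    Illustrative only: assumes no escaped '+' inside the reading.
    """
    parts = reading.split("+")
    if mode == "RTL":
        # Right-to-left: the rightmost part is the primary reading.
        return parts[::-1]
    if mode == "LTR":
        # Left-to-right: the leftmost part is the primary reading.
        return parts
    raise ValueError(f"unknown SUBREADINGS mode: {mode!r}")


reading = "aux3<tag>+aux2<tag>+aux1<tag>+main<tag>"
print(order_subreadings(reading, "RTL"))
# ['main<tag>', 'aux1<tag>', 'aux2<tag>', 'aux3<tag>']
print(order_subreadings(reading, "LTR"))
# ['aux3<tag>', 'aux2<tag>', 'aux1<tag>', 'main<tag>']
```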
The CG stream format supports sub-readings via indentation level. E.g.
"<word>"
	"main" tag
		"aux1" tag
			"aux2" tag
				"aux3" tag
is a cohort with 1 reading which has a three-level-deep sub-reading. Unlike the Apertium format, the order is strictly defined by indentation and cannot be changed. The above reading has the primary reading "main" with sub-reading "aux1" with sub-reading "aux2" and finally sub-reading "aux3".
The indentation level is detected on a per-cohort basis. All whitespace counts the same for the purpose of determining indentation, so 1 tab is the same as 1 space, which is the same as 1 no-break space, and so on. Since detection is per-cohort, it does not matter if previous cohorts have a different indentation style, so it is safe to mix cohorts from multiple sources.
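One way to picture per-cohort indentation detection is the Python sketch below. It is an assumption about the general idea, not a description of CG-3's actual algorithm: the function reading_depths is hypothetical, it counts every leading whitespace character as one unit, and it treats a line indented deeper than the previous reading as that reading's sub-reading.

```python
def reading_depths(cohort_lines: list[str]) -> list[tuple[int, str]]:
    """Return (depth, text) for each reading line of a single cohort.

    Depth 0 is a primary reading, 1 its sub-reading, and so on.
    Hypothetical model: each leading whitespace character counts as 1.
    """
    stack: list[int] = []  # indent widths of the currently open readings
    result = []
    for line in cohort_lines:
        text = line.lstrip()  # strips tabs, spaces, no-break spaces alike
        indent = len(line) - len(text)
        if indent == 0 or not text:
            continue  # word form line ("<word>") or blank line
        # Close readings that are at the same or a deeper indent.
        while stack and stack[-1] >= indent:
            stack.pop()
        result.append((len(stack), text.rstrip()))
        stack.append(indent)
    return result


cohort = [
    '"<word>"',
    '\t"main" tag',
    ' \xa0"aux1" tag',      # space + no-break space: width 2
    '\t\t\t"aux2" tag',
    '    "aux3" tag',       # four spaces: width 4
]
for depth, text in reading_depths(cohort):
    print(depth, text)
```

Because the stack is rebuilt for every call, the indentation style of one cohort has no effect on the next, which mirrors the per-cohort behaviour described above.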
Working with sub-readings involves 2 new grammar features: Rule Option SUB:N and Contextual Option /N.
Rule option SUB:N tells a rule which sub-reading it should operate on and which it should test as target. The N is an integer in the range -2^31 to 2^31. SUB:0 is the primary reading and is the same as not specifying SUB. Positive numbers refer to sub-readings starting from the primary and going deeper, while negative numbers start from the last sub-reading and go towards the primary. Thus, SUB:-1 always refers to the deepest sub-reading.
Given the above CG input and the rules
ADD SUB:-1 (mark) (*) ;
ADD SUB:1 (twain) (*) ;
the output will be
"<word>"
	"main" tag
		"aux1" tag twain
			"aux2" tag
				"aux3" tag mark
Note that SUB:N also determines which reading is looked at as target, so it will work for all rule types.
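The SUB:N numbering behaves much like list indexing with negative indices. The Python sketch below illustrates this for the example reading. The function resolve_sub is hypothetical, and returning None for an N that points past either end of the chain is an assumption, since the text above does not say what happens in that case.

```python
from typing import Optional


def resolve_sub(chain: list[str], n: int) -> Optional[str]:
    """Pick the reading that SUB:n would address.

    chain is ordered primary-first: chain[0] is the primary reading,
    chain[-1] the deepest sub-reading.
    """
    if -len(chain) <= n < len(chain):
        return chain[n]
    return None  # assumption: an out-of-range N addresses nothing


chain = ["main", "aux1", "aux2", "aux3"]
print(resolve_sub(chain, 0))   # main  (same as not specifying SUB)
print(resolve_sub(chain, 1))   # aux1  (first sub-reading)
print(resolve_sub(chain, -1))  # aux3  (deepest sub-reading)
```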
Context option /N tests the N'th sub-reading of the currently active reading, where N follows the same rules as for SUB:N above. The /N must be last in the context position.
If N is * then the test will search the main reading and all sub-readings.
Given the above CG input and the rules
ADD (mark) (*) (0/-1 ("aux3")) ; # matches 3rd sub-reading "aux3"
ADD (twain) (*) (0/1 ("aux1")) ; # matches 1st sub-reading "aux1"
ADD (writes) (*) (0/1 ("main")) ; # won't match as 1st sub-reading doesn't have tag "main"
the output will be
"<word>"
	"main" tag mark twain
		"aux1" tag
			"aux2" tag
				"aux3" tag
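The three contextual tests above, plus the /* form, can be mimicked with a short Python sketch. It is a simplified model rather than CG-3's matcher: context_matches is a hypothetical name, each reading is reduced to a set of tags with the base form included, and an out-of-range N is assumed to simply not match.

```python
def context_matches(chain: list[set[str]], n, tag: str) -> bool:
    """Test whether `tag` is found at sub-reading n of one reading.

    chain is ordered primary-first; n is an integer or the string "*".
    """
    if n == "*":
        # /* searches the main reading and every sub-reading.
        return any(tag in reading for reading in chain)
    if -len(chain) <= n < len(chain):
        return tag in chain[n]
    return False  # assumption: an out-of-range N never matches


chain = [
    {'"main"', "tag"},
    {'"aux1"', "tag"},
    {'"aux2"', "tag"},
    {'"aux3"', "tag"},
]
print(context_matches(chain, -1, '"aux3"'))  # True:  0/-1 ("aux3")
print(context_matches(chain, 1, '"aux1"'))   # True:  0/1 ("aux1")
print(context_matches(chain, 1, '"main"'))   # False: 0/1 ("main")
print(context_matches(chain, "*", '"aux2"')) # True:  0/* ("aux2")
```

Note that in all three rules the tags are added to the primary reading, because the rules themselves carry no SUB:N option; /N only changes which reading the contextual test inspects.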