VISL teaching treebanks (VTB)
Linguistic design considerations
Linguistic theory: The default VTB is a constituent treebank. Although
it can easily be transformed into a classical bracketing structure
like the one used in the Penn Treebank, bracketing (“syntactic form”)
is not the main point: the formalism retains a strong emphasis on
function and dependency structure, both of which it has in common with
VISL's other important linguistic format, Constraint Grammar.
In ordinary VTBs, dependency is implicitly marked through
head/dependent labels, and export into TIGER-dependency format is
possible.
Branching:
Multiple (non-binary) branching is allowed. For clarity,
single-daughter nodes (e.g. rewriting single nouns as np's) are
discouraged. In VTB source, branching is expressed as '='
indentation, with each '=' adding an additional layer. By
convention, the top node's daughters are not indented:
STA:fcl
S:np
=DN:art The
=H:n teacher
P:v-fin laughed
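The '=' indentation convention can be read mechanically. The following is a minimal sketch in Python, assuming one node per line with the word (if any) separated from the label by whitespace. The names `Node` and `parse_vtb` are illustrative and do not belong to any VISL tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                      # complete node label, e.g. "S:np"
    word: Optional[str] = None      # token text, present on terminals only
    children: List["Node"] = field(default_factory=list)


def parse_vtb(source: str) -> Node:
    """Build a tree from '='-indented lines (assumed layout: one node per line)."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty source")
    root = Node(lines[0].split()[0])
    # stack[i] is the most recent node at level i; the root is level 0 and
    # its unindented daughters are level 1, so n '=' signs mean level n + 1.
    stack = [root]
    for line in lines[1:]:
        depth = len(line) - len(line.lstrip("="))
        parts = line[depth:].split(None, 1)
        if not parts:
            raise ValueError(f"line without a label: {line!r}")
        level = depth + 1
        if level > len(stack):
            raise ValueError(f"indentation skips a level: {line!r}")
        node = Node(parts[0], parts[1] if len(parts) > 1 else None)
        del stack[level:]           # close all nodes at this level or deeper
        stack[-1].children.append(node)
        stack.append(node)
    return root


example = """\
STA:fcl
S:np
=DN:art The
=H:n teacher
P:v-fin laughed
"""
tree = parse_vtb(example)
```

Here `tree.children` holds the two clause-level daughters (S:np and P:v-fin), and the subject np holds the article and the noun.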
Form and function: Each node, terminal or non-terminal, is marked for
form and function. The core of a function label is in upper case,
while form labels are in lower case. Subcategories can be added in
lower case (e.g. Od for direct object), for form categories after a
hyphen (e.g. pron-pers).
Form and function are combined into a complete node label with a
colon, function first, e.g. Od:np.
Valency:
Clause level constituents may
be marked for ± valency by prefixing a lower case 'f' (free)
or 'b' (bound). fC, for instance, is a
free predicative, as opposed to the default Cs
and Co (subject and object
predicatives, respectively).
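The labelling conventions described so far (colon between function and form, upper-case function core, lower-case subcategory, hyphenated form subcategory, optional 'f'/'b' valency prefix) can be taken apart with a small pattern. This is only a sketch: the regular expression is an assumption derived from the description here, and `Label` and `parse_label` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Label(NamedTuple):
    valency: Optional[str]   # "free", "bound" or None when unmarked
    function: str            # upper-case core, e.g. "O"
    function_sub: str        # lower-case subcategory, e.g. "d" ("" if absent)
    form: str                # form core, e.g. "pron"
    form_sub: str            # hyphenated subcategory, e.g. "pers" ("" if absent)


# Assumed shape of a function label: [f|b] + UPPER-CASE core + lower-case sub.
_FUNCTION = re.compile(r"(?P<valency>[fb])?(?P<core>[A-Z]+)(?P<sub>[a-z]*)")
_VALENCY = {"f": "free", "b": "bound", None: None}


def parse_label(label: str) -> Label:
    function, colon, form = label.partition(":")
    if not colon or not form:
        raise ValueError(f"expected FUNCTION:form, got {label!r}")
    match = _FUNCTION.fullmatch(function)
    if match is None:
        raise ValueError(f"unrecognised function label: {function!r}")
    form_core, _, form_sub = form.partition("-")
    return Label(
        _VALENCY[match.group("valency")],
        match.group("core"),
        match.group("sub"),
        form_core,
        form_sub,
    )


direct_object = parse_label("Od:np")
free_predicative = parse_label("fC:adj")
pronoun_subject = parse_label("S:pron-pers")
```

With these inputs, `direct_object` has function core "O" with subcategory "d", `free_predicative` carries the valency value "free", and `pronoun_subject` has form "pron" with subcategory "pers".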
Non-terminals:
Three types of non-terminal form
are distinguished: clause, group/phrase and paratagma, each allowing
a different set of daughter functions, which are normally not
mixed across node types. Clauses allow clause functions (S,P,O,A,C
and subcategories), groups have heads (H) and dependents (D),
which may be further specified according to
group type, e.g. DN (adnominal), DA (adverbial modifier in
group) or DP (argument of preposition). A paratagma consists of
conjuncts (CJT) and optional coordinators (CO).
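The division of daughter functions by node type lends itself to a simple consistency check. The sketch below is an assumption-laden illustration, not a VISL validator: it only knows the functions named in this paragraph, it treats every form other than the clause forms and 'par' as a group, and it lets the dummy form 'x' pass unchecked. The names `core_function` and `daughter_fits` are hypothetical.

```python
import re

CLAUSE_FORMS = {"fcl", "icl", "acl", "cl"}
PARATAGMA_FORMS = {"par"}
CLAUSE_FUNCTIONS = {"S", "P", "O", "A", "C"}
PARATAGMA_FUNCTIONS = {"CJT", "CO"}


def core_function(function: str) -> str:
    """Strip an optional valency prefix (f/b) and a lower-case subcategory."""
    match = re.fullmatch(r"[fb]?([A-Z]+)[a-z]*", function)
    if match is None:
        raise ValueError(f"unrecognised function label: {function!r}")
    return match.group(1)


def daughter_fits(mother_form: str, daughter_function: str) -> bool:
    """Return True if the daughter's function is expected under this mother."""
    core = core_function(daughter_function)
    if mother_form == "x":
        return True                      # dummy form: nothing to check
    if mother_form in CLAUSE_FORMS:
        return core in CLAUSE_FUNCTIONS
    if mother_form in PARATAGMA_FORMS:
        return core in PARATAGMA_FUNCTIONS
    # Assumption: everything else (np, pp, ...) is a group with H and D* daughters.
    return core == "H" or core.startswith("D")
```

Since mixing is described as unusual rather than forbidden, a `False` result is better read as a warning to the annotator than as an error.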
Heads: VISL
extends the “hypotactic” use of group heads to the
catatactic pp, opting for the preposition as (functional) head. The
head of a clause is its verbal constituent, marked as P
(predicator). With this exception, VTB-heads are normally terminals,
though complex heads are allowed, especially in connection with
shared modifiers. Note that in standard notation, the elliptic head
function of a missing np-head can be marked on another candidate
word class. Thus, old in the
old will be the head, but will retain
its adjective form (H:adj),
and the group will still be an np.
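Because every group has an H daughter and every clause a P daughter, dependency arcs can be read off the head labels. The following sketch shows the idea on nested tuples of the form `(label, word_or_children)`. The tree encoding and the names `head_word` and `dependencies` are assumptions made for illustration, and the sketch does not cover paratagmas, discontinuous nodes or vp-internal functions.

```python
HEAD_FUNCTIONS = ("H", "P")   # group head and clause head (predicator)


def is_head(node):
    """True if the node's function (the part before the colon) marks a head."""
    return node[0].split(":")[0] in HEAD_FUNCTIONS


def head_word(node):
    """Follow head daughters down to the terminal that heads this node."""
    _, content = node
    if isinstance(content, str):
        return content
    for child in content:
        if is_head(child):
            return head_word(child)
    raise ValueError(f"no head daughter under {node[0]!r}")


def dependencies(node, governor=None):
    """Return (dependent, governor) word pairs; the root's governor is None."""
    _, content = node
    if isinstance(content, str):
        return [(content, governor)]
    own_head = head_word(node)
    arcs = []
    for child in content:
        # The head daughter inherits the mother's governor; every other
        # daughter depends on the mother's head word.
        arcs.extend(dependencies(child, governor if is_head(child) else own_head))
    return arcs


tree = ("STA:fcl", [
    ("S:np", [("DN:art", "The"), ("H:n", "teacher")]),
    ("P:v-fin", "laughed"),
])
arcs = dependencies(tree)
```

For this sentence the arcs link The to teacher and teacher to laughed, with laughed as the root. Word pairs are ambiguous when a word occurs twice, so a fuller version would use token positions instead.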
Dependents:
Exploiting the philosophy of
multiple branching, modifiers in a group will usually be handled in
a flat way. The article, determiner and adjective in those few
old oligarchs will all be
daughters of oligarchs
on the same level.
Verb phrases:
A standard VTB complies with
the concept of “little vp”, allowing only verbal
material, infinitive markers and auxiliary particles as daughters.
Specific functions can be used for main verb (Vm), auxiliary/modal
(Vaux), infinitive marker (INFM) and verb-integrated particles
(Vpart). The latter can either be placed inside the vp, or at clause
level, according to linguistic preference. Head-dependent annotation
can also be used, opting either for a semantic head (main verb) or a
functional head (auxiliary). Clause level functions have been
integrated into the vp in some VTBs (e.g. Spanish enclitic object
pronouns or SUB instead of INFM), but such usage is discouraged and
has so far been avoided by most VTB designers.
Clauses: VISL
distinguishes between three types of clause form: finite (fcl),
non-finite (icl) and averbal (acl), though under-specification as
just clause (cl) is common in the teaching treebanks. Participle and
infinitive constructions with clause level constituents (e.g.
objects, subjects, adverbials) will normally be regarded as clauses
(icl) rather than groups, as they would be in certain
Romance linguistic traditions.
Subordinators:
The function category SUB is ordinarily used for subordinating
conjunctions, while relatives and interrogatives are marked for
their specific clause level SPOAC function rather than their SUB
function. Though both the former and the latter may head averbal
elliptic clauses in a dependency-transformation, they will not be
regarded as (functional) heads in ordinary VTBs.
Crossing branches: VTBs may have
crossing branches, i.e. non-projective dependencies. These are
expressed as discontinuous constituents in standard VTBs, with a
directed hyphen to “join” the individual parts of a
discontinuous node, e.g. P:vp- fA -P:vp
for a predicate-vp bracketing a free adverbial (has never seen).
This notation will also handle fronted raised constituents (What
(DP) are you afraid of? That (Od) wasn't easy to guess.) Note
that in multi-level branching, not only the immediate mother node,
but possibly the grandmother, or even further ancestors, too, will
have to be discontinuous (Danish: Hvem tror du han holder mest af
at drille? “Who do you think he likes teasing most?”).
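The directed hyphen can be resolved by pairing each part that ends in a hyphen with the next part of the same label that starts with one. The sketch below works on a flat list of sibling labels. The function name `join_discontinuous` is hypothetical, and the form 'adv' on the adverbial as well as the handling of middle parts (hyphen on both sides) are assumptions.

```python
def join_discontinuous(labels):
    """Group sibling labels into constituents.

    Returns a list of (label, positions) in order of first appearance, where
    a discontinuous constituent lists the positions of all of its parts.
    """
    groups = []        # [(bare_label, [positions])]
    open_parts = {}    # bare label -> index in groups of a still-open node
    for position, raw in enumerate(labels):
        continues = raw.startswith("-")   # "-P:vp": joins an earlier part
        stays_open = raw.endswith("-")    # "P:vp-": more parts will follow
        bare = raw.strip("-")             # inner hyphens (pron-pers) are kept
        if continues:
            if bare not in open_parts:
                raise ValueError(f"nothing to join {raw!r} to")
            index = open_parts.pop(bare)
            groups[index][1].append(position)
        else:
            groups.append((bare, [position]))
            index = len(groups) - 1
        if stays_open:
            open_parts[bare] = index
    if open_parts:
        raise ValueError(f"unclosed parts: {sorted(open_parts)}")
    return groups


# "(She) has never seen ...": the vp is split around the free adverbial.
siblings = ["S:pron-pers", "P:vp-", "fA:adv", "-P:vp"]
joined = join_discontinuous(siblings)
```

In the result, P:vp covers positions 1 and 3 while the adverbial sits at position 2 between them. For multi-level discontinuity the same pairing would have to be repeated at each ancestor level.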
Stacking:
There are two non-specified
“dummy” categories, 'X' for function, and 'x' for form.
Introduced by C. Bache, the stacking notation makes use of
these symbols in order to avoid ad hoc categories, and to delegate
labels in elliptic constructions to a level where they can be
resolved:
STA:fcl
S:pron-pers He
P:v-fin gave
X:par
=CJT:x
==Oi:pron-pers her
==Od:np
===DN:art a
===H:n horse
=CO:conj-c and
=CJT:x
==Oi:pron-pers him
==Od:np
===DN:art a
===H:n car
This
notation will also handle coordinated predicates sharing the same
subject, and is an option in certain cases of ellipsis. For
verb-elliptic clauses, a special form tag, acl
(averbal clause) exists.