Labelling: Words and non-terminal nodes in the finished tree can be labelled with category symbols from the category bar (word class, syntactic function, phrase and clause type).
In evaluation mode, the Java program keeps count of your error rate and provides hints and explanations along the way.
The source format: Internally, for format filtering, searching and manual revision, trees are stored in the VISL-format, as "horizontal" trees with a separate line for each terminal or non-terminal node, with indentation marking depth. A constituent's daughters will thus be listed below the mother node, with an indentation level increased by 1. Cf. the example from "The World of Sophie" below:
STA:fcl
fA:fcl
=SUB:conj-s("når") Når
=S:np
==DN:prop("Sofie" GEN) Sofies
==H:n("mor" UTR S IDF NOM) mor
=P:v-fin("være" IMPF AKT) var
=Cs:adjp
==H:adj("sur" UTR S IDF NOM) sur
==DA:pp
===H:prp("over") over
===DP:pron-indef("en_eller_anden" NEU S NOM) et_eller_andet
P:v-fin("ske" IMPF AKT) skete
Sf:pron-pers("den" NEU 3S NOM) det
S:fcl
=SUB:conj-s("at") at
=S:pron-pers("hun" UTR 3S NOM) hun
=P:v-fin("kalde" IMPF AKT) kaldte
=Od:np
==DN:pron-poss("de" 3P GEN) deres
==H:n("hus" NEU S IDF NOM) hus
=Co:pp
==H:prp("for") for
==DP:np
===DN:art("en" NEU S IDF) et
===DN:adj("dårlig" COM nG nN nD NOM) værre
===H:n("menageri" NEU S IDF NOM) menageri
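Reading this format back into a nested structure can be sketched as follows. This is a minimal illustration, not part of the VISL tool chain: the names `Node` and `parse_visl` are hypothetical, and the sketch assumes that depth is encoded as the number of leading '=' signs, as in the example above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    function: str                 # syntactic function, e.g. "S", "Od", "H"
    form: str                     # phrase/clause type or word class, e.g. "np", "n"
    tags: str = ""                # text inside the parentheses (lemma + morphology)
    word: Optional[str] = None    # surface token; None for non-terminals
    children: List["Node"] = field(default_factory=list)


# leading '=' signs, FUNCTION:form, optional (tags), optional token
LINE = re.compile(r'^(=*)([^:\s]+):([^(\s]+)(?:\((.*)\))?\s*(.*)$')


def parse_visl(lines):
    """Hypothetical helper: turn VISL-format lines into a list of root nodes."""
    roots, stack = [], []         # stack holds (depth, node) for the open path
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue              # skip blank or unrecognised lines
        depth = len(m.group(1))
        node = Node(m.group(2), m.group(3), m.group(4) or "", m.group(5) or None)
        while stack and stack[-1][0] >= depth:
            stack.pop()           # close nodes at the same or a deeper level
        (stack[-1][1].children if stack else roots).append(node)
        stack.append((depth, node))
    return roots


sample = '''S:fcl
=SUB:conj-s("at") at
=S:pron-pers("hun" UTR 3S NOM) hun
=P:v-fin("kalde" IMPF AKT) kaldte
=Od:np
==DN:pron-poss("de" 3P GEN) deres
==H:n("hus" NEU S IDF NOM) hus'''.splitlines()

clause = parse_visl(sample)[0]
print([child.function for child in clause.children])  # ['SUB', 'S', 'P', 'Od']
```

A stack of open nodes is enough here because each line only needs to know its nearest ancestor with a smaller depth.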
VISL constituent trees are built from the flat dependency output of a Constraint Grammar (CG) parser, using a function-based PSG and VISL's open-source psg-compiler. Treebank revision is performed first at the CG level and again after tree generation, drawing robustness from the CG system and depth from the PSG grammar.
VISL dependency trees are constructed directly from word-based CG input using structural transformation filters written in Prolog (S. Harder) or Perl (E. Bick). In source annotation, the result is ordinary CG enriched with token and head ids: =4:2 or #4->2 on a tag line means that the token in question (number 4) attaches to head token number 2. Two modules can be used to add dependency numbering links to CG input:
- the Depsplicator, implemented in Prolog by Søren Harder. This program, working with an internal, so-called word/9 structure, also calculates relative dependency link numbers (e.g. +2 for a head word 2 positions to the right), and outputs parameter lists for a choice of graphical formats (VISL dependency and DTAG, shown below).
- cg2dep, implemented - along with the TIGER and MALT filters - as a Perl Grammar by Eckhard Bick.
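The relative link numbers mentioned for the Depsplicator can be illustrated with a few lines of Python. This is only a sketch of the arithmetic (head position minus token position). The function name `relative_links` is hypothetical, and returning None for the root (head 0) is an assumption, not documented Depsplicator behaviour.

```python
def relative_links(heads):
    """Map token id -> signed offset to its head, e.g. '+2' or '-1'.

    `heads` maps each token id to its absolute head id; head 0 marks the root,
    which has no meaningful offset and is returned as None (an assumption).
    """
    return {
        token: (None if head == 0 else f"{head - token:+d}")
        for token, head in heads.items()
    }


# Absolute links taken from the first tokens of the example sentence below
print(relative_links({1: 4, 2: 3, 3: 4, 6: 5}))
# {1: '+3', 2: '+1', 3: '+1', 6: '-1'}
```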
<s_id="sofie-da43">
Når [når] KS @SUB #1->4
Sofies [Sofie] PROP GEN @>N #2->3
mor [mor] N UTR S IDF NOM @SUBJ> #3->4
var [være] V IMPF AKT @FS-ADVL> #4->9
sur [sur] ADJ UTR S IDF NOM @<SC #5->4
over [over] PRP @A< #6->5
et=eller=andet [en=eller=anden] DET NEU S NOM @P< #7->6
$, #8->0
skete [ske] V IMPF AKT @FS-STA #9->0
det [den] PERS NEU 3S NOM @F-<SUBJ #10->9
at [at] KS @SUB #11->13
hun [hun] PERS UTR 3S NOM @SUBJ> #12->13
kaldte [kalde] V IMPF AKT @FS-<SUBJ #13->9
deres [de] PERS 3P GEN @>N #14->15
hus [hus] N NEU S IDF NOM @<ACC #15->13
for [for] PRP @<OC #16->13
et [en] ART NEU S IDF @>N #17->19
værre [dårlig] ADJ COM nG nN nD NOM @>N #18->19
menageri [menageri] N NEU S IDF NOM @P< #19->16
$. #20->0
</s>
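The #token->head links in this format are easy to pick up with a regular expression. The following is a minimal sketch, not the cg2dep or Depsplicator code. `parse_cg_dep` is a hypothetical name, only the `#4->2` notation is handled (not `=4:2`), and lines without a link, such as the sentence delimiters, are skipped.

```python
import re

DEP = re.compile(r'#(\d+)->(\d+)\s*$')   # dependency link at the end of the line
LEMMA = re.compile(r'\[([^\]]*)\]')      # lemma in square brackets
FUNC = re.compile(r'@\S+')               # syntactic function tag, e.g. @SUBJ>


def parse_cg_dep(lines):
    """Hypothetical helper: one dict per token that carries a #id->head link."""
    tokens = []
    for raw in lines:
        line = raw.strip()
        link = DEP.search(line)
        if not link:
            continue                     # e.g. <s ...> and </s> lines
        lemma = LEMMA.search(line)
        func = FUNC.search(line)
        tokens.append({
            "id": int(link.group(1)),
            "head": int(link.group(2)),
            "form": line.split()[0],
            "lemma": lemma.group(1) if lemma else None,
            "function": func.group(0) if func else None,
        })
    return tokens


sample = [
    "Når [når] KS @SUB #1->4",
    "Sofies [Sofie] PROP GEN @>N #2->3",
    "mor [mor] N UTR S IDF NOM @SUBJ> #3->4",
    "var [være] V IMPF AKT @FS-ADVL> #4->9",
    "$, #8->0",
]
for token in parse_cg_dep(sample):
    print(token["id"], token["form"], "->", token["head"], token["function"])
```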
VISL dependency trees:
DTAG export format:
TIGER exchange format: This is the treebank exchange format agreed upon by the Nordic Treebank Network, allowing free data exchange and the use of tools developed by the international TIGER project community. VISL constituent trees can be filtered into TIGER constituent format using the program visl2tiger.pl. In TIGER format, edge labels contain the original syntactic function tags, and the cat attribute of non-terminal nodes contains phrase and clause forms.
<s id="s43" ref="sofie-da43" source="Sofie-da" forest="5/7" text="Når Sofies mor var sur over et eller andet, skete det at hun kaldte deres hus for et værre menageri. ">
<graph root="s43_500">
<terminals>
<t id="s43_1" word="Når" lemma="når" pos="conj-s" morph="--" extra="--"/>
<t id="s43_2" word="Sofies" lemma="Sofie" pos="prop" morph="GEN" extra="hum"/>
<t id="s43_3" word="mor" lemma="mor" pos="n" morph="UTR S IDF NOM" extra="--"/>
<t id="s43_4" word="var" lemma="være" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_5" word="sur" lemma="sur" pos="adj" morph="UTR S IDF NOM" extra="--"/>
<t id="s43_6" word="over" lemma="over" pos="prp" morph="--" extra="--"/>
<t id="s43_7" word="et_eller_andet" lemma="en_eller_anden" pos="pron-indef" morph="NEU S NOM" extra="--"/>
<t id="s43_8" word="skete" lemma="ske" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_9" word="det" lemma="den" pos="pron-pers" morph="NEU 3S NOM" extra="--"/>
<t id="s43_10" word="at" lemma="at" pos="conj-s" morph="--" extra="--"/>
<t id="s43_11" word="hun" lemma="hun" pos="pron-pers" morph="UTR 3S NOM" extra="--"/>
<t id="s43_12" word="kaldte" lemma="kalde" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_13" word="deres" lemma="de" pos="pron-poss" morph="--" extra="--"/>
<t id="s43_14" word="hus" lemma="hus" pos="n" morph="NEU S IDF NOM" extra="--"/>
<t id="s43_15" word="for" lemma="for" pos="prp" morph="--" extra="--"/>
<t id="s43_16" word="et" lemma="en" pos="art" morph="NEU S IDF" extra="--"/>
<t id="s43_17" word="værre" lemma="dårlig" pos="adj" morph="COM nG nN nD NOM" extra="--"/>
<t id="s43_18" word="menageri" lemma="menageri" pos="n" morph="NEU S IDF NOM" extra="--"/>
</terminals>
<nonterminals>
<nt id="s43_500" cat="s">
<edge label="STA" idref="s43_501"/>
</nt>
<nt id="s43_501" cat="fcl">
<edge label="fA" idref="s43_502"/>
<edge label="P" idref="s43_8"/>
<edge label="Sf" idref="s43_9"/>
<edge label="S" idref="s43_506"/>
</nt>
<nt id="s43_502" cat="fcl">
<edge label="SUB" idref="s43_1"/>
<edge label="S" idref="s43_503"/>
<edge label="P" idref="s43_4"/>
<edge label="Cs" idref="s43_504"/>
</nt>
<nt id="s43_503" cat="np">
<edge label="DN" idref="s43_2"/>
<edge label="H" idref="s43_3"/>
</nt>
<nt id="s43_504" cat="adjp">
<edge label="H" idref="s43_5"/>
<edge label="DA" idref="s43_505"/>
</nt>
<nt id="s43_505" cat="pp">
<edge label="H" idref="s43_6"/>
<edge label="DP" idref="s43_7"/>
</nt>
<nt id="s43_506" cat="fcl">
<edge label="SUB" idref="s43_10"/>
<edge label="S" idref="s43_11"/>
<edge label="P" idref="s43_12"/>
<edge label="Od" idref="s43_507"/>
<edge label="Co" idref="s43_508"/>
</nt>
<nt id="s43_507" cat="np">
<edge label="DN" idref="s43_13"/>
<edge label="H" idref="s43_14"/>
</nt>
<nt id="s43_508" cat="pp">
<edge label="H" idref="s43_15"/>
<edge label="DP" idref="s43_509"/>
</nt>
<nt id="s43_509" cat="np">
<edge label="DN" idref="s43_16"/>
<edge label="DN" idref="s43_17"/>
<edge label="H" idref="s43_18"/>
</nt>
</nonterminals>
</graph>
</s>
TIGER tree example
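Because the TIGER constituent format is plain XML, it can be read with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree to print a bracketed view of a tree. The helper name `bracket` and the "ROOT" label are illustrative choices, and the sample is an abridged fragment of the sentence above (only the np node s43_503, with the graph root pointed at it), not the full file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<s id="s43">
<graph root="s43_503">
<terminals>
<t id="s43_2" word="Sofies" lemma="Sofie" pos="prop" morph="GEN" extra="hum"/>
<t id="s43_3" word="mor" lemma="mor" pos="n" morph="UTR S IDF NOM" extra="--"/>
</terminals>
<nonterminals>
<nt id="s43_503" cat="np">
<edge label="DN" idref="s43_2"/>
<edge label="H" idref="s43_3"/>
</nt>
</nonterminals>
</graph>
</s>"""


def bracket(sentence_xml):
    """Hypothetical helper: render one <s> element as a labelled bracketing."""
    graph = ET.fromstring(sentence_xml).find("graph")
    terminals = {t.get("id"): t for t in graph.iter("t")}
    nonterminals = {nt.get("id"): nt for nt in graph.iter("nt")}

    def render(node_id, label):
        if node_id in terminals:                      # leaf: function:pos word
            t = terminals[node_id]
            return f"[{label}:{t.get('pos')} {t.get('word')}]"
        nt = nonterminals[node_id]                    # inner node: function:cat
        parts = [render(e.get("idref"), e.get("label")) for e in nt.findall("edge")]
        return f"[{label}:{nt.get('cat')} {' '.join(parts)}]"

    return render(graph.get("root"), "ROOT")


print(bracket(SAMPLE))  # [ROOT:np [DN:prop Sofies] [H:n mor]]
```

The function label comes from the edge leading to a node, and the form label from the node itself (cat for non-terminals, pos for terminals), which mirrors the FUNCTION:form pairs of the VISL source format.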
TIGER dependency format: This format is derived from TIGER constituent trees using a special Perl program, called tiger2dep.pl. In this format, word-terminals are "identified" with their dependency node by using the empty edge label '--'.
<s id="s43" ref="sofie-da43" source="Sofie-da" forest="5/7" text="Når Sofies mor var sur over et eller andet, skete det at hun kaldte deres hus for et værre menageri. ">
<graph root="s43_500">
<terminals>
<t id="s43_1" word="Når" lemma="når" pos="conj-s" morph="--" extra="--"/>
<t id="s43_2" word="Sofies" lemma="Sofie" pos="prop" morph="GEN" extra="hum"/>
<t id="s43_3" word="mor" lemma="mor" pos="n" morph="UTR S IDF NOM" extra="--"/>
<t id="s43_4" word="var" lemma="være" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_5" word="sur" lemma="sur" pos="adj" morph="UTR S IDF NOM" extra="--"/>
<t id="s43_6" word="over" lemma="over" pos="prp" morph="--" extra="--"/>
<t id="s43_7" word="et_eller_andet" lemma="en_eller_anden" pos="pron-indef" morph="NEU S NOM" extra="--"/>
<t id="s43_8" word="skete" lemma="ske" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_9" word="det" lemma="den" pos="pron-pers" morph="NEU 3S NOM" extra="--"/>
<t id="s43_10" word="at" lemma="at" pos="conj-s" morph="--" extra="--"/>
<t id="s43_11" word="hun" lemma="hun" pos="pron-pers" morph="UTR 3S NOM" extra="--"/>
<t id="s43_12" word="kaldte" lemma="kalde" pos="v-fin" morph="IMPF AKT" extra="--"/>
<t id="s43_13" word="deres" lemma="de" pos="pron-poss" morph="--" extra="--"/>
<t id="s43_14" word="hus" lemma="hus" pos="n" morph="NEU S IDF NOM" extra="--"/>
<t id="s43_15" word="for" lemma="for" pos="prp" morph="--" extra="--"/>
<t id="s43_16" word="et" lemma="en" pos="art" morph="NEU S IDF" extra="--"/>
<t id="s43_17" word="værre" lemma="dårlig" pos="adj" morph="COM nG nN nD NOM" extra="--"/>
<t id="s43_18" word="menageri" lemma="menageri" pos="n" morph="NEU S IDF NOM" extra="--"/>
</terminals>
<nonterminals>
<nt id="s43_500" cat="s">
<edge label="STA" idref="s43_501"/>
</nt>
<nt id="s43_501" cat="v-fin">
<edge label="fA" idref="s43_502"/>
<edge label="--" idref="s43_8"/>
<edge label="Sf" idref="s43_9"/>
<edge label="S" idref="s43_506"/>
</nt>
<nt id="s43_502" cat="v-fin">
<edge label="SUB" idref="s43_1"/>
<edge label="S" idref="s43_503"/>
<edge label="--" idref="s43_4"/>
<edge label="Cs" idref="s43_504"/>
</nt>
<nt id="s43_503" cat="n">
<edge label="DN" idref="s43_2"/>
<edge label="--" idref="s43_3"/>
</nt>
<nt id="s43_504" cat="adj">
<edge label="--" idref="s43_5"/>
<edge label="DA" idref="s43_505"/>
</nt>
<nt id="s43_505" cat="prp">
<edge label="--" idref="s43_6"/>
<edge label="DP" idref="s43_7"/>
</nt>
<nt id="s43_506" cat="v-fin">
<edge label="SUB" idref="s43_10"/>
<edge label="S" idref="s43_11"/>
<edge label="--" idref="s43_12"/>
<edge label="Od" idref="s43_507"/>
<edge label="Co" idref="s43_508"/>
</nt>
<nt id="s43_507" cat="n">
<edge label="DN" idref="s43_13"/>
<edge label="--" idref="s43_14"/>
</nt>
<nt id="s43_508" cat="prp">
<edge label="--" idref="s43_15"/>
<edge label="DP" idref="s43_509"/>
</nt>
<nt id="s43_509" cat="n">
<edge label="DN" idref="s43_16"/>
<edge label="DN" idref="s43_17"/>
<edge label="--" idref="s43_18"/>
</nt>
</nonterminals>
</graph>
</s>
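The role of the '--' edge can be made explicit by reading word-to-word links back out of this format. The following is a sketch under stated assumptions, not a reimplementation of tiger2dep.pl. `dependencies` is a hypothetical name, each non-terminal is taken to stand for the word reached through its '--' edge, and a node without such an edge (like the top "s" node) yields None as head. The sample is an abridged fragment of the example above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<s id="s43">
<graph root="s43_504">
<terminals>
<t id="s43_5" word="sur" lemma="sur" pos="adj" morph="UTR S IDF NOM" extra="--"/>
<t id="s43_6" word="over" lemma="over" pos="prp" morph="--" extra="--"/>
<t id="s43_7" word="et_eller_andet" lemma="en_eller_anden" pos="pron-indef" morph="NEU S NOM" extra="--"/>
</terminals>
<nonterminals>
<nt id="s43_504" cat="adj">
<edge label="--" idref="s43_5"/>
<edge label="DA" idref="s43_505"/>
</nt>
<nt id="s43_505" cat="prp">
<edge label="--" idref="s43_6"/>
<edge label="DP" idref="s43_7"/>
</nt>
</nonterminals>
</graph>
</s>"""


def dependencies(sentence_xml):
    """Hypothetical helper: list of (dependent word id, label, head word id)."""
    graph = ET.fromstring(sentence_xml).find("graph")
    nts = {nt.get("id"): nt for nt in graph.iter("nt")}

    def head_word(node_id):
        nt = nts.get(node_id)
        if nt is None:
            return node_id                    # already a terminal
        for edge in nt.findall("edge"):
            if edge.get("label") == "--":     # the word this node is identified with
                return head_word(edge.get("idref"))
        return None                           # no '--' edge, e.g. the top node

    deps = []
    for nt_id, nt in nts.items():
        head = head_word(nt_id)
        for edge in nt.findall("edge"):
            if edge.get("label") != "--":
                deps.append((head_word(edge.get("idref")), edge.get("label"), head))
    return deps


print(dependencies(SAMPLE))
# [('s43_6', 'DA', 's43_5'), ('s43_7', 'DP', 's43_6')]
```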
MALT dependency format: This format was developed by Joakim Nivre at Växjö University. For evaluation purposes and compatibility, VISL data can be transformed into MALT, using either visldep2malt (from CG dependency format) or visltiger2malt (from VISL-tree format).
<sentence id="s43" ref="sofie-da43" source="Sofie-da" forest="5/7" text="Når Sofies mor var sur over et eller andet, skete det at hun kaldte deres hus for et værre menageri. ">
<word id=1 form="Når" lemma="når" pos="conj-s" morph="--" extra="--" deprel="SUB" head="4"/>
<word id=2 form="Sofies" lemma="Sofie" pos="prop" morph="GEN" extra="hum" deprel="DN" head="3"/>
<word id=3 form="mor" lemma="mor" pos="n" morph="UTR S IDF NOM" extra="--" deprel="S" head="4"/>
<word id=4 form="var" lemma="være" pos="v-fin" morph="IMPF AKT" extra="--" deprel="fA" head="8"/>
<word id=5 form="sur" lemma="sur" pos="adj" morph="UTR S IDF NOM" extra="--" deprel="Cs" head="4"/>
<word id=6 form="over" lemma="over" pos="prp" morph="--" extra="--" deprel="DA" head="5"/>
<word id=7 form="et_eller_andet" lemma="en_eller_anden" pos="pron-indef" morph="NEU S NOM" extra="--" deprel="DP" head="6"/>
<word id=8 form="skete" lemma="ske" pos="v-fin" morph="IMPF AKT" extra="--" deprel="STA" head="0"/>
<word id=9 form="det" lemma="den" pos="pron-pers" morph="NEU 3S NOM" extra="--" deprel="Sf" head="8"/>
<word id=10 form="at" lemma="at" pos="conj-s" morph="--" extra="--" deprel="SUB" head="12"/>
<word id=11 form="hun" lemma="hun" pos="pron-pers" morph="UTR 3S NOM" extra="--" deprel="S" head="12"/>
<word id=12 form="kaldte" lemma="kalde" pos="v-fin" morph="IMPF AKT" extra="--" deprel="S" head="8"/>
<word id=13 form="deres" lemma="de" pos="pron-poss" morph="--" extra="--" deprel="DN" head="14"/>
<word id=14 form="hus" lemma="hus" pos="n" morph="NEU S IDF NOM" extra="--" deprel="Od" head="12"/>
<word id=15 form="for" lemma="for" pos="prp" morph="--" extra="--" deprel="Co" head="12"/>
<word id=16 form="et" lemma="en" pos="art" morph="NEU S IDF" extra="--" deprel="DN" head="18"/>
<word id=17 form="værre" lemma="dårlig" pos="adj" morph="COM nG nN nD NOM" extra="--" deprel="DN" head="18"/>
<word id=18 form="menageri" lemma="menageri" pos="n" morph="NEU S IDF NOM" extra="--" deprel="DP" head="15"/>
</sentence>
Transformation Tools: The table below provides an overview of format transformation programs and filters. The pipe symbol '|' means that the transformation may be achieved by chaining a number of step-by-step programs. Red tools are Perl-based (Eckhard Bick), blue ones are Prolog-based (Søren Harder). NTN tools are available through the Nordic Treebank Network. cg2visl (green) is not a single program but a suite of language-dependent phrase structure grammars together with VISL's open-source C++ rule compiler.
| From \ To | CG | CG-dep | VISL | VISL-dep | TIGER | TIGER-dep | MALT-dep | DTAG-dep | PENN |
|---|---|---|---|---|---|---|---|---|---|
| CG | - | cg2dep (+ grammar), depsplicator | cg2visl (vislpsg + grammar) OR cg2dep \| dep2tree | depsplicator | cg2visl \| visl2tiger.pl | cg2visl \| visl2tiger.pl \| tiger2dep.pl OR cg2dep \| visldep2malt \| malt2tigerdep | cg2dep \| visldep2malt | depsplicator | cg2dep \| dep2tree \| visl2penn |
| CG-dep | perl -wnpe 's/#.*//' | - | dep2tree | | dep2tree \| visl2tiger.pl | visldep2malt \| malt2tigerdep | visldep2malt | via TIGER-dep (NTN) | dep2tree \| visl2penn |
| VISL | tree2cg | (tree2cg \| cg2dep) | - | | visl2tiger.pl | visl2tiger.pl \| tiger2dep.pl | visl2tiger.pl \| tiger2dep.pl \| tigerdep2malt OR visl2malt | via TIGER-dep (NTN) | visl2penn |
| VISL-dep | | | | - | | | | | |
| TIGER | | | | | - | tiger2dep.pl | tiger2dep.pl \| tigerdep2malt | via TIGER-dep (NTN) | |
| TIGER-dep | | | | | | - | tigerdep2malt, (NTN tools) | (NTN tools) | |
| MALT | | | | | | (NTN tools) | - | | |
| DTAG | | | | | | (NTN tools) | | - | |
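The CG-dep to CG entry, perl -wnpe 's/#.*//', simply deletes everything from the first '#' to the end of each line. A rough Python equivalent is sketched below. The function name `strip_dependency_links` is hypothetical, and stripping the trailing space is an addition that the Perl one-liner does not perform. Like the one-liner, it would also cut a line at any '#' that is not a dependency link.

```python
import re


def strip_dependency_links(lines):
    """Remove '#id->head' links (and anything after '#') from CG-dep lines."""
    return [re.sub(r'#.*', '', line).rstrip() for line in lines]


cg_dep = [
    "Når [når] KS @SUB #1->4",
    "Sofies [Sofie] PROP GEN @>N #2->3",
    "$, #8->0",
]
print(strip_dependency_links(cg_dep))
# ['Når [når] KS @SUB', 'Sofies [Sofie] PROP GEN @>N', '$,']
```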