VISL - Treebanks

The graphical format:

VISL's Java-based tree visualiser represents syntactic trees in an interactive interface that allows both step-by-step inspection and manipulation ("rebuilding" and "retagging") of the tree. Each constituent is represented as a node carrying both function and form information (e.g. Od:pron = a direct object realised as a pronoun). Users can work with trees in three ways:

inspection: The tree is shown either in its entirety or layer by layer, top-down, by clicking on each node to be expanded.

tree-building: Words and non-terminals can be moved with the mouse pointer (drag-and-drop), "mounting" dependents onto heads, e.g. 'the' and 'little' onto the head 'girl' (Danish 'pige') in 'the little girl'. Alternatively, a mother node can be assembled by clicking all its daughters and using the combine-node function.

labelling: Words and non-terminal nodes in the finished tree can be labelled with category symbols from the category bar (word class, syntactic function, phrase and clause type).
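The three tree-building operations can be pictured on a simple node structure. The sketch below is a hypothetical Python analogue, not the Java visualiser's code: `Node`, `mount`, `combine` and `relabel` are invented names, and `mount` simply attaches dependents under their head.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    function: str = ""            # syntactic function, e.g. "Od" (empty until labelled)
    form: str = ""                # form/category, e.g. "pron"
    word: Optional[str] = None    # surface word; None for non-terminals
    children: List["Node"] = field(default_factory=list)

    def label(self) -> str:
        """Function:form label as shown on a node, e.g. 'Od:pron'."""
        return f"{self.function}:{self.form}"


def mount(head: Node, *dependents: Node) -> Node:
    """Drag-and-drop analogue (hypothetical): attach dependents under their head."""
    head.children.extend(dependents)
    return head


def combine(*daughters: Node) -> Node:
    """'Combine node' analogue (hypothetical): new, still unlabelled mother over the daughters."""
    return Node(children=list(daughters))


def relabel(node: Node, function: str, form: str) -> Node:
    """Labelling analogue (hypothetical): set function and form symbols on a node."""
    node.function, node.form = function, form
    return node
```

For example, `relabel(combine(the, little, girl), "S", "np")` yields a mother node whose label is `S:np`.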

In evaluation mode, the Java program keeps count of your error rate and provides hints and explanations along the way.

The source format:

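Internally, for format filtering, searching and manual revision, trees are stored in the VISL format as "horizontal" trees: each terminal or non-terminal node gets its own line, and depth is marked by indentation, written as one '=' per level. A constituent's daughters are thus listed below the mother node, each one level deeper. Each line holds the node's function and form separated by a colon (e.g. S:np), optionally followed by the lemma and morphological features in parentheses and, for terminals, the word itself. Cf. the example from "The World of Sophie" below: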

STA:fcl
=fA:fcl
==SUB:conj-s("når")	Når
==S:np
===DN:prop("Sofie" GEN)	Sofies
===H:n("mor" UTR S IDF NOM)	mor
==P:v-fin("være" IMPF AKT)	var
==Cs:adjp
===H:adj("sur" UTR S IDF NOM)	sur
===DA:pp
====H:prp("over")	over
====DP:pron-indef("en_eller_anden" NEU S NOM)	et_eller_andet
=P:v-fin("ske" IMPF AKT)	skete
=Sf:pron-pers("den" NEU 3S NOM)	det
=S:fcl
==SUB:conj-s("at")	at
==S:pron-pers("hun" UTR 3S NOM)	hun
==P:v-fin("kalde" IMPF AKT)	kaldte
==Od:np
===DN:pron-poss("de" 3P GEN)	deres
===H:n("hus" NEU S IDF NOM)	hus
==Co:pp
===H:prp("for")	for
===DP:np
====DN:art("en" NEU S IDF)	et
====DN:adj("dårlig" COM nG nN nD NOM)	værre
====H:n("menageri" NEU S IDF NOM)	menageri
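Because depth is encoded purely by the number of leading '=' characters, a horizontal tree can be read back with a simple stack. The following is a minimal Python sketch under stated assumptions: one node per line, `function:form`, an optional parenthesised lemma/feature field, and an optional whitespace-separated word. `VislNode`, `parse_visl` and `words` are hypothetical names, not part of VISL's tools.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VislNode:
    function: str               # e.g. "S"
    form: str                   # e.g. "np"
    info: Optional[str]         # lemma + features, e.g. '"mor" UTR S IDF NOM'
    word: Optional[str]         # surface word for terminals, None otherwise
    children: List["VislNode"] = field(default_factory=list)


# function:form, optional (info), optional word (assumed line layout)
LINE_RE = re.compile(
    r'^(?P<func>[^:]+):(?P<form>[^(\s]+)(?:\((?P<info>[^)]*)\))?(?:\s+(?P<word>\S+))?$')


def parse_visl(text: str) -> VislNode:
    """Parse a horizontal tree; depth = number of leading '=' characters."""
    root: Optional[VislNode] = None
    stack: List[VislNode] = []  # stack[d] = most recent node at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        depth = len(line) - len(line.lstrip("="))
        m = LINE_RE.match(line[depth:])
        if m is None:
            raise ValueError(f"unparseable line: {raw!r}")
        node = VislNode(m["func"], m["form"], m["info"], m["word"])
        if depth == 0:
            root, stack = node, [node]
            continue
        if depth > len(stack):
            raise ValueError(f"depth jumps by more than one level: {raw!r}")
        del stack[depth:]                # pop back to the mother's level
        stack[-1].children.append(node)  # stack[depth-1] is the mother
        stack.append(node)
    if root is None:
        raise ValueError("empty tree")
    return root


def words(node: VislNode) -> List[str]:
    """Terminal words in left-to-right order."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in words(child)]
```

The stack keeps exactly one open node per depth level, so each new line is attached to the most recent node one level up; a jump of more than one level is rejected rather than guessed.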

VISL constituent trees are built from a Constraint Grammar (CG) parser's flat dependency output, using a function-based phrase structure grammar (PSG) and VISL's open-source PSG compiler. Treebank revision is performed first at the CG level and again after tree generation, drawing robustness from the CG system and depth from the PSG.
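To illustrate how flat function tags can drive constituent building, here is a deliberately tiny toy rule in Python. It is not VISL's PSG or its compiler: it only groups a run of `@>N` tokens (prenominal dependents) with the following head word into an np, labels the dependents DN and the head H as in the example tree, and lets the np inherit the head's function tag. The function name and output shape are hypothetical.

```python
from typing import List, Tuple, Union

Token = Tuple[str, str]              # (word, CG function tag), e.g. ("mor", "@SUBJ>")
Phrase = Tuple[str, str, List[str]]  # (form, function tag inherited from head, daughters)


def group_prenominal(tokens: List[Token]) -> List[Union[str, Phrase]]:
    """Toy rule: @>N dependents + the next word form an np headed by that word."""
    out: List[Union[str, Phrase]] = []
    pending: List[str] = []          # DN daughters waiting for their head
    for word, tag in tokens:
        if tag == "@>N":             # dependent of a nominal head to the right
            pending.append(f"DN:{word}")
        elif pending:                # first non-@>N word closes the np as its head
            out.append(("np", tag, pending + [f"H:{word}"]))
            pending = []
        else:
            out.append(word)
    out.extend(pending)              # dangling @>N words stay ungrouped
    return out
```

Applied to "Sofies mor var", this yields `[("np", "@SUBJ>", ["DN:Sofies", "H:mor"]), "var"]`, mirroring the S:np node in the tree above.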

VISL dependency trees

VISL dependency trees are constructed directly from word-based CG input, using structural transformation filters written in Prolog (S. Harder) or Perl (E. Bick). In source annotation, the result is ordinary CG output enriched with token and head IDs: =4:2 or #4->2 on a tag line means that the token in question (number 4) attaches to head token number 2. Two modules can be used to add these dependency links to CG input:

Depsplicator: implemented in Prolog by Søren Harder. Working with an internal, so-called word/9 structure, this program also calculates relative dependency link numbers (e.g. +2 for a head word two positions to the right) and outputs parameter lists for a choice of graphical formats (VISL dependency and DTAG, shown below).
cg2dep: implemented in Perl by Eckhard Bick, as are the TIGER and MALT filters.
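The relative link numbers mentioned for the Depsplicator are just signed distances between a token and its head. A minimal Python sketch of that arithmetic (not the Depsplicator's Prolog code; the function names and the choice of `None` for root-attached tokens are assumptions):

```python
from typing import Dict, Optional


def relative_links(heads: Dict[int, int]) -> Dict[int, Optional[int]]:
    """token id -> signed offset to its head; +2 = head two tokens to the right.

    Tokens attached to 0 (sentence root, punctuation) have no head word and get None.
    """
    return {tok: (head - tok if head != 0 else None) for tok, head in heads.items()}


def absolute_links(offsets: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Inverse of relative_links: recover absolute head ids (None -> 0)."""
    return {tok: (tok + off if off is not None else 0) for tok, off in offsets.items()}
```

With the heads from the Sofie example, token 1 (Når, head 4) gets +3, token 6 (over, head 5) gets -1, and the main verb skete (head 0) gets `None`.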

<s_id="sofie-da43">
Når [når] KS @SUB #1->4
Sofies [Sofie] PROP GEN @>N #2->3
mor [mor] N UTR S IDF NOM @SUBJ> #3->4
var [være] V IMPF AKT @FS-ADVL> #4->9
sur [sur] ADJ UTR S IDF NOM @<… #5->4
over [over] PRP @A< #6->5
et=eller=andet [en=eller=anden] DET NEU S NOM @P< #7->6
$, #8->0
skete [ske] V IMPF AKT @FS-STA #9->0
det [den] PERS NEU 3S NOM @F-<… #10->9
at [at] KS @SUB #11->13
hun [hun] PERS UTR 3S NOM @SUBJ> #12->13
kaldte [kalde] V IMPF AKT @FS-<… #13->9
deres [de] PERS 3P GEN @>N #14->15
hus [hus] N NEU S IDF NOM @<… #15->13
for [for] PRP @<… #16->13
et [en] ART NEU S IDF @>N #17->19
værre [dårlig] ADJ COM nG nN nD NOM @>N #18->19
menageri [menageri] N NEU S IDF NOM @P< #19->16
$. #20->0
</s>
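A token line in this format has the word form, an optional [lemma], the tags, and a trailing link in either notation (#4->2 or =4:2). The minimal Python sketch below reads such lines and builds a head-to-dependents map; it is an illustration under those layout assumptions, not cg2dep or the Depsplicator, and `CgToken`, `parse_cg_dep` and `dependents` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Trailing token-to-head link in either notation: "#4->2" or "=4:2".
LINK_RE = re.compile(r'\s+(?:#(\d+)->(\d+)|=(\d+):(\d+))\s*$')
# Word form, optional [lemma], then the remaining tags.
BODY_RE = re.compile(r'^(?P<form>\S+)(?:\s+\[(?P<lemma>[^\]]*)\])?(?P<tags>.*)$')


@dataclass
class CgToken:
    form: str
    lemma: str        # "" for punctuation lines such as "$, #8->0"
    tags: List[str]   # morphological and @function tags, in order
    tid: int
    head: int         # 0 = attached to the sentence root


def parse_cg_dep(lines: List[str]) -> List[CgToken]:
    tokens: List[CgToken] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("<"):  # skip <s ...> and </s> markup
            continue
        link = LINK_RE.search(line)
        if link is None:
            raise ValueError(f"no dependency link in: {line!r}")
        a, b, c, d = link.groups()
        tid, head = (int(a), int(b)) if a is not None else (int(c), int(d))
        body = BODY_RE.match(line[:link.start()])
        tokens.append(CgToken(body["form"], body["lemma"] or "",
                              body["tags"].split(), tid, head))
    return tokens


def dependents(tokens: List[CgToken]) -> Dict[int, List[int]]:
    """head id -> ids of the tokens attached to it (0 = sentence root)."""
    deps: Dict[int, List[int]] = {}
    for t in tokens:
        deps.setdefault(t.head, []).append(t.tid)
    return deps
```

Matching the link at the end of the line first keeps '=' inside multiword forms such as et=eller=andet from being mistaken for the =4:2 notation.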

VISL dependency trees:

DTAG export format:

TIGER exchange format

This is the treebank exchange format agreed upon by the Nordic Treebank Network; it allows free data exchange and the use of tools developed by the international TIGER project community. VISL constituent trees can be converted into TIGER constituent format with the program visl2tiger.pl. In TIGER format, edge labels carry the original syntactic function tags, and the category (cat) of non-terminal nodes carries the phrase and clause forms.

TIGER tree example
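To make the function-to-edge and form-to-cat mapping concrete, here is a minimal Python sketch (not visl2tiger.pl itself) that writes a small constituent tree as TIGER-XML-style elements with `xml.etree.ElementTree`. The element layout (s, graph, terminals/t, nonterminals/nt, edge with label and idref) is assumed to follow TIGER-XML; the tuple tree, the id scheme (terminals from 1, non-terminals from 500) and putting the form of terminals into `pos` are choices made for this sketch. The root's own function (e.g. STA) has no incoming edge here and is dropped.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Hypothetical in-memory tree: (function, form, word) for terminals,
# (function, form, [children]) for non-terminals.
Tree = Tuple[str, str, Union[str, List["Tree"]]]


def visl_to_tiger(tree: Tree, sent_id: str = "s1") -> ET.Element:
    s = ET.Element("s", id=sent_id)
    graph = ET.SubElement(s, "graph")
    terminals = ET.SubElement(graph, "terminals")
    nonterminals = ET.SubElement(graph, "nonterminals")
    counters = {"t": 0, "nt": 499}  # non-terminal ids start at 500 (arbitrary choice)

    def visit(node: Tree) -> str:
        _function, form, body = node
        if isinstance(body, str):   # terminal: form -> pos, word -> word
            counters["t"] += 1
            tid = f"{sent_id}_{counters['t']}"
            ET.SubElement(terminals, "t", id=tid, word=body, pos=form)
            return tid
        # visit daughters first, so terminals come out in surface order
        child_ids = [(child[0], visit(child)) for child in body]
        counters["nt"] += 1
        nid = f"{sent_id}_{counters['nt']}"
        nt = ET.SubElement(nonterminals, "nt", id=nid, cat=form)  # form -> cat
        for label, cid in child_ids:
            ET.SubElement(nt, "edge", label=label, idref=cid)  # function -> edge label
        return nid

    graph.set("root", visit(tree))
    return s
```

Usage: `ET.tostring(visl_to_tiger(tree), encoding="unicode")` gives the XML string; for "Sofies mor var" the np node gets edges labelled DN and H, and the clause node edges labelled S and P.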

TIGER dependency format:

This format is derived from TIGER constituent trees by a dedicated Perl program, tiger2dep.pl. In this format, word terminals are "identified" with their dependency nodes by means of the empty edge label '--'.
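The text does not spell out the node layout, so the following Python sketch shows only one plausible reading of the '--' convention: every token i gets a dependency node n{i} whose first edge, labelled '--', points to its own terminal t{i}, and each dependent's node hangs under its head's node with the function as edge label. The ids, the function name and the input shape (word, function, head) are hypothetical, and this is not tiger2dep.pl.

```python
from typing import Dict, List, Tuple


def tiger_dep_edges(tokens: List[Tuple[str, str, int]]) -> Dict[str, List[Tuple[str, str]]]:
    """tokens: (word, function, head) with 1-based heads, 0 = root.

    Returns dependency-node id -> [(edge label, target id)].
    """
    # '--' identifies each word terminal with its own dependency node
    edges: Dict[str, List[Tuple[str, str]]] = {
        f"n{i}": [("--", f"t{i}")] for i in range(1, len(tokens) + 1)
    }
    for i, (_word, function, head) in enumerate(tokens, start=1):
        if head != 0:  # root-attached tokens have no governing node
            edges[f"n{head}"].append((function, f"n{i}"))
    return edges
```

For "Sofies mor var", node n2 (mor) ends up with a '--' edge to its terminal plus a DN edge to n1 (Sofies).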

MALT dependency format:

This format was developed by Joakim Nivre at Växjö University. For evaluation purposes and compatibility, VISL data can be converted into MALT format using either visldep2malt (from CG dependency format) or visltiger2malt (from VISL tree format).
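A conversion in the spirit of visldep2malt can be sketched as follows, assuming the four-column Malt-TAB layout (word, part of speech, head, dependency label; one token per line, tab-separated). This is a hypothetical Python illustration, not visldep2malt: taking the first tag as POS, the first @-tag (without '@') as the label, stripping the CG '$' punctuation marker, and the placeholder "PUNC" for tagless punctuation are all assumptions.

```python
import re
from typing import List

# form, optional [lemma], tags, trailing "#id->head"
LINE_RE = re.compile(r'^(\S+)(?:\s+\[[^\]]*\])?(.*?)\s+#(\d+)->(\d+)\s*$')


def cg_dep_to_malt_tab(lines: List[str]) -> str:
    rows: List[str] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("<"):  # skip <s ...> and </s>
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"no #id->head link in: {line!r}")
        form, tag_str, tid, head = m.groups()
        if int(tid) != len(rows) + 1:  # Malt-TAB ids are implicit line positions
            raise ValueError(f"token ids must be consecutive: {line!r}")
        tags = tag_str.split()
        pos = tags[0] if tags else "PUNC"  # hypothetical placeholder POS
        func = next((t for t in tags if t.startswith("@")), "@PUNC")
        if form.startswith("$") and len(form) > 1:  # assumed CG punctuation marker
            form = form[1:]
        rows.append("\t".join([form, pos, head, func.lstrip("@")]))
    return "\n".join(rows) + "\n"
```

Since Malt-TAB has no explicit token id column, the sketch refuses input whose #ids are not consecutive rather than silently renumbering.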

Transformation Tools:

The table below gives an overview of the format transformation programs and filters; rows are source formats and columns are target formats. The pipe symbol '|' means that the transformation is achieved by chaining several step-by-step programs. The Perl-based tools are by Eckhard Bick and the Prolog-based ones, such as the Depsplicator, by Søren Harder. NTN tools are available through the Nordic Treebank Network. cg2visl is not a single program but a suite of language-dependent phrase structure grammars plus VISL's open-source C++ rule compiler.


| From / To | CG | CG-dep | VISL | VISL-dep | TIGER | TIGER-dep | MALT-dep | DTAG-dep |
|---|---|---|---|---|---|---|---|---|
| CG | | cg2dep, depsplicator | cg2visl (visl-psg + grammar) | depsplicator | cg2visl \| visl2tiger.pl | cg2visl \| visl2tiger.pl \| tiger2dep.pl | cg2dep \| visldep2malt | depsplicator |
| CG-dep | | | | | | | visldep2malt | |
| VISL | tree2cg | | | | visl2tiger.pl | visl2tiger.pl \| tiger2dep.pl | visl2tiger.pl \| tiger2dep.pl \| tigerdep2malt | |
| VISL-dep | | | | | | | | |
| TIGER | | | | | | tiger2dep.pl | | |
| TIGER-dep | | | | | | | tigerdep2malt, (NTN tools) | (NTN tools) |
| MALT | | | | | | (NTN tools) | | |
| DTAG | | | | | | (NTN tools) | | |
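The chained entries in the table can be derived mechanically from the single-step tools. The Python sketch below encodes those one-step conversions (read off the table; the NTN tools are left out because they are not named individually) and runs a breadth-first search for the shortest chain. `STEPS` and `find_chain` are hypothetical names; this is an illustration, not a VISL utility.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Single-step converters read off the table: (source, target) -> tool.
STEPS: Dict[Tuple[str, str], str] = {
    ("CG", "CG-dep"): "cg2dep",
    ("CG", "VISL"): "cg2visl",
    ("CG", "VISL-dep"): "depsplicator",
    ("CG", "DTAG-dep"): "depsplicator",
    ("CG-dep", "MALT-dep"): "visldep2malt",
    ("VISL", "CG"): "tree2cg",
    ("VISL", "TIGER"): "visl2tiger.pl",
    ("TIGER", "TIGER-dep"): "tiger2dep.pl",
    ("TIGER-dep", "MALT-dep"): "tigerdep2malt",
}


def find_chain(source: str, target: str) -> Optional[List[str]]:
    """Shortest tool chain from source to target format, or None if unreachable."""
    prev: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        if fmt == target:
            chain: List[str] = []
            while prev[fmt] is not None:  # walk back to the source
                before = prev[fmt]
                chain.append(STEPS[(before, fmt)])
                fmt = before
            return chain[::-1]
        for src, dst in STEPS:
            if src == fmt and dst not in prev:
                prev[dst] = fmt
                queue.append(dst)
    return None
```

`" | ".join(find_chain("CG", "TIGER-dep"))` reproduces the table's `cg2visl | visl2tiger.pl | tiger2dep.pl`. Where several chains are equally short (e.g. VISL to MALT-dep), the search returns just one of them, which need not be the one listed in the table.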

Treebanks

Treebank formats: