Constraint Grammar
Constraint Grammar (CG) parsers are at the core of most of VISL's live applications. The Constraint Grammar concept was launched
by Fred Karlsson in the early 1990s (Karlsson et al. 1995), and CG parsers have since been written for a large variety of languages, routinely achieving F-scores of over 99% for PoS (word class) tagging. A
number of syntactic CG systems have reported F-scores of around 95%. VISL's own Constraint Grammar systems are inspired by Eckhard Bick's PALAVRAS parser for Portuguese (Bick 2000), and use, as a
novelty, subclause function, generalized dependency markers and semantic prototype tags. For most languages, a lexicon-based morphological analyzer provides the input to the first CG level, while the
output of the last CG level can be converted into syntactic tree structures by specially designed Phrase Structure Grammars (PSGs) that use syntactic functions, not words, as terminals. Other, hybrid
combinations are also feasible: the French system, for example, uses PoS information from a probabilistic tagger.
Constraint Grammar (CG) is a methodological paradigm for Natural Language Parsing (NLP). Linguist-written, context-dependent rules are
compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation (lexeme or base form), inflexion, derivation, syntactic
function, dependency, valency, case roles, semantic type etc. Each rule either adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions
can be linked to any tag or tag set of any word anywhere in the sentence, either locally (defined distances) or globally (undefined distances). Context conditions in the same rule may be linked, i.e.
conditioned upon each other, negated or blocked by interfering words or tags. Typical CGs consist of thousands of rules that are applied set-wise in progressive steps, covering ever more advanced
levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, thus providing a high degree of
robustness.
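The rule mechanics described above can be illustrated with a minimal Python sketch. It is a toy model and not VISL's implementation: the `Cohort` and `Rule` classes, the tag names and the rule format are all hypothetical. Only removing and selecting readings is modelled (adding and replacing tags are left out), each rule has a single context condition at a fixed relative position, and a condition counts as met if any reading of the context word carries the tag. The sketch shows context-dependent removal, negated conditions, safe rules running before heuristic ones, and the safeguard that a word never loses its last reading.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class Cohort:
    """One token plus the readings (tag sets) still considered possible."""
    word: str
    readings: List[FrozenSet[str]]


@dataclass
class Rule:
    """Hypothetical rule format: act on `target` if the context test holds."""
    action: str            # "REMOVE" or "SELECT"
    target: str            # tag identifying the readings the rule acts on
    offset: int            # relative position of the context word (-1 = previous)
    context: str           # tag looked for at that position
    negated: bool = False  # True: the context tag must NOT be present


def context_holds(sentence: List[Cohort], i: int, rule: Rule) -> bool:
    """Check the rule's single context condition for the word at index i."""
    j = i + rule.offset
    in_range = 0 <= j < len(sentence)
    # Simplification: the tag counts as present if ANY reading carries it.
    found = in_range and any(rule.context in r for r in sentence[j].readings)
    return found != rule.negated


def apply_rule(sentence: List[Cohort], rule: Rule) -> bool:
    """Apply one rule to every cohort; return True if any reading was discarded."""
    changed = False
    for i, cohort in enumerate(sentence):
        if not context_holds(sentence, i, rule):
            continue
        if rule.action == "REMOVE":
            kept = [r for r in cohort.readings if rule.target not in r]
        elif rule.action == "SELECT":
            kept = [r for r in cohort.readings if rule.target in r]
        else:
            raise ValueError(f"unknown action: {rule.action}")
        # Robustness: skip the rule if it would leave the word without a reading.
        if kept and len(kept) < len(cohort.readings):
            cohort.readings = kept
            changed = True
    return changed


def disambiguate(sentence: List[Cohort],
                 safe_rules: List[Rule],
                 heuristic_rules: List[Rule]) -> List[Cohort]:
    """Run the safe rules until nothing changes, then the heuristic rules."""
    for batch in (safe_rules, heuristic_rules):
        changed = True
        while changed:
            changed = False
            for rule in batch:
                if apply_rule(sentence, rule):
                    changed = True
    return sentence


# Usage: "run" is ambiguous between noun and verb; a determiner to its left
# lets a safe rule discard the verb reading.
sentence = [
    Cohort("the", [frozenset({"DET"})]),
    Cohort("run", [frozenset({"N", "SG"}), frozenset({"V", "INF"})]),
]
disambiguate(sentence, [Rule("REMOVE", "V", -1, "DET")], [])
print(sentence[1].readings)
```

After the call, "run" keeps only its noun reading. Applying the same rule to a word whose only reading is a verb leaves that reading in place, which is the robustness property described above. The loop terminates because every successful rule application discards at least one reading.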
The following is an overview of VISL's different CG systems:
| Language | Parser | Lexicon | Analyzer | Grammar | Levels | Applications |
| | DanGram | 100.000 lexemes, 40.000 names | Full | 8.000 rules | morph., syntax, dep., psg, case roles | Teaching, corpus annotation, MT, Spell/Grammar checker, QA-systems, NER |
| | PALAVRAS | 70.000 lexemes, 15.000 names | Full | 7.500 rules | morph., syntax, dep., psg | Teaching, corpus annotation, MT, QA-systems, NER |
| | HIS-PALAVRAS | 60.000 lexemes | Full | 4.500 rules | morph., syntax, dep., psg | Teaching, corpus annotation |
| | EngCG | 160.000 sem | Full (Lingsoft) | LS+700 rules | morph. / syntax (Lingsoft), subclause, psg | Teaching, corpus annotation |
| | FrAG | 57.000 lexemes | DTT (Schmid & Stein) + analysis | 1.400 rules | morph.-correction, syntax, dep., psg | Teaching, corpus annotation |
| | GerGram | 25.000 val/sem | Full (Lingsoft) | LS+1.300 rules | morph. (Lingsoft), syntax, dep., psg | Teaching, corpus annotation |
| | EspGram | 30.000 lexemes | Full | 2.600 rules | morph., syntax, dep. | Teaching, corpus annotation, MT |