VISL - World of VISL - Constraint Grammar

VISL - Visual Interactive Syntax Learning

Constraint Grammar

Constraint Grammar (CG) parsers are at the core of most of VISL's live applications. The Constraint Grammar concept was launched by Fred Karlsson in the early 1990s (Karlsson et al. 1995), and CG parsers have since been written for a large variety of languages, routinely achieving F-scores of over 99% for part-of-speech (PoS, word class) tagging. A number of syntactic CG systems have reported F-scores of around 95%. VISL's own Constraint Grammar systems are inspired by Eckhard Bick's PALAVRAS parser for Portuguese (Bick 2000), and introduce, as novelties, subclause function tags, generalized dependency markers and semantic prototype tags. For most languages, a lexicon-based morphological analyzer provides input to the first CG level, while the output of the last CG level can be converted into syntactic tree structures by specially designed Phrase Structure Grammars (PSGs) that use syntactic functions, not words, as terminals. Other, hybrid combinations are also feasible. For example, the French system uses PoS information from a probabilistic tagger.

Constraint Grammar (CG) is a methodological paradigm for Natural Language Parsing (NLP). Linguist-written, context-dependent rules are compiled into a grammar that assigns grammatical tags ("readings") to words or other tokens in running text. Typical tags address lemmatisation (lexeme or base form), inflexion, derivation, syntactic function, dependency, valency, case roles, semantic type etc. Each rule adds, removes, selects or replaces a tag or a set of grammatical tags in a given sentence context. Context conditions can be linked to any tag or tag set of any word anywhere in the sentence, either locally (at defined distances) or globally (at undefined distances). Context conditions in the same rule may be linked, i.e. conditioned upon each other, negated, or blocked by interfering words or tags. Typical CGs consist of thousands of rules that are applied set-wise in progressive steps, covering ever more advanced levels of analysis. Within each level, safe rules are used before heuristic rules, and no rule is allowed to remove the last reading of a given kind, which provides a high degree of robustness.
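The rule mechanics described above can be illustrated with a small Python sketch. This is not VISL's implementation or its rule syntax: the `Cohort` class, the helper functions, the two example rules and the tag names (`DET`, `N`, `V` and so on) are all hypothetical and chosen only for illustration. The sketch shows a local context condition, the REMOVE and SELECT operations, the ordering of safe rules before heuristic ones, and the guarantee that the last remaining reading is never removed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

Reading = FrozenSet[str]  # one candidate analysis, as a set of tags


@dataclass
class Cohort:
    """A token together with its remaining candidate readings."""
    word: str
    readings: List[Reading]


def unambiguous(sentence, i, tag):
    """Context test: position i exists and every reading there carries `tag`."""
    return 0 <= i < len(sentence) and all(tag in r for r in sentence[i].readings)


def remove(sentence, i, tag):
    """REMOVE readings carrying `tag`, but never the last remaining reading."""
    kept = [r for r in sentence[i].readings if tag not in r]
    if kept and len(kept) < len(sentence[i].readings):
        sentence[i].readings = kept
        return True
    return False


def select(sentence, i, tag):
    """SELECT readings carrying `tag`, if at least one such reading exists."""
    kept = [r for r in sentence[i].readings if tag in r]
    if kept and len(kept) < len(sentence[i].readings):
        sentence[i].readings = kept
        return True
    return False


def apply_rules(sentence, rule_sections):
    """Run sections in order (safe before heuristic), each until nothing changes."""
    for rules in rule_sections:
        changed = True
        while changed:
            changed = False
            for rule in rules:
                for i in range(len(sentence)):
                    if rule(sentence, i):
                        changed = True
    return sentence


# Hypothetical safe rule: no verb reading directly after an unambiguous determiner.
def no_verb_after_det(sentence, i):
    return unambiguous(sentence, i - 1, "DET") and remove(sentence, i, "V")


# Hypothetical heuristic rule: prefer a noun reading wherever one is available.
def prefer_noun(sentence, i):
    return select(sentence, i, "N")


sentence = [
    Cohort("the", [frozenset({"DET"})]),
    Cohort("walk", [frozenset({"N", "SG"}),
                    frozenset({"V", "INF"}),
                    frozenset({"V", "PR"})]),
]
apply_rules(sentence, [[no_verb_after_det], [prefer_noun]])
```

In this toy sentence the safe rule discards both verb readings of "walk" because the preceding word is unambiguously a determiner, leaving only the noun reading. A later attempt to remove that last reading, or the single `DET` reading of "the", would leave the cohort unchanged, which mirrors the robustness property described in the text.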

The following is an overview of VISL's different CG systems:

Language	Parser	Lexicon	Analyzer	Grammar	Levels	Applications
Danish	DanGram	100,000 lexemes, 40,000 names	Full	8,000 rules	morph., syntax, dep., psg, case roles	Teaching, corpus annotation, MT, Spell/Grammar checker, QA-systems, NER
Portuguese	PALAVRAS	70,000 lexemes, 15,000 names	Full	7,500 rules	morph., syntax, dep., psg	Teaching, corpus annotation, MT, QA-systems, NER
Spanish	HIS-PALAVRAS	60,000 lexemes	Full	4,500 rules	morph., syntax, dep., psg	Teaching, corpus annotation
English	EngCG	160,000 sem	Full (Lingsoft)	LS+700 rules	morph. / syntax (Lingsoft), subclause, psg	Teaching, corpus annotation
French	FrAG	57,000 lexemes	DTT (Schmid & Stein) + analysis	1,400 rules	morph.-correction, syntax, dep., psg	Teaching, corpus annotation
German	GerGram	25,000 val/sem	Full (Lingsoft)	LS+1,300 rules	morph. (Lingsoft), syntax, dep., psg	Teaching, corpus annotation
Esperanto	EspGram	30,000 lexemes	Full	2,600 rules	morph., syntax, dep.	Teaching, corpus annotation, MT