udapi.block.ud.markfeatsbugs module

Block to identify missing or ill-valued features in a treebank. Any bugs it finds are saved in the MISC column as a Bug attribute, which can later be used in filters and highlighted in text output. This is a base block that implements only service methods; a language-specific block must be derived from it to define the rules valid in that language.

Usage (Czech example): cat *.conllu | udapy -HAMX layout=compact ud.cs.MarkFeatsBugs > bugs.html

class udapi.block.ud.markfeatsbugs.MarkFeatsBugs(zones='all', if_empty_tree='process', **kwargs)[source]

Bases: Block

bug(node, bugstring)[source]
check_allowed_features(node, allowed)[source]

The allowed argument is a dictionary keyed by the names of the allowed features; each name maps to the list of values allowed for that feature.
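To illustrate the shape of the allowed dictionary, here is a minimal standalone sketch of this kind of check. It is not the udapi implementation: the function name `find_disallowed`, the plain dict standing in for a node's features, and the problem labels are all assumptions made for illustration.

```python
def find_disallowed(feats, allowed):
    """Return labels for features or values that fall outside `allowed`.

    feats   -- plain dict standing in for a node's features (assumption)
    allowed -- dict mapping each allowed feature name to its allowed values
    """
    problems = []
    for name, value in feats.items():
        if name not in allowed:
            # The feature itself is not permitted for this word class.
            problems.append(f"{name}:not-allowed")
        elif value not in allowed[name]:
            # The feature is permitted, but not with this value.
            problems.append(f"{name}={value}:bad-value")
    return problems


allowed = {
    "Gender": ["Masc", "Fem", "Neut"],
    "Number": ["Sing", "Dual", "Plur"],
}
feats = {"Gender": "Com", "Number": "Sing", "Tense": "Past"}
problems = find_disallowed(feats, allowed)
# problems == ["Gender=Com:bad-value", "Tense:not-allowed"]
```

Passing an empty dictionary as allowed flags every feature present, which matches the "no features" rule used for the remaining word classes in the process_node example.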

check_required_features(node, required)[source]

The required argument is a list of names of features whose values must not be empty.
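A required-features check of this kind can be sketched as follows. The helper name `find_missing` and the use of a plain dict for the features are assumptions for illustration, not part of udapi.

```python
def find_missing(feats, required):
    """Return the required feature names that are absent or empty in `feats`.

    feats    -- plain dict standing in for a node's features (assumption)
    required -- list of feature names whose values must not be empty
    """
    # dict.get returns None for a missing key; an empty string is also falsy.
    return [name for name in required if not feats.get(name)]


feats = {"Gender": "Masc", "Number": "Sing", "Case": ""}
missing = find_missing(feats, ["Gender", "Number", "Case", "Polarity"])
# missing == ["Case", "Polarity"]
```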

process_node(node)[source]

This is a generic block, so this method does nothing. In a language-specific block derived from this one, rules similar to the following example can be specified:

    # NOUNS ################################################################
    if node.upos == 'NOUN':
        self.check_required_features(node, ['Gender', 'Number', 'Case', 'Polarity'])
        if node.feats['Gender'] == 'Masc':
            self.check_required_features(node, ['Animacy'])
            self.check_allowed_features(node, {
                'Gender': ['Masc', 'Fem', 'Neut'],
                'Animacy': ['Anim', 'Inan'],
                'Number': ['Sing', 'Dual', 'Plur'],
                'Case': ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins'],
                'Polarity': ['Pos', 'Neg'],
                'Foreign': ['Yes']})
        else:
            self.check_allowed_features(node, {
                'Gender': ['Masc', 'Fem', 'Neut'],
                'Number': ['Sing', 'Dual', 'Plur'],
                'Case': ['Nom', 'Gen', 'Dat', 'Acc', 'Voc', 'Loc', 'Ins'],
                'Polarity': ['Pos', 'Neg'],
                'Foreign': ['Yes']})
    #…
    # THE REST: NO FEATURES ################################################
    else:
        self.check_allowed_features(node, {})
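The noun rules above can be tried out without a treebank using small stand-ins. The sketch below is self-contained and does not import udapi: `SketchNode`, `SketchChecker`, `SketchNounRules`, the `bugs` list (standing in for the Bug attribute in MISC), and the bug label format are all hypothetical names and choices made for illustration.

```python
class SketchNode:
    """Hypothetical stand-in for a treebank node (not the udapi Node class)."""

    def __init__(self, upos, feats):
        self.upos = upos
        self.feats = dict(feats)
        self.bugs = []  # stand-in for the Bug attribute in MISC


class SketchChecker:
    """Hypothetical stand-in for the service methods of the base block."""

    def bug(self, node, bugstring):
        # Record each distinct problem once per node.
        if bugstring not in node.bugs:
            node.bugs.append(bugstring)

    def check_required_features(self, node, required):
        for name in required:
            if not node.feats.get(name):
                self.bug(node, f"{name}:missing")

    def check_allowed_features(self, node, allowed):
        for name, value in node.feats.items():
            if name not in allowed:
                self.bug(node, f"{name}:not-allowed")
            elif value not in allowed[name]:
                self.bug(node, f"{name}={value}:bad-value")


class SketchNounRules(SketchChecker):
    """Language-specific rules in the spirit of the noun example."""

    def process_node(self, node):
        if node.upos == "NOUN":
            self.check_required_features(
                node, ["Gender", "Number", "Case", "Polarity"])
            allowed = {
                "Gender": ["Masc", "Fem", "Neut"],
                "Number": ["Sing", "Dual", "Plur"],
                "Case": ["Nom", "Gen", "Dat", "Acc", "Voc", "Loc", "Ins"],
                "Polarity": ["Pos", "Neg"],
                "Foreign": ["Yes"],
            }
            if node.feats.get("Gender") == "Masc":
                # Animacy is required, and only allowed, for masculines.
                self.check_required_features(node, ["Animacy"])
                allowed["Animacy"] = ["Anim", "Inan"]
            self.check_allowed_features(node, allowed)
        else:
            # The rest: no features allowed.
            self.check_allowed_features(node, {})


rules = SketchNounRules()
masc = SketchNode("NOUN", {"Gender": "Masc", "Number": "Sing",
                           "Case": "Nom", "Polarity": "Pos"})
fem = SketchNode("NOUN", {"Gender": "Fem", "Animacy": "Anim", "Number": "Plur",
                          "Case": "Gen", "Polarity": "Pos"})
punct = SketchNode("PUNCT", {"Number": "Sing"})
for node in (masc, fem, punct):
    rules.process_node(node)
# masc.bugs == ["Animacy:missing"]
# fem.bugs == ["Animacy:not-allowed"]
# punct.bugs == ["Number:not-allowed"]
```

One design choice differs from the example above: the sketch builds a single allowed dictionary and adds Animacy only for masculine nouns, which avoids repeating the shared entries in both branches while producing the same checks.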