udapi.block.ud.markfeatsbugs module¶
Block to identify missing or ill-valued features in a treebank. Any bugs that it finds will be saved in the MISC column as a Bug attribute, which can be later used in filters and highlighted in text output. This is a base block that only implements service methods. A language-specific block must be derived from this one and define the actual rules valid in that language.
Usage (Czech example): cat *.conllu | udapy -HAMX layout=compact ud.cs.MarkFeatsBugs > bugs.html
- class udapi.block.ud.markfeatsbugs.MarkFeatsBugs(zones='all', if_empty_tree='process', **kwargs)[source]¶
Bases:
Block
- check_allowed_features(node, allowed)[source]¶
We need a dictionary indexed by feature names that are allowed; for each feature name, there is a list of allowed values.
- check_required_features(node, required)[source]¶
We need a list of names of features whose values must not be empty.
- process_node(node)[source]¶
This is a generic block, do nothing here. In a language-specific block based on this one, rules similar to the examples below can be specified:
# NOUNS ################################################################ if node.upos == ‘NOUN’:
self.check_required_features(node, [‘Gender’, ‘Number’, ‘Case’, ‘Polarity’]) if node.feats[‘Gender’] == ‘Masc’:
self.check_required_features(node, [‘Animacy’]) self.check_allowed_features(node, {
‘Gender’: [‘Masc’, ‘Fem’, ‘Neut’], ‘Animacy’: [‘Anim’, ‘Inan’], ‘Number’: [‘Sing’, ‘Dual’, ‘Plur’], ‘Case’: [‘Nom’, ‘Gen’, ‘Dat’, ‘Acc’, ‘Voc’, ‘Loc’, ‘Ins’], ‘Polarity’: [‘Pos’, ‘Neg’], ‘Foreign’: [‘Yes’]})
- else:
- self.check_allowed_features(node, {
‘Gender’: [‘Masc’, ‘Fem’, ‘Neut’], ‘Number’: [‘Sing’, ‘Dual’, ‘Plur’], ‘Case’: [‘Nom’, ‘Gen’, ‘Dat’, ‘Acc’, ‘Voc’, ‘Loc’, ‘Ins’], ‘Polarity’: [‘Pos’, ‘Neg’], ‘Foreign’: [‘Yes’]})
#… # THE REST: NO FEATURES ################################################ else:
self.check_allowed_features(node, {})